How to make predictions using the model text dump to match XGBoost predict() behaviour?

I’m trying to build a lightweight inference engine based on the text version of the model (exported using dump_model()), which lists the different trees and their leaf values.

# Structure of the model exported using dump_model()
booster[0]:
0:[f53<0.5] yes=1,no=2,missing=1
    1:[f11<7.5] yes=3,no=4,missing=3
        3:[f8<1.62607923e+09] yes=7,no=8,missing=7
            7:[f62<0.985777259] yes=15,no=16,missing=15
                 ...
booster[1]:
   ...
booster[2]:
   ...
    .
    .
booster[99]:
   ...

I’m computing the output of each tree and summing the corresponding leaf values, then passing the result through a logistic transformation to match how a ‘binary:logistic’ model outputs probabilities.
But so far I can’t get my outputs to match XGBoost’s.

So: How does the predict function of XGBoost work? Is there any computing step I’m missing?

Huge thanks!

You should add 0.5 to the sum of leaf outputs and then feed it through the sigmoid function.
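In code, the suggestion above amounts to something like the following sketch (the 0.5 offset assumes the default base_score; leaf_sum stands for the sum of selected leaf values across all trees, so the value here is just an illustrative placeholder):

```python
import math

def sigmoid(x):
    # Logistic function: maps a raw margin to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# leaf_sum: the summed leaf values from all trees for one sample
# (placeholder value for illustration).
leaf_sum = -0.3
probability = sigmoid(0.5 + leaf_sum)
```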

Thanks for the quick answer. I already tried that (initializing the sum of leaf outputs at 0.5) and the outputs still don’t match. So you’re confirming XGBoost’s predict function only sums 0.5 + the leaf values to get the raw output?

Can you put up your model here so that I can look at it?

I was using a C++ script based on the XGBoost-FastForest GitHub library, but I couldn’t get it to match XGBoost’s predictions.

I then tried a simpler Python solution based on this old post of yours, and I got it to work (thanks for that)!

However, in order to match XGBoost predict() values, I do not add 0.5 to the sum of leaf values:

def predict(model_dump, sample):
    # Sum the selected leaf value of every tree in the dump;
    # with 'binary:logitraw' this raw margin is the final output.
    prediction = 0.0
    for tree in model_dump:
        prediction += get_leaf_value(tree, sample)
    return prediction
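For completeness, get_leaf_value is not shown above. A minimal sketch of what it could look like, assuming the dump format from the question (split lines like `0:[f53<0.5] yes=1,no=2,missing=1` and leaf lines like `15:leaf=-0.0123`) and a sample represented as a dict mapping feature names such as 'f53' to values; the regexes and function name here are my own, not part of XGBoost:

```python
import re

# Matches split nodes: "0:[f53<0.5] yes=1,no=2,missing=1"
SPLIT_RE = re.compile(r'(\d+):\[(\w+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)')
# Matches leaf nodes: "15:leaf=-0.0123"
LEAF_RE = re.compile(r'(\d+):leaf=([^,\s]+)')

def get_leaf_value(tree_dump, sample):
    # Parse the node lines of one tree into a dict keyed by node id.
    nodes = {}
    for line in tree_dump.splitlines():
        line = line.strip()
        m = SPLIT_RE.match(line)
        if m:
            nid, feat, thr, yes, no, miss = m.groups()
            nodes[int(nid)] = ('split', feat, float(thr), int(yes), int(no), int(miss))
            continue
        m = LEAF_RE.match(line)
        if m:
            nodes[int(m.group(1))] = ('leaf', float(m.group(2)))
    # Walk from the root (node 0) to a leaf, taking the missing branch
    # when the feature is absent from the sample.
    nid = 0
    while True:
        node = nodes[nid]
        if node[0] == 'leaf':
            return node[1]
        _, feat, thr, yes, no, miss = node
        value = sample.get(feat)
        if value is None:
            nid = miss
        elif value < thr:
            nid = yes
        else:
            nid = no
```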

The model I’m using is the following:

model = XGBClassifier(learning_rate=0.04, n_estimators=100, verbose_eval=True, random_state=1, n_jobs=8,
                      max_depth=12, objective='binary:logitraw', verbosity=1, min_child_weight=1, gamma=0, subsample=0.8)