How to interpret the dump file of the XGBRegressor


#1

I am trying to rewrite a trained XGBRegressor model in C++. Although I can interpret each tree booster in the dump file, I am confused about how to combine them. For a regression problem, is the final prediction the average of all the boosters? Or should I get the weight of each tree from somewhere?

Best,

Chen


#2

For a regression problem, the final prediction is given by

[final prediction] = sum([output of i-th tree], i=1..n) + [global bias]

where the global bias is given by the training parameter base_score (this is set to 0.5 by default).
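The formula above can be sketched in a few lines of Python. Here each "tree" is a hypothetical stand-in (a plain function mapping a feature vector to a leaf value); a real implementation would walk the decision nodes from the dump file instead:

```python
def predict(trees, x, base_score=0.5):
    """Sum the raw leaf output of every tree, then add the global bias.

    `trees` is a hypothetical list of callables, each returning the leaf
    value that one booster tree assigns to feature vector `x`.
    `base_score` is the training parameter mentioned above (0.5 by default).
    """
    return sum(tree(x) for tree in trees) + base_score

# Toy example: two "trees", each a single threshold rule.
trees = [
    lambda x: 1.0 if x[0] < 2.0 else -0.5,
    lambda x: 0.25 if x[1] < 0.0 else 0.75,
]

print(predict(trees, [1.0, 3.0]))  # 1.0 + 0.75 + 0.5 = 2.25
```

The key point is that the trees are summed, not averaged, and the bias is added once at the end.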


#3

Thank you so much for your reply! BTW, is there any function that can convert the trained model into C++ if/else statements? Additionally, how can I check the variable names corresponding to F0, F1, …?

Best,

Chen


#4

Look at dmlc/treelite, where you can do

import treelite
model = treelite.Model.load('my_model.model', model_format='xgboost')
model.compile(dirpath='./my_model_c99', verbose=True)     # Generate C99 code

Treelite also provides helper functions, such as a prediction runtime for Python and Java.

As for variable (feature) names, I think XGBoost keeps track of them internally, but if you are planning to implement your model in C++, you’ll have to manage them manually. The idea is to maintain a mapping from feature name to integer index, e.g.

F0 -> 0
F1 -> 1
F2 -> 2
...

With this mapping, you can convert (feature name, feature value) pairs into a dense feature vector, with missing values encoded as np.nan (Python) or Float.NaN (Java).
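A minimal sketch of that conversion in Python, assuming the hypothetical three-feature mapping shown above:

```python
import numpy as np

# Hypothetical mapping from feature name to column index (F0 -> 0, etc.)
feature_index = {"F0": 0, "F1": 1, "F2": 2}

def to_dense(pairs, num_features):
    """Build a dense feature vector from (name, value) pairs.

    Features that are not present in `pairs` stay np.nan, which XGBoost
    interprets as a missing value.
    """
    vec = np.full(num_features, np.nan)
    for name, value in pairs:
        vec[feature_index[name]] = value
    return vec

print(to_dense([("F0", 3.5), ("F2", -1.0)], 3))  # [ 3.5  nan -1. ]
```

The same idea carries over to C++ or Java: look each name up in a hash map, write the value into the corresponding slot, and initialize every slot to the missing-value sentinel first.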