XGBoost learning-to-rank model to predictions core function?


#1

Hi,

I have trained an XGBoost model in Spark with a single tree, using “-booster gbtree --eval_metric ndcg --objective rank:pairwise”. The dumped model text is shown below.

booster[0]:
0:[feature1<2323] yes=1,no=2,missing=2
1:[feature2<2.00000095367431640625] yes=3,no=4,missing=4
3:leaf=0.1649394333362579345703125
4:leaf=0.049700520932674407958984375
2:[feature2<2.00000095367431640625] yes=5,no=6,missing=6
5:leaf=0.0433560945093631744384765625
6:leaf=-0.09195549786090850830078125

The test data has only one record:
feature1: 511
feature2: missing

It is supposed to be routed to leaf 4: 0.049700520932674407958984375. However, the model's predicted score is 0.5497005. Is there an internal transfer function from the leaf score to the final prediction score inside XGBoost? Can someone point me to a link? Thank you very much!


#2

There is a global bias of 0.5 that gets added to every leaf output, so the “transfer function” would be f(x) = x + 0.5. You can remove this bias by setting base_score=0 when training.
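To make the arithmetic concrete, here is a minimal sketch that walks the dumped tree from post #1 by hand and then adds the bias. The dictionary layout and the helpers `leaf_value` and `predict_margin` are hypothetical, written only for illustration; they are not part of the XGBoost API. The sketch assumes a base_score of 0.5, as in this thread, and an objective such as rank:pairwise where the prediction is the raw sum of leaf values.

```python
# Hand-written copy of the dumped tree from post #1.
# This dict layout is an assumption for illustration, not an XGBoost structure.
TREE = {
    0: {"feature": "feature1", "threshold": 2323, "yes": 1, "no": 2, "missing": 2},
    1: {"feature": "feature2", "threshold": 2.00000095367431640625,
        "yes": 3, "no": 4, "missing": 4},
    2: {"feature": "feature2", "threshold": 2.00000095367431640625,
        "yes": 5, "no": 6, "missing": 6},
    3: {"leaf": 0.1649394333362579345703125},
    4: {"leaf": 0.049700520932674407958984375},
    5: {"leaf": 0.0433560945093631744384765625},
    6: {"leaf": -0.09195549786090850830078125},
}


def leaf_value(tree, record):
    """Follow yes/no/missing branches from node 0 until a leaf is reached."""
    node = tree[0]
    while "leaf" not in node:
        value = record.get(node["feature"])
        if value is None:                    # missing feature
            next_id = node["missing"]
        elif value < node["threshold"]:      # split condition holds
            next_id = node["yes"]
        else:
            next_id = node["no"]
        node = tree[next_id]
    return node["leaf"]


def predict_margin(trees, record, base_score=0.5):
    """Sum the leaf values of all trees and add the global bias."""
    return base_score + sum(leaf_value(tree, record) for tree in trees)


# The single test record from post #1: feature2 is missing.
record = {"feature1": 511, "feature2": None}
score = predict_margin([TREE], record)
print(round(score, 7))
```

The record goes 0 → 1 → 4, so the leaf contributes about 0.0497005, and adding 0.5 gives the 0.5497005 reported in the question.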


#3

Oh, I see. Thank you very much! Can you point me to where in the codebase this bias is added? I spent hours trying to find it but couldn’t.


#4

base_score is a training parameter (see the parameter doc). So something like

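import xgboost as xgb

# dtrain and evallist are assumed to be defined already;
# for rank:pairwise, dtrain must carry query group information
param = {'max_depth': 2,
         'eta': 1,
         'objective': 'rank:pairwise',  # logistic objectives reject base_score=0
         'base_score': 0}  # removes the global bias
num_round = 10
bst = xgb.train(param, dtrain, num_round, evallist)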

#5

Thanks for the answer, but I spent hours trying to find it too. Would it be possible to add it to the documentation near rank:pairwise?


#6

@softitova The parameter base_score is already in the parameter doc.


#7

Thanks, @hcho3. I am translating an R XGBoost model into SQL and digging into the dump file. This is helpful.
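For anyone doing the same kind of translation, here is a rough sketch of how the single tree from post #1 could be turned into a SQL expression with the bias included. The `TREE` dict layout, the `to_sql` helper, and the table name `records` are all hypothetical, chosen for illustration; this is not an XGBoost feature. It assumes base_score is 0.5 and that a missing feature is stored as NULL.

```python
# Hand-written copy of the dumped tree from post #1 (hypothetical layout).
TREE = {
    0: {"feature": "feature1", "threshold": 2323, "yes": 1, "no": 2, "missing": 2},
    1: {"feature": "feature2", "threshold": 2.00000095367431640625,
        "yes": 3, "no": 4, "missing": 4},
    2: {"feature": "feature2", "threshold": 2.00000095367431640625,
        "yes": 5, "no": 6, "missing": 6},
    3: {"leaf": 0.1649394333362579345703125},
    4: {"leaf": 0.049700520932674407958984375},
    5: {"leaf": 0.0433560945093631744384765625},
    6: {"leaf": -0.09195549786090850830078125},
}


def to_sql(tree, node_id=0):
    """Recursively render a node as a nested SQL CASE expression."""
    node = tree[node_id]
    if "leaf" in node:
        return repr(node["leaf"])
    column, threshold = node["feature"], node["threshold"]
    yes = to_sql(tree, node["yes"])
    no = to_sql(tree, node["no"])
    missing = to_sql(tree, node["missing"])
    # NULL must be tested first: in SQL, `NULL < x` is neither true nor false.
    return (f"CASE WHEN {column} IS NULL THEN {missing} "
            f"WHEN {column} < {threshold!r} THEN {yes} ELSE {no} END")


BASE_SCORE = 0.5  # assumed global bias, as discussed in this thread
# `records` is a hypothetical table with columns feature1 and feature2.
sql = f"SELECT {BASE_SCORE} + {to_sql(TREE)} AS score FROM records"
print(sql)
```

With more than one tree, the same idea would apply by summing one CASE expression per booster before adding the bias.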


#8

Thank you so much, that’s really helpful.