Output Margin and Leaf Probabilities

kgoyal40 · April 16, 2019, 11:03am

Hi,

I am doing some work that requires me to compute the final (classification) prediction from each residual tree individually. Right now I call the predict function with output_margin=True to get the untransformed margin for each residual tree, sum these margins, and pass the result to the logistic (sigmoid) function. This seems to be incorrect: my accuracy is very low compared to the baseline model (the original XGBoost prediction method), even though the logloss is better in my model. So my question is: is my approach correct, or do I need to use the leaf probabilities via the pred_leaf option? Please let me know.

Thanks

hcho3 · April 16, 2019, 9:25pm

@kgoyal40 You should add 0.5 to the sum of margin values. This offset is controlled by the base_score parameter.

kgoyal40 · April 17, 2019, 11:09am

Hi,

Let me explain a bit more. My confusion is about what the prediction with output_margin=True represents. In theory, the prediction after the last residual tree should be the final value that softmax is applied to, right? Or do I need to sum the predictions after each tree and then apply the softmax? I forgot to mention that I am boosting each residual tree from the previous margin.
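
To make the two readings in this question concrete, here is a minimal arithmetic sketch in plain Python. The numbers are made up and no XGBoost call is involved. It assumes that a margin reported after round k already contains the contributions of all earlier trees. Under that assumption, the last cumulative margin is already the full sum, and adding up the cumulative values would count the early trees more than once.

```python
from itertools import accumulate

# Hypothetical per-tree margin contributions for one sample (made-up numbers).
per_tree = [0.5, 0.25, -0.125]

# Cumulative margin after each boosting round, assuming each reported value
# already includes every earlier tree.
cumulative = list(accumulate(per_tree))

# Individual contributions can be recovered by differencing consecutive rounds.
recovered = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# The last cumulative value equals the sum of the individual contributions.
final_from_last = cumulative[-1]
final_from_sum = sum(per_tree)

# Summing the cumulative values instead double-counts the early trees.
double_counted = sum(cumulative)
```

Which of the two cases applies depends on how each per-tree margin was obtained, so it may be worth checking whether the values are per-tree or cumulative before summing them.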

hcho3 · April 17, 2019, 1:04pm

Did you add 0.5 to the sum of untransformed margins from the trees? Add 0.5 to the sum and then pass it to the sigmoid function.
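
The steps described above can be sketched in plain Python as follows. The helper names (sigmoid, combine_margins) and the margin values are hypothetical and are not part of the XGBoost API. The default offset of 0.5 follows the advice in this thread; it is an assumption of the sketch, and it may be worth confirming it by comparing the result against the full model's predict output with output_margin=True.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def combine_margins(tree_margins, base_offset=0.5):
    """Sum per-tree margins, add the base offset, then apply the sigmoid.

    base_offset=0.5 is taken from the advice in this thread (assumption).
    """
    total = sum(tree_margins) + base_offset
    return sigmoid(total)


# Hypothetical per-tree margins for a single sample.
tree_margins = [0.5, 0.25, -0.125]

prob = combine_margins(tree_margins)  # predicted probability of the positive class
label = int(prob >= 0.5)              # threshold at 0.5 to get a class label
```

The sigmoid is applied once, to the total, and not to each tree's margin separately.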