Probabilities returned by multi:softprob

drkarim · July 22, 2020, 12:14pm

Can someone help me understand how the multi-class probabilities are calculated?

Is it calculated during bagging by simply counting how many ensemble members predicted the label and dividing by the total number of members?

hcho3 · July 23, 2020, 6:12am

No. We use the one-vs-rest method to classify multi-class data. If you run K boosting rounds, you will obtain K * C trees, where C is the number of classes (one tree per class per round). At prediction time, we group the K * C trees into C groups, one per class, and sum the tree outputs within each group, obtaining C scores. Finally, we apply the softmax to convert the C scores into probabilities.
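To make the grouping-then-softmax step concrete, here is a minimal pure-Python sketch for a single sample. It assumes the K * C tree outputs are ordered round by round, so that tree t belongs to class t % C. The leaf values and the helper names `softmax` and `predict_proba` are hypothetical, not XGBoost API. The sketch also ignores any constant base score; a constant added equally to every class score would not change the softmax output anyway.

```python
import math

def softmax(scores):
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(tree_outputs, num_class):
    """Turn per-tree outputs for ONE sample into class probabilities.

    tree_outputs: list of K * C leaf values, assumed ordered round by round,
    so tree t belongs to class t % num_class (hypothetical layout).
    """
    if len(tree_outputs) % num_class != 0:
        raise ValueError("expected K * C tree outputs")
    # Partial sum per class group gives C raw scores (margins).
    scores = [0.0] * num_class
    for t, value in enumerate(tree_outputs):
        scores[t % num_class] += value
    # Softmax converts the C scores into probabilities that sum to 1.
    return softmax(scores)

# Hypothetical leaf values for K = 2 rounds and C = 3 classes.
leaf_values = [0.5, -0.2, 0.1,   # round 1: class 0, class 1, class 2
               0.3, -0.1, 0.2]   # round 2: class 0, class 1, class 2
probs = predict_proba(leaf_values, num_class=3)
print([round(p, 4) for p in probs])
```

Here the class scores are 0.8, -0.3 and 0.3, so class 0 gets the highest probability. Note that no vote counting is involved: each probability comes from the summed tree outputs, not from how many trees "predicted" a label.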

drkarim · August 8, 2020, 6:37pm

Thank you for your reply!

sebastian · April 20, 2021, 6:58pm

Would setting the ‘objective’ in params override the default one-vs-rest approach for multi-class data? Should we set it to binary:logistic or multi:softmax, or not specify it at all? Finally, does the answer change if you want to create per-class precision-recall curves after training to evaluate performance? Thank you.