Understanding leaf values

Lopet · September 17, 2020, 8:51pm

Hi,

I’m a university student, and I started working with xgboost a month ago. I built a classification model that predicts a log file’s error type (one of 7 types). The model has an accuracy of 76%. I would like to understand the leaf values. When I print the trees, I get:

 booster[2]:
     0:[only<3.11304689e-07] yes=1,no=2,missing=1,gain=0.586026371,cover=3.71875
     	1:[mgr<3.13217683e-08] yes=3,no=4,missing=3,gain=0.234146357,cover=3.0625
     		3:[writeback<8.19934996e-07] yes=5,no=6,missing=5,gain=0.269478679,cover=2.84375
     			5:leaf=0.344827592,cover=2.625
     			6:leaf=-0.051282052,cover=0.21875
     		4:leaf=-0.051282052,cover=0.21875
     	2:leaf=-0.113207549,cover=0.65625

and so on… How can I interpret these values, or convert them to labels? When I plot a tree with plot_tree(model), I would like to see letter labels on the leaves instead of numbers. For example, if a leaf’s value is between -0.2 and 0, then it is an ‘A’ error type; if it is between 0.2 and 0.4, then ‘C’. I’m sorry if this has already been asked; I did not find anything.

Best wishes,
Peter

hcho3 · September 17, 2020, 10:31pm

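XGBoost uses the one-vs-rest method to perform classification with multiple classes. Given M classes and N boosting rounds, XGBoost fits M * N trees. At prediction time, the leaf values from the M * N trees are combined into M partial sums, one per class, and the class with the highest sum is chosen as the predicted label. A single leaf value is therefore one tree's contribution to one class's score, not a label in itself, so ranges of leaf values cannot be mapped to error types.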
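
To make the partial sums concrete, here is a minimal sketch in plain Python (no xgboost import). It assumes the dumped trees are ordered so that tree i belongs to class i % M, which would make booster[2] above the first-round tree for class 2. The leaf values, the function names `class_margins` and `softmax`, and the `base_score` parameter are made up for illustration; this is not XGBoost's own code.

```python
import math

def class_margins(leaf_values, num_class, base_score=0.0):
    """Sum per-tree leaf values into one score (margin) per class.

    leaf_values[i] is the leaf that one sample lands in for tree i.
    Assumption: tree i contributes to class i % num_class.
    base_score is a simplification: a constant shared by all classes,
    which does not change which class has the highest sum.
    """
    margins = [base_score] * num_class
    for tree_index, leaf in enumerate(leaf_values):
        margins[tree_index % num_class] += leaf
    return margins

def softmax(margins):
    """Turn per-class margins into probabilities that sum to 1."""
    peak = max(margins)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: 3 classes, 2 boosting rounds -> 6 trees.
# Each number is the leaf value one sample reached in that tree.
leaves = [0.34, -0.05, -0.11,   # round 0: classes 0, 1, 2
          0.20, -0.02, -0.08]   # round 1: classes 0, 1, 2

margins = class_margins(leaves, num_class=3)
probs = softmax(margins)
predicted = max(range(len(margins)), key=margins.__getitem__)

print("margins:", margins)
print("probabilities:", probs)
print("predicted class index:", predicted)
```

With a trained model, the per-class sums can be compared against `model.predict(X, output_margin=True)`, which returns the raw scores before the softmax. The predicted class index then has to be mapped back to the original error-type names by whatever label encoding was used for training.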

Lopet · September 19, 2020, 10:03pm

Thank you very much!