XGBoost trees understanding

Hi All,
I have what I believe is a simple question, but I'm not sure of the answer.

While running XGBoost with, say, 100 trees, if we want to manually reconstruct the prediction for one observation, how should we use the trees to find the final leaf for that observation?
Is it just the last tree (tree 100), or some complex gradient calculation over all the trees?

Many thanks

No, you should obtain the final leaf from every tree in the ensemble. The prediction for a particular observation is obtained by summing the outputs from all the trees.

Many thanks hcho3!

Unfortunately, that’s what I was concerned about.

I’m trying to assign each observation to a specific, unique leaf or group. Is there a way I can achieve this with gradient boosting?

You should call the predict() function with the parameter pred_leaf=True. That will give you the set of leaf IDs that each observation is associated with.

Excellent! That’s great… Many thanks,

Hi hcho3,
I cannot see a pred_leaf=True parameter in XGBoost.predict(???) ?

I’m using the Python package, specifically XGBoost.XGBRegressor!

A bit confused. Any suggestion?
Many thanks

Try clf.get_booster().predict(xgb.DMatrix(X), pred_leaf=True). Note that Booster.predict() takes a DMatrix rather than a raw array. get_booster() obtains the underlying Booster object inside the XGBRegressor model.

Hi hcho3,
clf.get_booster().predict(pred_leaf=True) did not work for me :frowning:

However, after many hours, I found that XGBRegressor.apply(X) gives what I think are the leaf IDs…


Actually, I now have the final leaves from all 20 trees (boosting rounds).

What does that mean?

Do I need to sum or average the values of all 20 final leaves found for a specific observation, or does the 20th tree give the final leaf and value?


You should sum the leaf values (outputs) associated with the 20 final leaves.