How does XGBoost know the SHAP base values for each tree?

How does XGBoost know the base value for each tree when computing SHAP values?

shap_values = bst.predict(x_i, pred_contribs=True)

There is a really nice explanation here that covers what SHAP values are, why they are useful, and how they are calculated for a given prediction.

What isn’t clear to me, though, is how a pre-trained XGBoost model can know the base value when computing SHAP values for a new, individual case. The article states that the base value should equal the “mean prediction for the training set”, although I have since learnt that in the XGBoost case it is actually based on the sums of Hessians for a given tree.

But still, where is this value stored? Can anybody clarify exactly how the base value is computed and where it comes from? Many thanks.

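The sum of Hessians is used as a proxy for the number of data points that flow through each particular tree node. The Hessians are the second-order gradient statistics that gradient boosting computes during training, and their per-node sums are kept with the tree. See https://dl.acm.org/doi/10.1145/2939672.2939785 for more details.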
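To make the role of the Hessian sums concrete, here is a minimal sketch in plain Python, not XGBoost's actual code. It assumes the base value of a single tree is the average of its leaf values, weighted by the sum of Hessians at each node. The dictionary layout and the names `node_mean`, `sum_hess`, `leaf`, `left` and `right` are made up for illustration.

```python
# Toy tree: each node carries the sum of Hessians of the training rows that
# reached it. This layout is hypothetical, not XGBoost's internal format.
tree = {
    "sum_hess": 10.0,
    "left": {"sum_hess": 4.0, "leaf": -1.0},
    "right": {"sum_hess": 6.0, "leaf": 2.0},
}


def node_mean(node):
    """Hessian-weighted mean of the leaf values below `node`."""
    if "leaf" in node:
        return node["leaf"]
    left, right = node["left"], node["right"]
    weighted = (
        node_mean(left) * left["sum_hess"]
        + node_mean(right) * right["sum_hess"]
    )
    return weighted / node["sum_hess"]


# (4 * -1.0 + 6 * 2.0) / 10 = 0.8
tree_base = node_mean(tree)
print(tree_base)
```

With a squared-error objective every row has a Hessian of 1, so this weighting reduces to the plain mean over the training rows, which is where the “mean prediction for the training set” description comes from.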

Thanks! My question, though, is more on the implementation side: where is the sum of Hessians actually stored in a given XGBoost model object? The value must come from somewhere during the computation, but I do not understand where. Thanks

See the following code snippet. The sum_hess field contains the sum of Hessians.
