Very first tree in XGBRegressor not centered

Just for learning purposes, I decided to turn off regularization and compare XGBRegressor with GradientBoostingRegressor from sklearn to see how else the two implementations differ.

This is when I discovered that XGBRegressor doesn’t seem to use the sample mean as its very first prediction, as is typically done for the traditional GBM. A simple reproducible example is provided below.

You can see that for the same parameters, XGB and GBM produce predictions that are 100% correlated for the very first tree. However, GBM predictions are centered around the sample mean of y, which is expected, whereas XGB predictions have a constant offset from GBM predictions.

As I increase the number of trees for XGBoost, the predictions slowly ‘migrate’ toward the correct scale of the y variable (a quick check is included after the output below).

Can anyone help explain this unexpected behavior or point me to the paper/documentation that describes it? Or perhaps I am missing some crucial parameter setting?

Thanks!

##############################################

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston  # removed in scikit-learn >= 1.2
from IPython.display import display
import pandas as pd
import xgboost

X, y = load_boston(return_X_y=True)

params = {'n_estimators': 1, 'learning_rate': 0.01, 'max_depth': 6}  # Use only one tree

gbm = GradientBoostingRegressor(**params).fit(X, y)
xgb = xgboost.XGBRegressor(reg_lambda=0, **params).fit(X, y)  # Turn off regularization

preds = [y, gbm.predict(X), xgb.predict(X)]
yhat = pd.concat((pd.Series(p) for p in preds), axis=1, keys=['y', 'gbm', 'xgb'])

display(yhat.describe())
display(yhat.corr())
display((yhat['gbm'] - yhat['xgb']).describe().rename('GBM - XGB').to_frame())

##############################################

                y         gbm         xgb
count  506.000000  506.000000  506.000000
mean    22.532806   22.532806    0.720330
std      9.197104    0.089404    0.089404
min      5.000000   22.382978    0.570500
25%     17.025000   22.473454    0.660976
50%     21.200000   22.515138    0.702660
75%     25.000000   22.576478    0.764000
max     50.000000   22.807478    0.995000

            y       gbm       xgb
y    1.000000  0.972089  0.972089
gbm  0.972089  1.000000  1.000000
xgb  0.972089  1.000000  1.000000

          GBM - XGB
count  5.060000e+02
mean   2.181248e+01
std    1.991912e-08
min    2.181248e+01
25%    2.181248e+01
50%    2.181248e+01
75%    2.181248e+01
max    2.181248e+01
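
For reference, here is the quick check mentioned above. Rerunning the fit with more and more trees (reusing X and y from the snippet above, same settings otherwise) shows the mean prediction slowly approaching the mean of y:

##############################################

# Quick check: mean prediction vs. number of trees (same data and settings as above)
for n in (1, 10, 100, 1000):
    model = xgboost.XGBRegressor(n_estimators=n, learning_rate=0.01,
                                 max_depth=6, reg_lambda=0).fit(X, y)
    print(f'n_estimators={n:4d}  mean prediction: {model.predict(X).mean():.4f}')

##############################################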

XGBoost does not start boosting from the target mean, whereas sklearn does: XGBRegressor starts from a constant base_score, which defaults to 0.5, while GradientBoostingRegressor initializes with the mean of y. So the behavior is expected. With one tree and learning_rate 0.01, the expected offset is (1 - 0.01) * (mean(y) - 0.5) ≈ 0.99 * 22.03 ≈ 21.81, which is exactly what your GBM - XGB table shows.
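
If you want the two models to line up, you can tell XGBoost to start from the sample mean explicitly. A minimal sketch, reusing X, y, params, and gbm from the snippet above (base_score is a standard XGBRegressor parameter):

##############################################

import numpy as np

# Start boosting from the sample mean instead of the default base_score of 0.5
xgb_centered = xgboost.XGBRegressor(reg_lambda=0, base_score=float(y.mean()),
                                    **params).fit(X, y)

# The constant offset should now shrink to numerical noise; exact agreement
# still depends on tie-breaking in the split search
print(np.abs(gbm.predict(X) - xgb_centered.predict(X)).max())

##############################################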
