Very first tree in XGBRegressor not centered

Just for learning purposes, I decided to turn off regularization and compare XGBRegressor with GradientBoostingRegressor from sklearn to see how else the two implementations differ.

This is when I discovered that XGBRegressor doesn’t seem to use the sample mean as its very first prediction, as is typically done for the traditional GBM. A simple reproducible example is provided below.

You can see that for the same parameters, XGB and GBM produce predictions that are 100% correlated for the very first tree. However, GBM predictions are centered around the sample mean of y, which is expected, whereas XGB predictions have a constant offset from GBM predictions.

As I increase the number of trees for XGBoost, the predictions slowly ‘migrate’ toward the correct scale of the y variable (a quick check is included after the output below).

Can anyone help explain this unexpected behavior or point me to the paper/documentation that describes it? Or perhaps I am missing some crucial parameter setting?

Thanks!

##############################################

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_boston  # removed in scikit-learn >= 1.2
from IPython.display import display
import pandas as pd
import xgboost

X, y = load_boston(return_X_y=True)

params = {'n_estimators': 1, 'learning_rate': 0.01, 'max_depth': 6}  # Use only one tree

gbm = GradientBoostingRegressor(**params).fit(X, y)
xgb = xgboost.XGBRegressor(reg_lambda=0, **params).fit(X, y)  # Turn off regularization

preds = [y, gbm.predict(X), xgb.predict(X)]
yhat = pd.concat((pd.Series(p) for p in preds), axis=1, keys=['y', 'gbm', 'xgb'])

display(yhat.describe())
display(yhat.corr())
display((yhat['gbm'] - yhat['xgb']).describe().rename('GBM - XGB').to_frame())

##############################################

                y         gbm         xgb
count  506.000000  506.000000  506.000000
mean    22.532806   22.532806    0.720330
std      9.197104    0.089404    0.089404
min      5.000000   22.382978    0.570500
25%     17.025000   22.473454    0.660976
50%     21.200000   22.515138    0.702660
75%     25.000000   22.576478    0.764000
max     50.000000   22.807478    0.995000

            y       gbm       xgb
y    1.000000  0.972089  0.972089
gbm  0.972089  1.000000  1.000000
xgb  0.972089  1.000000  1.000000

          GBM - XGB
count  5.060000e+02
mean   2.181248e+01
std    1.991912e-08
min    2.181248e+01
25%    2.181248e+01
50%    2.181248e+01
75%    2.181248e+01
max    2.181248e+01
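
For reference, here is the quick check mentioned above. Rerunning the fit with more and more trees (reusing X and y from the snippet above, same settings otherwise) shows the mean prediction slowly approaching the mean of y:

##############################################

# Quick check: mean prediction vs. number of trees (same data and settings as above)
for n in (1, 10, 100, 1000):
    model = xgboost.XGBRegressor(n_estimators=n, learning_rate=0.01,
                                 max_depth=6, reg_lambda=0).fit(X, y)
    print(f'n_estimators={n:4d}  mean prediction: {model.predict(X).mean():.4f}')

##############################################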

XGBoost does not start boosting from the target mean, whereas sklearn does: XGBRegressor starts from a constant base_score, which defaults to 0.5, while GradientBoostingRegressor initializes with the mean of y. So the behavior is expected. With one tree and learning_rate 0.01, the expected offset is (1 - 0.01) * (mean(y) - 0.5) ≈ 0.99 * 22.03 ≈ 21.81, which is exactly what your GBM - XGB table shows.
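
If you want the two models to line up, you can tell XGBoost to start from the sample mean explicitly. A minimal sketch, reusing X, y, params, and gbm from the snippet above (base_score is a standard XGBRegressor parameter):

##############################################

import numpy as np

# Start boosting from the sample mean instead of the default base_score of 0.5
xgb_centered = xgboost.XGBRegressor(reg_lambda=0, base_score=float(y.mean()),
                                    **params).fit(X, y)

# The constant offset should now shrink to numerical noise; exact agreement
# still depends on tie-breaking in the split search
print(np.abs(gbm.predict(X) - xgb_centered.predict(X)).max())

##############################################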
