As a learning exercise, I turned off regularization in XGBRegressor and compared it with GradientBoostingRegressor from sklearn to see what else differs.
This is when I discovered that XGBRegressor doesn't seem to use the sample mean as its initial prediction, as a traditional GBM typically does. A simple reproducible example is provided below.
You can see that for the same parameters, XGB and GBM produce predictions that are perfectly correlated (correlation of 1.0) after the very first tree. However, GBM predictions are centered on the sample mean of y (22.53), which is expected, whereas XGB predictions sit at a constant offset of about 21.81 below the GBM predictions.
As I increase the number of trees for XGBoost, I can see the predictions start to slowly ‘migrate’ towards the correct scale of the y variable.
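To make this 'migration' concrete, here is a minimal sketch, not XGBoost's actual code. It assumes squared-error loss, no regularization, and a hypothetical constant starting prediction `init` (0.5 is used purely for illustration). Under those assumptions every leaf value is the mean residual of its leaf, so the mean in-sample prediction closes a fraction `learning_rate` of the remaining gap to the sample mean on each round.

```python
def mean_prediction(y_mean, init, learning_rate, n_trees):
    """Mean in-sample prediction after n_trees rounds of squared-error boosting.

    Assumes no regularization, so each round adds learning_rate times the
    current mean residual to the mean prediction. `init` is a hypothetical
    constant starting prediction.
    """
    pred = init
    for _ in range(n_trees):
        pred += learning_rate * (y_mean - pred)
    return pred


y_mean = 22.532806  # sample mean of y reported in the output of the example

# Starting away from the sample mean, the gap shrinks by (1 - learning_rate) per tree
for n in (1, 10, 100, 1000):
    print(n, round(mean_prediction(y_mean, init=0.5, learning_rate=0.01, n_trees=n), 4))
```

With a learning rate of 0.01 the gap shrinks by only 1% per tree, which would explain a slow drift towards the scale of y rather than an immediate jump. Starting from `init = y_mean` instead leaves the mean prediction unchanged at every round.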
Can anyone help to explain this unexpected behavior or point me to the paper/documentation that describes this? Or perhaps I am missing some crucial parameter setting?
Thanks!
##############################################
import pandas as pd
import xgboost
from IPython.display import display
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_boston(return_X_y=True)
params = {'n_estimators': 1, 'learning_rate': 0.01, 'max_depth': 6}  # Use only one tree
gbm = GradientBoostingRegressor(**params).fit(X, y)
xgb = xgboost.XGBRegressor(reg_lambda=0, **params).fit(X, y)  # Turn off L2 regularization

# Put the target and both sets of in-sample predictions side by side
preds = [y, gbm.predict(X), xgb.predict(X)]
yhat = pd.concat((pd.Series(p) for p in preds), axis=1, keys=['y', 'gbm', 'xgb'])

display(yhat.describe())  # summary statistics
display(yhat.corr())  # pairwise correlations
display((yhat['gbm'] - yhat['xgb']).describe().rename('GBM - XGB').to_frame())  # per-row offset
##############################################
| | y | gbm | xgb |
|---|---|---|---|
| count | 506.000000 | 506.000000 | 506.000000 |
| mean | 22.532806 | 22.532806 | 0.720330 |
| std | 9.197104 | 0.089404 | 0.089404 |
| min | 5.000000 | 22.382978 | 0.570500 |
| 25% | 17.025000 | 22.473454 | 0.660976 |
| 50% | 21.200000 | 22.515138 | 0.702660 |
| 75% | 25.000000 | 22.576478 | 0.764000 |
| max | 50.000000 | 22.807478 | 0.995000 |

| | y | gbm | xgb |
|---|---|---|---|
| y | 1.000000 | 0.972089 | 0.972089 |
| gbm | 0.972089 | 1.000000 | 1.000000 |
| xgb | 0.972089 | 1.000000 | 1.000000 |

| | GBM - XGB |
|---|---|
| count | 5.060000e+02 |
| mean | 2.181248e+01 |
| std | 1.991912e-08 |
| min | 2.181248e+01 |
| 25% | 2.181248e+01 |
| 50% | 2.181248e+01 |
| 75% | 2.181248e+01 |
| max | 2.181248e+01 |
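
A quick arithmetic check on the summary statistics above, offered as a sketch rather than a confirmed explanation. Suppose XGBoost starts from some constant `base` instead of the sample mean. With one tree, squared-error loss, and no regularization, the mean prediction would then be `base + learning_rate * (mean(y) - base)`. Solving for `base` from the reported means points at a value close to 0.5, which matches the historical default of XGBoost's `base_score` parameter. Whether that default applies to the installed version is an assumption worth verifying, since newer releases may estimate it from the data.

```python
# Values copied from the describe() output above
y_mean = 22.532806
xgb_mean = 0.720330
reported_offset = 21.81248
learning_rate = 0.01

# Solve xgb_mean = base + learning_rate * (y_mean - base) for the unknown base
implied_base = (xgb_mean - learning_rate * y_mean) / (1 - learning_rate)
print(f"implied starting prediction: {implied_base:.4f}")

# Assumption: the starting prediction is exactly 0.5.
# GBM's mean prediction stays at y_mean, so after one tree the gap should be:
assumed_base = 0.5
expected_offset = (1 - learning_rate) * (y_mean - assumed_base)
print(f"expected GBM - XGB offset:   {expected_offset:.5f}")
print(f"reported GBM - XGB offset:   {reported_offset:.5f}")
```

If the two offsets agree to the printed precision, a fixed starting value of 0.5 is enough to account for the constant shift. In that case passing `base_score=y.mean()` to `XGBRegressor` would be the natural thing to try.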