XGBoost 1.7 fits much worse than 1.5 on noisy data - with a reproducible experiment

I work with very noisy data and noticed that my models perform much worse when trained with version 1.7.4 than with version 1.5.2 when using the approx tree method. Here is a reproducible experiment that demonstrates the issue.

This is the data used across both versions:

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced, very noisy binary problem: 10,000 samples, 4 features
# (2 informative), 5% label noise, and almost completely overlapping classes.
x, y = make_classification(
    10000, 4, n_informative=2, n_redundant=0, weights=[0.1],
    n_clusters_per_class=2, flip_y=0.05, class_sep=0.01,
    hypercube=True, random_state=42,
)

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7, stratify=y, random_state=216)

pd.DataFrame(x_train).to_csv("./xtrain", index=False)
pd.DataFrame(y_train).to_csv("./ytrain", index=False)
pd.DataFrame(x_test).to_csv("./xtest", index=False)
pd.DataFrame(y_test).to_csv("./ytest", index=False)
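
As a quick sanity check (my addition, not part of the original script; run in the same session before saving), the class balance and label noise can be inspected directly:

import numpy as np

# Sanity check: weights=[0.1] puts roughly 10% of samples in class 0 and 90%
# in class 1, and flip_y=0.05 then flips about 5% of the labels at random.
print(np.bincount(y))  # samples per class
print(y.mean())        # fraction of positive labels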

Now, in two different Python kernels with the two different XGBoost versions, run the same code:

import pandas as pd
import numpy as np
import xgboost

x_train = pd.read_csv("./xtrain")
y_train = pd.read_csv("./ytrain").values
x_test = pd.read_csv("./xtest")
y_test = pd.read_csv("./ytest").values

dtrain = xgboost.DMatrix(
    data=x_train,
    label=y_train,
)

deval = xgboost.DMatrix(
    data=x_test,
    label=y_test,
)

params = {
    'eta': 0.03,
    'max_depth': 3,
    'min_child_weight': 0,
    'max_delta_step': 0,
    'subsample': 1,
    'base_score': 0.5,
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'tree_method': 'approx',  # the method whose behavior changed between versions
    'gamma': 4,
}

model = xgboost.train(
    params=params,
    dtrain=dtrain,
    num_boost_round=200,
    evals=[(dtrain, 'dtrain'), (deval, 'eval')],
    early_stopping_rounds=10,  # early stopping watches the last evals entry, 'eval'
)
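
Since the whole point is comparing releases, it may help to confirm the installed build in each kernel before training (a small check of mine, not part of the original script):

import xgboost

print(xgboost.__version__)  # expect '1.5.2' in one kernel and '1.7.4' in the other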

The fitting result for version 1.5.2 is:

[193]	dtrain-logloss:0.34347	eval-logloss:0.35179

and the fitting result for version 1.7.4 is:

[199]	dtrain-logloss:0.36625	eval-logloss:0.36653
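
One control that might help triage, which I have not run here: repeat the exact same script with tree_method='hist' in both kernels. If the two versions then agree, the regression is specific to the approx implementation. A minimal sketch:

# Hypothetical control run: identical setup, only the tree method changes.
params_hist = dict(params, tree_method='hist')

model_hist = xgboost.train(
    params=params_hist,
    dtrain=dtrain,
    num_boost_round=200,
    evals=[(dtrain, 'dtrain'), (deval, 'eval')],
    early_stopping_rounds=10,
)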

Comparing the two approx runs above, the fit is noticeably worse in the new version: training logloss is about 6.6% higher and eval logloss about 4.2% higher. In the real-world example I was working with, 1.7.4 also seems to stop much earlier and build shallower trees, with worse performance (a sketch for checking tree depth directly follows these numbers):

v1.5.2:

[79]	dtrain-logloss:0.13350	eval-logloss:0.13540

v1.7.4:

[32]	dtrain-logloss:0.14292	eval-logloss:0.14281
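
To back up the "shallower trees" impression with numbers, tree depths can be read off the fitted booster. This is a sketch, assuming model is still in scope from the run above:

import numpy as np

# In the text dump, each node line is indented with one tab per level, so the
# maximum tab count within a tree equals that tree's depth.
depths = [
    max(line.count('\t') for line in tree.splitlines())
    for tree in model.get_dump()
]
print(len(depths), np.mean(depths), max(depths))  # tree count, mean depth, max depth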

Thank you for sharing; I've opened an issue on GitHub: https://github.com/dmlc/xgboost/issues/8901

Thanks for looking into it!