While doing walk-forward validation to assess my model's performance, I passed the test set as an evaluation set in the following way:
booster = xgb.train(params,
                    dmat_train,
                    evals=[(dmat_train, "train"), (dmat_test, "test")],
                    verbose_eval=False,
                    num_boost_round=num_boost_round)
preds = booster.predict(dmat_test)
Whereas when I do not:
booster = xgb.train(params,
                    dmat_train,
                    evals=[(dmat_train, "train")],
                    verbose_eval=False,
                    num_boost_round=num_boost_round)
preds = booster.predict(dmat_test)
the performance is significantly worse.
I assume the second approach is the correct one. But why? How does specifying the test set as an evaluation set actually lead to overfitting in XGBoost? Note that I do not perform any hyper-parameter optimization.