Parameter tuning with Cross Validation and early stopping

Hello,

I am performing hyperparameter tuning and I want to use early stopping. For this purpose I've created a parameter grid, and I'm looping over it and calling the built-in method xgb.cv.
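
For illustration, here is a minimal version of the kind of loop I mean (the dataset, grid values and metric below are just placeholders, not my real setup):

```python
import itertools
import xgboost as xgb
from sklearn.datasets import make_regression

# Toy data purely for illustration
X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
dtrain = xgb.DMatrix(X, label=y)

# Hypothetical grid; the real one has more parameters
param_grid = {"max_depth": [3, 6], "eta": [0.05, 0.1]}

scores = {}
for max_depth, eta in itertools.product(param_grid["max_depth"], param_grid["eta"]):
    params = {"objective": "reg:squarederror", "max_depth": max_depth, "eta": eta}
    cv_results = xgb.cv(
        params,
        dtrain,
        num_boost_round=1000,
        nfold=5,
        early_stopping_rounds=20,
        seed=42,
    )
    # Keep the best mean CV score seen for this parameter combination
    scores[(max_depth, eta)] = cv_results["test-rmse-mean"].min()
```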

This method accepts early stopping, but I'm wondering how it uses it internally. I'm afraid it doesn't split the data into train, validation and test sets for each of the K folds, as it should be done, and instead only splits it into train and test, as is usually done for CV.

Thanks in advance

Hi,

I feel like there is some confusion here.

xgboost.cv has a parameter nfold which will automatically split your data into the desired number of training and validation folds (with the randomness of the split controlled by the seed argument you pass). You can optionally specify your own train/validation folds using the folds parameter. Please note that there are no “test sets” here, so I'm not sure what you mean by that in your question.
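
To make that concrete, here is a rough sketch of both options (the data and parameter values are only illustrative):

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold

# Toy data purely for illustration
X, y = make_regression(n_samples=500, n_features=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "reg:squarederror", "max_depth": 4, "eta": 0.1}

# Option 1: let xgboost.cv build the folds itself; the split is controlled by `seed`
cv_auto = xgb.cv(params, dtrain, num_boost_round=100, nfold=5, seed=123)

# Option 2: pass your own train/validation indices through `folds`
kf = KFold(n_splits=5, shuffle=True, random_state=123)
custom_folds = list(kf.split(X))  # list of (train_indices, val_indices) pairs
cv_custom = xgb.cv(params, dtrain, num_boost_round=100, folds=custom_folds)
```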

So in your loop, for each combination of parameters from your grid, xgboost.cv will calculate, for each boosting round, the cross-validation error across the folds you have specified. If you enable early stopping, it simply means that the iteration over boosting rounds stops once the cross-validation metric has not improved for the number of rounds you specified for early stopping.
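
As a sketch of what that looks like in practice (values again illustrative): when early stopping triggers, the evaluation history returned by xgboost.cv only goes up to the best iteration, so its length tells you how many rounds to use when you refit on the full training data.

```python
import xgboost as xgb
from sklearn.datasets import make_regression

# Toy data purely for illustration
X, y = make_regression(n_samples=500, n_features=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "reg:squarederror", "max_depth": 4, "eta": 0.1}

cv_results = xgb.cv(
    params,
    dtrain,
    num_boost_round=2000,        # upper bound on boosting rounds
    nfold=5,
    early_stopping_rounds=50,    # stop once the CV metric hasn't improved for 50 rounds
    seed=123,
)

# History stops at the best iteration, so its length is the number of rounds
# to use when refitting a final model on all the training data.
best_num_rounds = len(cv_results)
best_cv_rmse = cv_results["test-rmse-mean"].min()
```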