I’m using a manual CV loop to tune booster parameters. I’m tuning vectoriser parameters at the same time, so I can’t use xgboost’s cv function.
For each CV fold, I’m using an eval set to try to choose a good number of estimators for the model, read from the best_ntree_limit attribute.
These values vary a lot between folds, though. For example, with 5-fold CV I sometimes see a wide range of best_ntree_limit values, such as 7, 29, 13, 72, 14.
Is there a recommended way to choose the value to use for my final model? I could take the mean or the max, but I’m wondering whether there is a better approach, or whether this high variance indicates that there are other changes I should be making.
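
To make the options concrete, here is a minimal sketch of the aggregations I’m considering, applied to the five values quoted above. The helper name `summarise_best_iterations` is purely illustrative (it is not part of xgboost), and the median is included only as an assumption on my part, as one more candidate alongside the mean and max.

```python
from statistics import mean, median


def summarise_best_iterations(per_fold):
    """Return candidate tree counts derived from per-fold best_ntree_limit values.

    Hypothetical helper for illustration only; it is not an xgboost function.
    """
    return {
        "mean": round(mean(per_fold)),      # average across folds, rounded to a whole tree count
        "median": round(median(per_fold)),  # less sensitive to a single outlying fold
        "max": max(per_fold),               # most trees any fold asked for
    }


# best_ntree_limit values observed across the 5 folds in the example above
fold_values = [7, 29, 13, 72, 14]
summary = summarise_best_iterations(fold_values)
print(summary)
```

For these values the mean is 27, the median is 14 and the max is 72, so the choice of aggregation changes the final number of trees quite a bit.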