CV Shuffling improves performance *a lot*

I’ve got a time series of multiple parameters that I’m converting to a static problem with a moving window: I average each parameter over the window, so each sample becomes a vector of means over e.g. the last day, paired with my prediction target.
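In code, the windowing I mean looks roughly like this (just a sketch; `df` and `target` are placeholder names, not my actual data):

```python
import pandas as pd

# df: DataFrame with a datetime index, one column per parameter
# target: Series aligned to df's index
feats = df.rolling("1D").mean()   # mean of each parameter over the last day
X = feats.iloc[:-1]               # features known at time t
y = target.shift(-1).iloc[:-1]    # value to predict at t+1
```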

```python
from sklearn.model_selection import StratifiedKFold
import xgboost as xgb

cv = StratifiedKFold(n_splits=10, shuffle=False)
xgb.cv(dtrain=dtrain, folds=cv, params=params)
```

Using unshuffled CV I get an ROC AUC of 0.7.

```python
cv = StratifiedKFold(n_splits=10, shuffle=True)  # this is the only difference
xgb.cv(dtrain=dtrain, folds=cv, params=params)
```

When I shuffle, I get an ROC AUC of 0.9.

It’s not dependent on the random_state used for shuffling either; the results are reliably better whenever I shuffle. From my understanding:

  1. XGB should be invariant to the order of the training data, and
  2. XGB assumes i.i.d. data, but I’ve seen quite a few papers use the same approach :thinking: (see the time-ordered sketch after this list)
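For comparison, a strictly time-ordered split I could check against would look something like this (a hypothetical sketch, not something I’ve run; `X` and `y` stand in for my arrays):

```python
from sklearn.model_selection import TimeSeriesSplit

# xgb.cv also accepts an explicit list of (train_idx, test_idx) tuples
tscv = TimeSeriesSplit(n_splits=10)
folds = list(tscv.split(X, y))
xgb.cv(dtrain=dtrain, folds=folds, params=params)
```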

Help please, I’m out of ideas. :sweat_smile:

Somewhat related: