CV Shuffling improves performance *a lot*

I’ve got a time series of multiple parameters that I’m converting to a static problem with a moving window: I average each parameter over the window, so each sample becomes a vector of means over e.g. the last day, paired with my prediction target.
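In code, the windowing I mean looks roughly like this (just a sketch; `df` and `target` are placeholder names, not my actual data):

```python
import pandas as pd

# df: DataFrame with a datetime index, one column per parameter
# target: Series aligned to df's index
feats = df.rolling("1D").mean()   # mean of each parameter over the last day
X = feats.iloc[:-1]               # features known at time t
y = target.shift(-1).iloc[:-1]    # value to predict at t+1
```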

```python
from sklearn.model_selection import StratifiedKFold
import xgboost as xgb

cv = StratifiedKFold(n_splits=10, shuffle=False)
xgb.cv(dtrain=dtrain, folds=cv, params=params)
```

Using unshuffled CV I get an ROC AUC of 0.7.

```python
cv = StratifiedKFold(n_splits=10, shuffle=True)  # this is the only difference
xgb.cv(dtrain=dtrain, folds=cv, params=params)
```

When I shuffle, I get an ROC AUC of 0.9.

It’s not dependent on the random_state used for shuffling either; the results are reliably better whenever I shuffle. From my understanding:

  1. XGB should be invariant to the order of the training data, and
  2. XGB assumes i.i.d. data, but I’ve seen quite a few papers use the same approach :thinking: (see the time-ordered sketch after this list)
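For comparison, a strictly time-ordered split I could check against would look something like this (a hypothetical sketch, not something I’ve run; `X` and `y` stand in for my arrays):

```python
from sklearn.model_selection import TimeSeriesSplit

# xgb.cv also accepts an explicit list of (train_idx, test_idx) tuples
tscv = TimeSeriesSplit(n_splits=10)
folds = list(tscv.split(X, y))
xgb.cv(dtrain=dtrain, folds=folds, params=params)
```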

Help please, I’m out of ideas. :sweat_smile:

Somewhat related: