I’ve got a timeseries of multiple parameters that I’m converting to a static (tabular) problem using a moving window: I average each parameter over the window, so each sample is a vector of the means over e.g. the last day, paired with my prediction target.
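To make the setup concrete, here is a minimal sketch of the windowing (the sensor names and the 24-step window are placeholders, not my real data):

```python
import numpy as np
import pandas as pd

# Hypothetical multivariate timeseries: 100 readings of two sensors.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "sensor_a": rng.normal(size=n),
    "sensor_b": rng.normal(size=n),
})

# One sample per timestep: mean of each column over the trailing window.
window = 24
X = df.rolling(window).mean().dropna()

# Note: consecutive rows of X share (window - 1) of their raw
# observations, so neighbouring samples are strongly correlated.
```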
from sklearn.model_selection import StratifiedKFold
import xgboost as xgb

cv = StratifiedKFold(10, shuffle=False)
xgb.cv(dtrain=dtrain, folds=cv, params=params)
With unshuffled CV I get an ROC AUC of 0.7.
cv = StratifiedKFold(10, shuffle=True) # this is the only difference
xgb.cv(dtrain=dtrain, folds=cv, params=params)
When I shuffle, I get an ROC AUC of 0.9.
It’s not dependent on the random_state for shuffling either; the results are reliably better when shuffling. From my understanding:
- XGB should be invariant to the order of the training data, and
- XGB assumes iid samples, but I’ve seen quite a few papers use the same approach.
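My suspicion is that the overlapping windows make neighbouring samples nearly identical, so shuffling puts near-duplicates of test samples into the training folds. A toy sketch of that effect, counting train/test sample pairs whose raw windows overlap (window length, sample count, and fold count are arbitrary):

```python
import numpy as np
from sklearn.model_selection import KFold

# Sample i is built from raw observations [i, i + window).
n, window = 200, 24
spans = [set(range(i, i + window)) for i in range(n)]

def leaky_pairs(cv, X):
    """Count (train, test) sample pairs whose raw windows overlap."""
    leaks = 0
    for train_idx, test_idx in cv.split(X):
        for i in train_idx:
            for j in test_idx:
                if spans[i] & spans[j]:
                    leaks += 1
    return leaks

X = np.zeros((n, 1))
unshuffled = leaky_pairs(KFold(5, shuffle=False), X)
shuffled = leaky_pairs(KFold(5, shuffle=True, random_state=0), X)

# Unshuffled folds only overlap at fold boundaries; shuffled folds
# scatter each test sample between its temporal neighbours, so far
# more train/test window pairs share raw observations.
print(unshuffled, shuffled)
```

If that explains the gap, the 0.9 would be inflated by leakage rather than a real improvement, but I’d like confirmation.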
Help please, I’m out of ideas.
Somewhat related: