I preprocess the data for my xgboost model in SAS, which includes dummy-coding every categorical feature used in modelling. I do this because whenever I score data for a new month, I get an error saying that the object in the new data is not the same as in the model.
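To make the mismatch concrete, here is a minimal base R sketch of the kind of column alignment involved, assuming the error comes from the new month having different dummy columns than the training data. The helper name align_columns and all column names are hypothetical and only for illustration; the same alignment could equally be done in SAS.

```r
# Hypothetical helper: make a new month's columns match the training columns.
align_columns <- function(new_data, train_cols) {
  missing_cols <- setdiff(train_cols, names(new_data))
  # Dummy levels that do not occur this month become all-zero columns.
  for (col in missing_cols) {
    new_data[[col]] <- 0
  }
  # Drop columns unseen in training and restore the training column order.
  new_data[, train_cols, drop = FALSE]
}

# Made-up example: region_b is absent and region_c is new this month.
train_cols <- c("age", "region_a", "region_b")
new_month  <- data.frame(region_c = c(1, 0), age = c(30, 45), region_a = c(0, 1))
aligned    <- align_columns(new_month, train_cols)
```

After this step, aligned has exactly the training columns in the training order, which is what the scoring step would need under the assumption above.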
My question: is there any effect if I still run sparse.model.matrix on my dataset when all of its features are already numeric? Using xgb.DMatrix alone gives me a train/test/valid AUC of 1.
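The comparison I am asking about can be sketched as follows. This is a toy example, assuming the Matrix package (which provides sparse.model.matrix) and a made-up data frame; the column names, including target, are hypothetical and not my real data.

```r
library(Matrix)  # provides sparse.model.matrix

# Hypothetical toy data: every column is already numeric (dummies made upstream).
df <- data.frame(
  target   = c(0, 1, 0, 1),
  age      = c(25, 40, 31, 58),
  region_a = c(1, 0, 0, 1),
  region_b = c(0, 1, 0, 0)
)

# Keep the label out of the feature matrix before encoding.
features <- df[, setdiff(names(df), "target")]

# Option 1: plain numeric matrix, as would be passed straight to xgb.DMatrix.
dense <- as.matrix(features)

# Option 2: sparse.model.matrix; "- 1" drops the intercept column it would add.
sparse <- sparse.model.matrix(~ . - 1, data = features)

# With all-numeric input, both hold the same values under the same column names.
same_values <- all(as.matrix(sparse) == dense)
same_names  <- identical(colnames(sparse), colnames(dense))
```

In this sketch the only difference between the two options is the storage format (and the intercept column, if "- 1" is omitted), so the encoding step alone would not be expected to change the AUC.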
I have followed all the suggestions here on how to control overfitting. Can you help, please?