I'm running XGBoost through OneVsRestClassifier, training on large sparse data: around 489 binary features and 484 labels (a set of pixels plus some binary features). Training takes 40 minutes and prediction takes 30 (train and test sets are the same size). Is this normal behavior, or is it my fault and do I need different hyperparameters?
eta = 0.03
gamma = 1
max_depth = 4
n_estimators = 100
tree_method = 'hist'
predictor = 'cpu_predictor' (I'm not allowed to use a GPU accelerator)
I use a CSR matrix to reduce training time. But I noticed that with dense data prediction is about 4 times faster, while training becomes far too long. Unfortunately I can't use DMatrix directly because of OneVsRestClassifier.