XGBoost: long prediction time

Alkatrass · July 18, 2020, 6:08pm

I’m running XGBoost through OneVsRestClassifier, training on large sparse data: around 489 binary features and 484 labels (a set of pixels plus some binary features). Training takes 40 minutes and prediction takes 30 minutes (the train and test sets are the same size). Is this normal behavior, or is it my fault and I need to set different hyperparameters?

Language: Python3

Current hyperparameters:

eta = 0.03
gamma = 1
max_depth = 4
n_estimators = 100
tree_method = 'hist'
predictor = 'cpu_predictor' (I’m not allowed to use a GPU)

I use a CSR matrix to reduce training time. I noticed that with non-sparse (dense) data, prediction is about 4 times faster, but training then takes too long. Unfortunately, I can’t use DMatrix because of OneVsRestClassifier.
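
To make the sparse/dense distinction concrete, here is a minimal pure-Python sketch of the CSR layout (the format behind scipy's csr_matrix). Reading a dense cell is a direct index, while reading a CSR cell means locating the column among that row's stored indices. The helpers to_csr and csr_get are hypothetical names for illustration. This shows only the data structure. XGBoost's internals are not modeled, and it is an assumption, not something measured here, that this kind of indirection contributes to the slower sparse prediction described above.

```python
from bisect import bisect_left

def to_csr(dense_rows):
    """Hypothetical helper: build CSR arrays (indptr, indices, data) from dense rows."""
    indptr, indices, data = [0], [], []
    for row in dense_rows:
        for col, value in enumerate(row):
            if value != 0:           # only non-zero entries are stored
                indices.append(col)
                data.append(value)
        indptr.append(len(indices))  # end offset of this row's slice
    return indptr, indices, data

def csr_get(indptr, indices, data, row, col):
    """Hypothetical helper: read one cell by binary-searching the row's sorted column indices."""
    start, end = indptr[row], indptr[row + 1]
    pos = bisect_left(indices, col, start, end)
    if pos < end and indices[pos] == col:
        return data[pos]
    return 0                         # absent entries are implicit zeros

dense = [
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
indptr, indices, data = to_csr(dense)
print(indptr)   # [0, 2, 3, 4]
print(indices)  # [1, 4, 0, 3]
print(csr_get(indptr, indices, data, 0, 4), dense[0][4])  # 1 1
print(csr_get(indptr, indices, data, 2, 2), dense[2][2])  # 0 0
```

The dense lookup dense[r][c] is constant time, while the CSR lookup does a search within the row. In exchange, CSR stores only the non-zero entries, which is why it saves memory on mostly-zero binary features.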

thvasilo · July 22, 2020, 9:28pm

OneVsRestClassifier trains one classifier per class, so unless I’m mistaken, you’re training 484 × 100 = 48,400 trees here.
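
The tree count above can be checked with a quick back-of-the-envelope calculation. This sketch assumes a binary objective with the default of one tree per boosting round. It uses max_depth as an upper bound on the number of split tests a single row passes through in each tree. The function names are hypothetical.

```python
def ovr_tree_count(n_labels, n_estimators):
    """One binary model per label; each boosting round adds one tree (assumed default)."""
    return n_labels * n_estimators

def max_splits_per_row(n_trees, max_depth):
    """Upper bound on split tests for one row: at most max_depth internal nodes per tree."""
    return n_trees * max_depth

trees = ovr_tree_count(n_labels=484, n_estimators=100)
print(trees)                                   # 48400
print(max_splits_per_row(trees, max_depth=4))  # 193600
```

Every test row therefore goes through up to about 193,600 split tests across 484 separate models, before counting the per-model call overhead.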

That would probably explain the long running time. OneVsRestClassifier is meant for multilabel (and multiclass) classification; if you’re working with images (pixels), I’m not sure it’s the right model.
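
To show what one-vs-rest does with a multilabel target, here is a minimal pure-Python sketch. It fits one independent binary model per label column, and it evaluates every one of those models at prediction time. BestFeatureStump, one_vs_rest_fit and one_vs_rest_predict are hypothetical names. The toy learner only stands in for XGBClassifier and is not part of scikit-learn or XGBoost.

```python
class BestFeatureStump:
    """Hypothetical toy binary learner: predicts the single binary feature
    that agrees most often with the label on the training data."""

    def fit(self, X, y):
        def agreement(j):
            return sum(row[j] == target for row, target in zip(X, y))
        self.feature_ = max(range(len(X[0])), key=agreement)
        return self

    def predict(self, X):
        return [row[self.feature_] for row in X]

def one_vs_rest_fit(X, Y, make_model):
    """Fit one independent binary model per label column of Y."""
    n_labels = len(Y[0])
    return [make_model().fit(X, [row[k] for row in Y]) for k in range(n_labels)]

def one_vs_rest_predict(models, X):
    """Every label's model is evaluated, so cost grows linearly with the label count."""
    per_label = [m.predict(X) for m in models]
    return [list(labels) for labels in zip(*per_label)]

X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
Y = [[1, 0], [0, 1], [1, 1], [0, 0]]
models = one_vs_rest_fit(X, Y, BestFeatureStump)
print(one_vs_rest_predict(models, X))  # [[1, 0], [0, 1], [1, 1], [0, 0]]
```

The labels never share a model, so any correlation between neighbouring pixels in the output is ignored. That is the structure the next question points at.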

Are you trying to use 489 features to predict a set of 484 labels? If your output is an image, there are probably better ways to model the problem that take advantage of the structure present in images.