Parallel threading with xgboost?


#1

I’m having problems using all the cores on my computer for training and cross-validating an XGBoost model.

Data:

data_dmatrix = xgb.DMatrix(data=X, label=y, nthread=-1)
dtrain = xgb.DMatrix(X_train, label=y_train, nthread=-1)
dtest = xgb.DMatrix(X_test, label=y_test, nthread=-1)
Model:

xg_model = XGBRegressor(objective='reg:linear', colsample_bytree=0.3, learning_rate=0.2,
    max_depth=5, alpha=10, n_estimators=100, subsample=0.4, booster='gbtree', n_jobs=-1)
and then if I train the model with:

xgb.train(
    xg_model.get_xgb_params(),
    dtrain,
    num_boost_round=500,
    evals=[(dtest, "Test")],
    early_stopping_rounds=200)
It works OK, but it uses only one thread to run XGBoost: the processor sits at 25% and n_jobs=-1 seems to be ignored.

But if I do cross-validation with scikit-learn implementation:

scores = cross_val_score(xg_model, X, y, cv=kfold, n_jobs=-1)

then it uses all cores. How can I force xgb.train and xgb.cv to use all cores?


#2

Which version of XGBoost are you using?

The default behavior is to use all threads available. Did you try removing all other mentions of “nthread”?

To set parameters in the learner, use a dictionary as shown in the docs. For example, to use 4 threads you would write something like

param = {'max_depth': 5, 'eta': 0.2, 'objective': 'reg:linear'}
param['nthread'] = 4

num_round = 10
bst = xgb.train(param, dtrain, num_round)
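
If you want the same parameter dictionary to drive both xgb.train and xgb.cv, a rough sketch along these lines might help. The helper names with_all_cores and train_and_cv are made up for illustration and are not part of XGBoost. I'm not sure how n_jobs=-1 coming out of get_xgb_params() is interpreted in your version, so the sketch sidesteps the question by dropping that key and setting an explicit positive nthread taken from os.cpu_count().

```python
import os


def with_all_cores(params, n_threads=None):
    """Return a copy of params with an explicit, positive 'nthread'.

    Hypothetical helper, not an XGBoost function. It drops the
    scikit-learn style 'n_jobs' key and replaces it with 'nthread'.
    """
    out = dict(params)
    out.pop("n_jobs", None)
    if n_threads is None or n_threads <= 0:
        # Assumption: "all cores" means the logical CPU count reported by the OS.
        n_threads = os.cpu_count() or 1
    out["nthread"] = n_threads
    return out


def train_and_cv(X, y, params, num_boost_round=10, nfold=3):
    """Run xgb.train and xgb.cv with the same parameter dictionary."""
    import xgboost as xgb  # imported here so the helper above works without it

    # No nthread argument on the DMatrix, as suggested above.
    dtrain = xgb.DMatrix(X, label=y)
    bst = xgb.train(params, dtrain, num_boost_round=num_boost_round)
    cv_results = xgb.cv(params, dtrain, num_boost_round=num_boost_round, nfold=nfold)
    return bst, cv_results
```

With your model that would be something like params = with_all_cores(xg_model.get_xgb_params()), followed by train_and_cv(X_train, y_train, params). If CPU usage still stays at 25% after that, the XGBoost version and how it was built would be the next things to check.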