XGBoost CPU multithreading

Hi,
I wonder how I can limit the number of CPU cores used by XGBoost during training and prediction.

There are a few ways to limit the number of threads used by XGBoost (the n_jobs parameter and the OMP_NUM_THREADS environment variable) and a few use cases to consider (training, inference, and hyperparameter optimization):

During training you can limit the number of cores by setting the n_jobs parameter (or the older nthread alias) to the number of cores to use.

For example:

from xgboost import XGBClassifier

# initialize the XGBoost classifier with a specific n_jobs value
model = XGBClassifier(n_jobs=4)

Try the number of logical or physical CPUs. The default is n_jobs=-1, which uses the number of logical cores in the system.

For example, the sketch below (using a synthetic dataset from scikit-learn's make_classification, chosen purely for illustration) trains a classifier limited to 4 threads:
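# train an XGBoost classifier with training limited to 4 threads
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# synthetic binary classification dataset for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# configure the model to use 4 threads
model = XGBClassifier(n_jobs=4)
# fit the model, limited to 4 cores
model.fit(X, y)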

During inference (prediction) you can fix the number of OpenMP/BLAS threads via the OMP_NUM_THREADS environment variable. You can set this in code prior to any import statements:

import os
# must be set before xgboost is imported so the OpenMP runtime picks it up
os.environ["OMP_NUM_THREADS"] = "1"

For example, the sketch below trains a model as before and then makes predictions with OpenMP limited to a single thread (the synthetic dataset is again purely illustrative):
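# make predictions with OpenMP limited to a single thread
import os
# must be set before xgboost is imported
os.environ["OMP_NUM_THREADS"] = "1"

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# synthetic dataset for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# training threads are still controlled by n_jobs
model = XGBClassifier(n_jobs=4)
model.fit(X, y)
# prediction is capped by the OMP_NUM_THREADS setting above
yhat = model.predict(X)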

Sometimes, you want to train and use an XGBoost model in a single-threaded manner, in which case you can set both the n_jobs parameter when configuring the model and the OMP_NUM_THREADS environment variable before importing any libraries.

For example, the sketch below runs XGBoost in a single-threaded manner by combining both settings:
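# train and use an XGBoost model with a single thread
import os
# limit OpenMP threads before any libraries are imported
os.environ["OMP_NUM_THREADS"] = "1"

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# synthetic dataset for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# limit training to a single thread as well
model = XGBClassifier(n_jobs=1)
model.fit(X, y)
yhat = model.predict(X)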

If you’re using a grid search to tune XGBoost hyperparameters, it is a good idea to limit each individual model to 1 or 2 threads via n_jobs and then allow the grid search itself to run concurrently on all available cores, e.g. n_jobs=-1 on the search.

Generally, each model will not occupy a core fully, so we can have two or more times as many models training concurrently as there are physical or logical cores. Limiting the number of threads for inference is also a good idea during the grid search.

For example, the sketch below limits each model to a single thread while scikit-learn's GridSearchCV runs on all available cores (the grid over learning_rate is a hypothetical choice for illustration):
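# grid search with one thread per model and all cores for the search
import os
# also limit inference threads during the search
os.environ["OMP_NUM_THREADS"] = "1"

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# synthetic dataset for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# each individual model trains with a single thread
model = XGBClassifier(n_jobs=1)
# hypothetical grid of hyperparameter values
grid = {"learning_rate": [0.1, 0.01, 0.001]}
# the grid search itself runs on all available cores
search = GridSearchCV(model, grid, cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)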

Finally, the sketch below suggests one way to observe that single-threaded model training uses less than 100% of a given CPU core; it assumes the third-party psutil package is installed and samples this process's CPU usage while training runs in a background thread:
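# sample this process's CPU usage while a single-threaded model trains
# (assumes the third-party psutil package is installed)
import os
os.environ["OMP_NUM_THREADS"] = "1"

from threading import Thread
import psutil
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

def train():
    # a larger synthetic dataset so training takes a few seconds
    X, y = make_classification(n_samples=10000, n_features=50, random_state=1)
    model = XGBClassifier(n_jobs=1, n_estimators=500)
    model.fit(X, y)

# run training in a background thread so we can sample utilization
worker = Thread(target=train)
worker.start()
proc = psutil.Process()
while worker.is_alive():
    # percent of one core used by this process over the last second
    print(proc.cpu_percent(interval=1))
worker.join()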