Parallel jobs with the Python API (n_jobs, nthread)

I have a couple of questions related to parallelism in the Python API. I’m using XGBRegressor() with ‘tree_method’: ‘hist’ and otherwise default parameters in a Kaggle CPU kernel (4 cores) on the Ames Housing dataset (~80 features, ~1500 samples). I ran a couple of tests (100 iterations for each condition). It seems the n_jobs parameter (left panel) doesn’t make any difference to fit time (y axis) until we specify n_jobs greater than the number of available cores:


Does it mean that n_jobs will be set to the maximum number of available cores automatically (unless we set it to something greater than the number of cores)?

I also tested the same range of values for the ‘nthread’ parameter (right panel). Here it seems that it actually makes a difference, and performance is best at the maximum number of available cores (4).
Could anyone explain the different behaviour of the two parameters? Which of the two should we use in the Python API?

Note that n_jobs should be supplied to the constructor of XGBRegressor, not to the parameter dictionary. https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBRegressor

Thanks for your reply, Philip. I wasn’t aware of that. So to clarify: all the arguments for the scikit-learn API XGBRegressor() listed here https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBRegressor I pass in the constructor. But what about arguments that are not listed in the scikit-learn API but are available in the general XGBoost parameter list here: https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst? For instance, the scikit-learn API lets you pass the tree_method parameter but not grow_policy. There is the **kwargs parameter dictionary, but the docs say that “**kwargs is unsupported by scikit-learn”. Would really appreciate any help on that.

And how about the set_params(**params) method? The description is not very clear.

My understanding is that at booster construction time you can either use the sklearn named constructor arguments, which are guaranteed to work, or use kwargs to set the underlying booster parameters, with no guarantee that these will be propagated to the underlying learner.

Perhaps taking a look at the XGBoost sklearn wrapper code would help you. Agreed that it’s not very clear right now; you can open an issue on the repo to bring this to the developers’ attention.