Training thousands of XGBoost models in parallel

Hi all,

I am looking to train thousands of XGBoost models in parallel. For my specific use case, it is more efficient to train each model single-threaded. Right now I am parallelizing in Python using multiprocessing, roughly:

import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as pool:
    models = list(pool.map(train_model, x_list))
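
For reference, train_model pins each booster to a single thread, something like the sketch below (simplified; the actual params, objective, and data layout are specific to my use case):

import xgboost as xgb

def train_model(x):
    # x is assumed here to be a (features, labels) pair
    X, y = x
    dtrain = xgb.DMatrix(X, label=y)
    # nthread=1 keeps each model single-threaded so the worker processes don't oversubscribe cores
    params = {"nthread": 1, "objective": "reg:squarederror", "max_depth": 4}
    return xgb.train(params, dtrain, num_boost_round=100)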

I know that Python is not the best at parallelism, and for other tasks I have gotten big speedups by instead parallelizing in a lower-level language like Rust. It looks like XGBoost doesn't have an official Rust binding, so I was considering writing my code in C instead. Unfortunately, I barely know C at all, so before jumping into learning a new language, I was wondering if anyone has experimented with this and whether rewriting my loop in C is likely to bring non-trivial speedups.

Thanks a lot!

I had the exact same issue (slow training of many small models) but luckily found a great way around it: Ray. See the Ray Core Quickstart -> Core: Parallelizing Functions with Ray Tasks. In my case it gave an enormous speedup training hundreds of small models at the same time versus training them in serial. IME it gives the best speedup when all models train for around the same amount of time (early stopping or other factors that make runtimes uneven eat into the gain).
I did not try Dask or other frameworks, but I imagine they would work similarly well.
Note that in my own case I did not specify the number of threads to use for each model; just letting it figure out how to handle the dynamic workload on its own seems to work fine.
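
To make that concrete, here is a minimal sketch of the pattern I used (train_model and x_list are placeholders matching the question; the params are illustrative):

import ray
import xgboost as xgb

ray.init()

@ray.remote
def train_model(x):
    X, y = x  # assumed (features, labels) pair
    dtrain = xgb.DMatrix(X, label=y)
    # no nthread set here; as noted above, I let it sort out thread usage on its own
    return xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=100)

# launch all tasks at once; Ray schedules them across the available cores
futures = [train_model.remote(x) for x in x_list]
models = ray.get(futures)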