Training thousands of XGBoost models in parallel

Hi all,

I am looking to train thousands of XGBoost models in parallel. For my specific use case, it is more efficient to train each model single-threaded. Right now, I am parallelizing in Python using multiprocessing. Roughly:

import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as pool:
    # Consume the iterator so any exceptions raised in the
    # worker processes actually surface here.
    results = list(pool.map(train_model, x_list))
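For context, here is a minimal, runnable sketch of that pattern with a stub standing in for the real training call (the names `train_model`, `train_all`, and `x_list` are just illustrative):

    import concurrent.futures

    def train_model(x):
        # Stub in place of the actual XGBoost fit. With xgboost you
        # would pass nthread=1 (or n_jobs=1 on the sklearn wrappers)
        # in the model parameters so each worker process stays
        # single-threaded and the processes don't oversubscribe cores.
        return sum(i * i for i in range(x))  # placeholder CPU-bound work

    def train_all(x_list):
        # One worker process per CPU core by default; each call to
        # train_model runs in its own process, sidestepping the GIL.
        with concurrent.futures.ProcessPoolExecutor() as pool:
            return list(pool.map(train_model, x_list))

Because each task here is CPU-bound and runs in a separate process, the Python-level overhead is mostly just pickling the inputs and results; the heavy lifting inside each worker is already native code when the stub is replaced by XGBoost.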

I know that Python is not the best at parallelism, and for other tasks I have gotten big speedups by parallelizing in a lower-level language like Rust instead. It looks like XGBoost doesn’t have an official Rust binding, so I was considering writing my code in C instead. Unfortunately, I barely know C at all, so before jumping into learning a new language, I was wondering if anyone has experimented with this, and whether rewriting my loop in C is likely to bring non-trivial speedups.

Thanks a lot!