I am trying to train multiple models on the same cluster, using a thread pool on the driver side.
This creates multiple Spark jobs for each model, as expected.
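For reference, here is a minimal sketch of this kind of driver-side setup. It is not the actual application code: `train_model` is a hypothetical placeholder for the real XGBoost-on-Spark training call, and the model names and parameter values are made up for illustration. It only shows how the thread pool submits one training task per model and collects results and failures.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def train_model(params):
    # Hypothetical placeholder: the real code would call the XGBoost-on-Spark
    # training API here, which launches Spark jobs on the shared cluster.
    return {"params": params, "status": "trained"}


def train_all(param_sets, max_workers=4):
    """Train one model per entry in param_sets using a driver-side thread pool."""
    results = {}
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one training task per model; each task runs in its own thread.
        futures = {pool.submit(train_model, p): name for name, p in param_sets.items()}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:
                # Record the failure per model instead of aborting the whole batch.
                errors[name] = exc
    return results, errors


if __name__ == "__main__":
    # Illustrative parameter sets only.
    param_sets = {"model_a": {"max_depth": 4}, "model_b": {"max_depth": 6}}
    results, errors = train_all(param_sets)
    print(sorted(results), errors)
```

Collecting exceptions per model makes it easier to see which training task failed, although it does not help when a task hangs rather than raising.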
The application often fails or hangs, and it seems to have something to do with how the XGBoost library uses Rabit.
Am I doing something wrong?