@VigneshN1997 While 0.82 contained initial support for multi-GPU training, it was quite difficult to set up in practice: there was no convenient wrapper for provisioning worker processes with GPU allocation, so you had to manually invoke the DMLC tracker script.
Starting with 1.0.0, XGBoost provides seamless integration with Dask, so it's now easy to provision workers from the main process:
```python
import dask.dataframe
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

with LocalCUDACluster(n_workers=2) as cluster:
    with Client(cluster) as client:
        dask_df = dask.dataframe.read_csv(fname, header=None, names=colnames)
        X = dask_df[dask_df.columns.difference(['label'])]
        y = dask_df['label']
        dtrain = xgb.dask.DaskDMatrix(client, X, y)
        output = xgb.dask.train(client,
                                {'tree_method': 'gpu_hist'},
                                dtrain,
                                num_boost_round=100,
                                evals=[(dtrain, 'train')])
```
In addition, 0.82 supported using multiple GPUs from a single process, whereas 1.0.0 drops that support: https://github.com/dmlc/xgboost/issues/4531. The rationale is that managing multiple GPUs from a single process introduces many complex code paths, whereas enforcing a 1:1 assignment between GPUs and processes greatly simplifies the code. Currently, if you have N GPU cards in your machine, you need to provision N worker processes. This is easy to do using the LocalCUDACluster abstraction.
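To make the 1:1 rule concrete, here is a minimal stdlib-only sketch of how the worker count follows from the number of visible GPUs. Note that `n_workers_for_gpus` is a hypothetical helper for illustration, not an XGBoost or dask-cuda API; in practice `LocalCUDACluster` queries the CUDA driver and does this for you.

```python
import os

def n_workers_for_gpus(env=None):
    """Return the number of Dask workers to provision: one per visible GPU.

    Illustrates the 1:1 GPU-to-process assignment enforced by XGBoost 1.0.0.
    Hypothetical helper: parses CUDA_VISIBLE_DEVICES rather than querying
    the CUDA driver, which is what dask-cuda actually does.
    """
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return 0  # unknown here; dask-cuda would ask the driver instead
    devices = [d for d in visible.split(",") if d.strip()]
    return len(devices)

# With 4 GPUs exposed, provision 4 workers:
print(n_workers_for_gpus({"CUDA_VISIBLE_DEVICES": "0,1,2,3"}))  # -> 4
```

With `LocalCUDACluster` you would simply pass this count as `n_workers` (or omit it, in which case the cluster launches one worker per detected GPU by default).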