Possible to train XGBoost with multiple GPUs and external memory?

Hi folks,

I have multiple .npy files on a Slurm cluster, each containing a NumPy array from my research partner. Each file has a varying number of samples (rows), with the columns holding the features and labels. I normally load all the files and concatenate them into one big array, but that takes far too much memory. I then discovered that XGBoost can construct a DMatrix from a custom iterator (external memory), which feels a bit like PyTorch's DataLoader to me.

I want to train an XGBoost model on the Slurm cluster using multiple GPUs in a distributed manner. However, xgb.dask.DaskDMatrix does not seem to accept an iterator the way xgboost.DMatrix does. Is it possible to load the data iteratively while training with multiple GPUs? I would appreciate any suggestions. Thanks!
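For reference, this is roughly the single-process setup I have in mind, based on the xgboost.DataIter external-memory pattern from the docs. The file names, and the assumption that the last column of each array is the label, are just placeholders for my data; I'm on a recent XGBoost so I pass device="cuda" (older versions would use tree_method="gpu_hist" instead):

```python
import os
from typing import Callable, List

import numpy as np
import xgboost


class NpyIterator(xgboost.DataIter):
    """Feed .npy files to XGBoost one file at a time (external memory)."""

    def __init__(self, file_paths: List[str]):
        self._file_paths = file_paths
        self._it = 0
        # XGBoost writes its external-memory cache pages under this prefix.
        super().__init__(cache_prefix=os.path.join(".", "cache"))

    def next(self, input_data: Callable) -> int:
        if self._it == len(self._file_paths):
            return 0  # no more batches: signal end of iteration
        arr = np.load(self._file_paths[self._it])
        # Assumption: last column is the label, the rest are features.
        X, y = arr[:, :-1], arr[:, -1]
        input_data(data=X, label=y)
        self._it += 1
        return 1  # more batches to come

    def reset(self) -> None:
        self._it = 0


# Placeholder file names for illustration.
it = NpyIterator(["part_0.npy", "part_1.npy", "part_2.npy"])
dtrain = xgboost.DMatrix(it)

booster = xgboost.train(
    {"tree_method": "hist", "device": "cuda"},
    dtrain,
    num_boost_round=100,
)
```

This works for a single GPU, but xgb.dask.DaskDMatrix seems to want Dask arrays/dataframes rather than an iterator, so I don't see how to combine this batching approach with distributed multi-GPU training.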