Does xgboost on dask support iterative out-of-core training?

I have had trouble determining whether out-of-core training is possible with Distributed XGBoost on Dask. By out-of-core I mean iteratively loading and training on, for example, a dataset that is larger than the available RAM on a single worker. I created the example below, which is similar to the training example in the XGBoost Dask documentation.

I attempted to run the code below on a local cluster with a single worker limited to 500MB of memory. The total dataset is ~500MB and I have created 20 chunks (10 for each array), so about 50MB per partition. Naively, I expected each partition to be loaded onto the worker in turn, with xgboost training on a single partition, saving the result back to disk, and then collecting all results into a single booster. However, when experimenting with this, the creation of the DaskDMatrix hangs and the worker fails and is restarted repeatedly. From reading the XGBoost docs it is unclear to me whether it is possible to train out-of-core with Dask as I am attempting, or whether I need to add workers so that my entire dataset (and the intermediate DMatrix) fits into the combined RAM of all my workers. (Note: I am actually interested in running on a cluster, but wanted to run a simple experiment to determine whether out-of-core training is possible.)
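For reference, the single-worker, memory-limited cluster described above can be started with something like the following (a minimal sketch; the 500MB limit and single worker come from my description, the thread count is an assumption). Its scheduler address is what <address> stands for in the listing below:

from dask.distributed import Client, LocalCluster

# one worker capped at 500MB to mimic the constrained setup described above
cluster = LocalCluster(n_workers=1, threads_per_worker=1, memory_limit="500MB")
client = Client(cluster)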

import dask.array as da
from dask.distributed import Client
import xgboost as xgb

client = Client(<address>)  # address of the scheduler for the single-worker cluster
client  # display cluster/worker info in a notebook

# ~1M rows x 50 features of random data, split into 10 chunks per array
num_obs = 1_000_000
num_features = 50
chunk_size = 100_000
X = da.random.random(size=(num_obs, num_features), chunks=(chunk_size, num_features))
y = da.random.random(size=(num_obs, 1), chunks=(chunk_size, 1))

# this is where the example hangs on a 500MB worker
dtrain = xgb.dask.DaskDMatrix(client, X, y)

output = xgb.dask.train(
        client,
        {"verbosity": 2, "tree_method": "hist", "objective": "reg:squarederror"},
        dtrain,
        num_boost_round=4,
        evals=[(dtrain, "train")],
        )

Unfortunately, no.

I tried to implement it, but the complexity is probably beyond xgboost's current scope. To avoid loading all of the data into memory, we would need to iterate over all of the chunks from dask on each worker while keeping dask from loading unnecessary chunks. Accomplishing that would mean reaching into dask's internals and hooking into its memory management, which is not something a machine learning library like xgboost is likely to do.
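To illustrate what "iterating the chunks" looks like using only dask's public API (a rough sketch, not anything xgboost actually does; the incremental-learner step is left as a placeholder): you can pull one block at a time into memory, but nothing in a loop like this stops the scheduler from materializing other partitions once a real distributed training graph is involved.

import dask
import dask.array as da

X = da.random.random(size=(1_000_000, 50), chunks=(100_000, 50))
y = da.random.random(size=(1_000_000, 1), chunks=(100_000, 1))

# Walk the partitions one at a time; each compute() pulls a single ~40MB block
# of X (plus the matching slice of y) into local memory.
for x_block, y_block in zip(X.to_delayed().ravel(), y.to_delayed().ravel()):
    x_np, y_np = dask.compute(x_block, y_block)
    # ... feed (x_np, y_np) to an incremental learner here ...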