GPU memory overflow in Python xgboost.cv()

I want to do XGBoost regression. The shape of the feature matrix is (894, 1518). I am using a 12 GB NVIDIA GPU. I run the XGBRegressor train and predict within a homemade KFold loop, and it seems to work fine (a sketch of the loop follows the parameter block below). The parameters I use are

xgbParams = {"objective": "reg:squarederror", "tree_method": "gpu_hist",
             "colsample_bytree": 1, "learning_rate": 0.01,
             "subsample": 1.0,
             "max_depth": 9, "gamma": 0,
             "reg_alpha": 0.001, "reg_lambda": 1,
             # "min_split_loss" is an alias of "gamma", so this repeats the setting above
             "min_split_loss": 0.0, "min_child_weight": 8, "scale_pos_weight": 1}
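
For reference, the homemade loop is essentially the following (a minimal sketch; the fold count, random_state, and n_estimators value are placeholders of mine, and features/labels are the arrays described above):

import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from xgboost import XGBRegressor

fold_maes = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(features):
    # n_estimators is a placeholder here; it is exactly the quantity I want to tune with cv()
    model = XGBRegressor(n_estimators=1000, **xgbParams)
    model.fit(features[train_idx], labels[train_idx])
    preds = model.predict(features[test_idx])
    fold_maes.append(mean_absolute_error(labels[test_idx], preds))
print("mean MAE over folds:", np.mean(fold_maes))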

The GPU memory usage peaks at about 2 GB, well within limits. However, now I want to run xgboost.cv() in order to determine the optimal number of estimators. I use the following call

xgb_cv = cv(dtrain=data_dmatrix, params=xgbParams, nfold=10,
            num_boost_round=5000, early_stopping_rounds=10,
            metrics=["mae"], as_pandas=True)
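
My plan was then to read the optimal number of boosting rounds off the returned frame; as I understand it, with early stopping the frame is truncated at the best iteration:

best_num_boost_round = len(xgb_cv)  # rows = boosting rounds kept after early stopping
print(xgb_cv.tail(1))               # mean/std of train and test MAE at the last kept round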

When I use the natural definition data_dmatrix = xgb.DMatrix(data=features, label=labels), the CV crashes with a memory overflow when it attempts to reserve 20 GB of GPU memory. Why is CV so memory-hungry? I would have expected it to simply run a series of consecutive regressions in the background (roughly the manual loop sketched after the list below). I have tried to follow the suggested procedures for reducing memory usage, but to no avail. In particular:

  • Using QuantileDMatrix does not work. I tried data_dmatrix = xgb.QuantileDMatrix(data=features, label=labels), but cv crashed with ../src/data/iterative_dmatrix.h:86: Slicing DMatrix is not supported for Quantile DMatrix.
  • Using external memory does not work. I tried data_dmatrix = xgb.DMatrix('foo.csv?format=csv&label_column=0#dtrain.cache'), but cv crashed with ../src/data/./sparse_page_dmatrix.h:107: Slicing DMatrix is not supported for external memory.
  • Using 'subsample' below 1 (I tried 0.5 and 0.3) together with 'sampling_method': 'gradient_based' does seem to improve memory usage, but after a while CV still crashes with a memory overflow.
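
For completeness, this is the kind of manual loop I expected cv to run internally, and which I could fall back on (just a sketch; the sklearn-based fold splitting and the fold count are my own choices, not anything xgboost prescribes):

import xgboost as xgb
from sklearn.model_selection import KFold

best_rounds = []
for train_idx, valid_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(features):
    dtrain = xgb.DMatrix(features[train_idx], label=labels[train_idx])
    dvalid = xgb.DMatrix(features[valid_idx], label=labels[valid_idx])
    # one "consecutive regression" per fold, with per-fold early stopping
    booster = xgb.train(xgbParams, dtrain, num_boost_round=5000,
                        evals=[(dvalid, "valid")], early_stopping_rounds=10,
                        verbose_eval=False)
    best_rounds.append(booster.best_iteration)
print("best iteration per fold:", best_rounds)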

I would appreciate it if you could tell me the intended way of finding the optimal number of estimators, and what I can do to reduce the memory footprint on the GPU.