When using xgb.cv with gpu_hist: "Check failed: Slice only supported for SimpleDMatrix currently."

I am trying to run xgb.cv() with tree_method="gpu_hist". I have pre-defined fold indices, so I created the list of k tuples (in_list, out_list) as instructed in the xgboost documentation.
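For reference, a minimal sketch of building that `folds` list of k (in_list, out_list) tuples that `xgb.cv` accepts. The data here is made up, and `fold_id` is a hypothetical pre-defined fold assignment, not something from my actual pipeline:

```python
import numpy as np

# Hypothetical fold assignment: fold_id[i] is the fold row i belongs to.
n_rows, k = 1000, 5
rng = np.random.default_rng(0)
fold_id = rng.integers(0, k, size=n_rows)

# One (in_indices, out_indices) tuple per fold, as the `folds`
# parameter of xgb.cv expects.
folds = [(np.where(fold_id != f)[0], np.where(fold_id == f)[0])
         for f in range(k)]

print(len(folds))  # 5 tuples, one per fold
```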

I have also loaded dtrain following the example in tutorials/external_memory.rst, so the data is cached on disk (otherwise I was running out of GPU memory, even though the data is not that big: about 8 million rows, 40 columns).

When I run xgb.cv() I’m getting the following error:

XGBoostError Traceback (most recent call last)
1 res = xgb.cv(xgb_params, dtrain, num_boost_round=50, folds=folds,
----> 2 metrics=('auc'), verbose_eval=True)

~\scoop\apps\anaconda3\2020.02\lib\site-packages\xgboost\training.py in cv(params, dtrain, num_boost_round, nfold, stratified, folds, metrics, obj, feval, maximize, early_stopping_rounds, fpreproc, as_pandas, verbose_eval, show_stdv, seed, callbacks, shuffle)
461 results = {}
462 cvfolds = mknfold(dtrain, nfold, params, seed, metrics, fpreproc,
--> 463 stratified, folds, shuffle)
465 # setup callbacks

~\scoop\apps\anaconda3\2020.02\lib\site-packages\xgboost\training.py in mknfold(dall, nfold, param, seed, evals, fpreproc, stratified, folds, shuffle)
320 for k in range(nfold):
321 # perform the slicing using the indexes determined by the above methods
--> 322 dtrain = dall.slice(in_idset[k])
323 dtest = dall.slice(out_idset[k])
324 # run preprocessing on the data set if needed

~\scoop\apps\anaconda3\2020.02\lib\site-packages\xgboost\core.py in slice(self, rindex, allow_groups)
940 c_bst_ulong(len(rindex)),
941 ctypes.byref(res.handle),
--> 942 ctypes.c_int(1 if allow_groups else 0)))
943 return res

~\scoop\apps\anaconda3\2020.02\lib\site-packages\xgboost\core.py in _check_call(ret)
187 """
188 if ret != 0:
--> 189 raise XGBoostError(py_str(_LIB.XGBGetLastError()))

XGBoostError: [17:01:33] C:\Users\user\xgboost\src\c_api\c_api.cc:184: Check failed: dynamic_cast<data::SimpleDMatrix*>(dmat): Slice only supported for SimpleDMatrix currently.

It looks like slicing isn't possible with the cached dtrain.
I also tried calling dtrain.slice() manually and it gave the same error.

Has anybody run into this issue or perhaps have an idea of what’s going on and how to fix it?

Thank you!

As the error message says, slicing is currently disabled when external memory is enabled. You could perhaps implement cross-validation manually?

hmm…I see. Thank you, hcho3!
The problem in that case is that the GPU runs out of memory.

As a separate experiment, I'm running a simple hyperparameter tuning with scikit-optimize on a sample of the data (less than 23MB), and even then, for some reason, I'm getting an out-of-memory error:

Message=[17:56:20] C:/…/xgboost/src/tree/updater_gpu_hist.cu:994: Exception in gpu_hist: parallel_for failed: cudaErrorMemoryAllocation: out of memory

I’m baffled as to why such a small dataset is exploding to over 12GB (my GPU memory).

What is your value for max_depth? Make sure it is not too high; I'd recommend at most 6.
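For intuition (rough arithmetic, not an exact model of gpu_hist memory use): a complete binary tree of depth d has 2**(d+1) - 1 nodes, so per-tree bookkeeping can grow exponentially with max_depth:

```python
def max_nodes(depth):
    # A complete binary tree of the given depth has 2**(depth + 1) - 1 nodes.
    return 2 ** (depth + 1) - 1

print(max_nodes(6))   # 127
print(max_nodes(20))  # 2097151
print(max_nodes(30))  # 2147483647 -- over 2 billion candidate nodes
```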

:slight_smile: bingo!
I was monitoring the GPU memory as the skopt iterations progressed, and indeed saw that memory shot up like a rocket whenever the max_depth hyperparameter was large. We had set the max_depth search range to 10 to 30, but you're saying it should be at most 6. I've seen models with a max_depth of around 20, though.