Out-of-memory when using sparse matrix (python)

cxz · September 11, 2019, 4:32pm

I’m using the following test script:


https://gist.github.com/cxz/b9c43be28ff2ce5a5b615b566dd8c647

output.txt

$ python test.py
0.90 1.2.1
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: out of memory

test.py

import numpy as np
from scipy.sparse import csr_matrix
import xgboost as xgb 

SAMPLES = 1_000_000
SPARSE_FEATURES = 300  # 300_000
NUM_CLASS = 2000

param = {
    'objective': 'multi:softmax',

This file has been truncated.

When increasing SAMPLES from 100_000 to 1_000_000, it crashes with the following message:
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: out of memory

(it’s using ~19G, nowhere near the total memory available).

XGBoost v0.90

Any ideas on how I should approach debugging this?

Thanks,

thvasilo · September 12, 2019, 9:07am

Thrust is a GPU library, so I’m assuming you are training on the GPU and it is running out of GPU memory.
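
A rough back-of-envelope estimate may help show why the larger run could exhaust device memory. The sketch below assumes that multi-class training keeps one float32 prediction and one gradient/hessian pair (two float32 values) for every (sample, class) combination. Those per-entry sizes are assumptions about XGBoost internals rather than figures from this thread, and `estimate_multiclass_buffers` is a hypothetical helper, not an XGBoost function.

```python
# Back-of-envelope estimate of the dense per-class buffers in multi-class training.
# Assumption: 4 bytes per prediction and 8 bytes per gradient/hessian pair
# for every (sample, class) combination. Feature data is not counted.

GIB = 1024 ** 3


def estimate_multiclass_buffers(samples, num_class, pred_bytes=4, grad_pair_bytes=8):
    """Return (prediction_bytes, gradient_bytes) for a samples x num_class layout."""
    entries = samples * num_class
    return entries * pred_bytes, entries * grad_pair_bytes


# SAMPLES values and NUM_CLASS = 2000 are taken from the test script above.
for samples in (100_000, 1_000_000):
    preds, grads = estimate_multiclass_buffers(samples, num_class=2000)
    print(f"{samples:>9} samples: predictions {preds / GIB:.2f} GiB, "
          f"gradients {grads / GIB:.2f} GiB")
```

Under these assumptions the buffers grow linearly with SAMPLES, so the 1_000_000-sample run needs ten times the memory of the 100_000-sample run for these buffers alone, which can exceed a single GPU's memory even when host RAM is plentiful.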

cxz · September 12, 2019, 3:18pm

I was not trying to use GPUs; I didn’t know the above test script would use them by default. Thanks for the pointer.
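
One way to rule out accidental GPU use is to check the parameter dict for GPU-specific values before training. The sketch below is a hypothetical helper (`force_cpu_params` is not part of XGBoost). It assumes GPU training was selected through `tree_method` (for example `gpu_hist`) or `predictor` (`gpu_predictor`). The full `param` dict is truncated in the post above, so every example value other than `objective` is an assumption.

```python
def force_cpu_params(params):
    """Return a copy of params with GPU-specific settings swapped for CPU ones."""
    cpu_params = dict(params)

    # Map a GPU tree method such as 'gpu_hist' to its CPU counterpart 'hist'.
    tree_method = cpu_params.get("tree_method", "")
    if tree_method.startswith("gpu_"):
        cpu_params["tree_method"] = tree_method[len("gpu_"):]

    # Switch the predictor back to the CPU implementation if the GPU one was set.
    if cpu_params.get("predictor") == "gpu_predictor":
        cpu_params["predictor"] = "cpu_predictor"

    return cpu_params


# Assumed example values; only 'objective' appears in the original script.
param = {
    "objective": "multi:softmax",
    "tree_method": "gpu_hist",
    "predictor": "gpu_predictor",
}
print(force_cpu_params(param))
```

The returned dict can then be passed to `xgb.train` in place of the original parameters. If the crash disappears, the GPU setting was the cause.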