QuantileDMatrix construction and temporary copies

Hi, I’m constructing a QuantileDMatrix from 75GB of data in numpy arrays and seeing RAM usage reach ~215GB, which is consistent with what was mentioned in this thread about numpy data being copied into a DMatrix.

Is this still expected when constructing QuantileDMatrix? I was expecting the memory usage to be smaller when constructing QuantileDMatrix vs. DMatrix.

Hi, I think it’s best to use a profiler to inspect the memory usage. Based on our profiling results, the memory usage of QuantileDMatrix (QDM) should be relatively low compared to DMatrix (DM). But this is Python, so maybe the garbage collector hasn’t kicked in yet, or maybe the numpy data has object dtype, in which case we have to make a copy. We can’t say for sure.
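As a rough way to check both possibilities before reaching for a full profiler, something like the sketch below might help. It is only an illustration: `describe_array`, `peak_rss_mib`, and `measure_peak` are hypothetical helper names, not part of XGBoost, and the `resource` module is only available on Unix-like systems.

```python
import gc
import resource
import sys

import numpy as np


def describe_array(arr):
    """Summarise the properties of an array that matter for this question."""
    return {
        "dtype": str(arr.dtype),
        # Object-dtype data is the case mentioned above where a copy is needed.
        "is_object": bool(arr.dtype == object),
        "gib": arr.nbytes / 2**30,
    }


def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in KiB on Linux.
    divisor = 2**20 if sys.platform == "darwin" else 2**10
    return peak / divisor


def measure_peak(build):
    """Call build() and return (result, growth of the process peak RSS in MiB).

    The peak is a high-water mark for the whole process, so the growth is
    only meaningful if nothing larger was allocated earlier in the run.
    """
    gc.collect()  # drop unreachable temporaries before taking the baseline
    before = peak_rss_mib()
    result = build()
    return result, peak_rss_mib() - before


# Small stand-in array; replace it with the real feature matrix.
X = np.zeros((1000, 10), dtype=np.float32)
print(describe_array(X))

# Assumed usage with XGBoost installed (y is a hypothetical label array):
#   import xgboost as xgb
#   qdm, growth = measure_peak(lambda: xgb.QuantileDMatrix(X, label=y))
#   print(f"peak RSS grew by {growth:.0f} MiB")
```

If `is_object` comes back `True`, converting the data to a numeric dtype such as `float32` before construction is the first thing to try. Running the same measurement once with `xgb.DMatrix` and once with `xgb.QuantileDMatrix`, each in a fresh process, gives a like-for-like comparison of the two peaks.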