Help understanding memory spike when creating DMatrix

I would like more information on the memory spike when creating a DMatrix. There doesn't seem to be any pattern to how big the spike is: sometimes it's 3x the size of my original dataset and sometimes it's more than 10x.

Also, I cannot load a Parquet file after I create a DMatrix without memory spiking and crashing, even though I can load the same file beforehand with no issues.

See Why does DMatrix copy numpy data even when it meets C_CONTIGUOUS and float32 constraints?. Currently, XGBoost creates a new representation of the data when the DMatrix is created. If you are using a GPU for training, you can use DeviceQuantileDMatrix to avoid the additional copy, as it builds the data representation "in-place". Unfortunately, we don't yet support in-place construction for the CPU algorithm.
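A small sketch of the two constraints mentioned above: a float32, C-contiguous array avoids *extra* conversion copies on input, but DMatrix construction still materializes its own internal representation, which is where the spike comes from. The GPU lines are commented out and assume cupy and a CUDA-enabled XGBoost build are available.

```python
import numpy as np

# Ensure the input already satisfies both constraints, so the only
# remaining cost is DMatrix building its internal representation.
X = np.ascontiguousarray(np.random.rand(10_000, 32), dtype=np.float32)

print(X.flags["C_CONTIGUOUS"])  # True: no contiguity copy needed
print(X.dtype)                  # float32: no dtype conversion needed

# On GPU, the extra copy can be avoided (assumes cupy and a CUDA build):
#   import cupy as cp, xgboost as xgb
#   X_gpu = cp.asarray(X)
#   dtrain = xgb.DeviceQuantileDMatrix(X_gpu, label=y)  # built "in-place"
```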


Also, there is this issue that’s potentially related: