I have been loading a pandas DataFrame from a Parquet file, splitting it into train and test sets, and loading those into an XGBoost DMatrix. The original file is around 5-6 GB on disk, but when I build the DMatrix, memory usage momentarily spikes to 15-20 GB.
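Roughly, my pipeline looks like this (simplified sketch; the file path and label column name are placeholders):

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Load the full dataset (~5-6 GB on disk) into a pandas DataFrame.
df = pd.read_parquet("data.parquet")  # placeholder path

# Separate features and label, then split into train and test sets.
X = df.drop(columns=["label"])  # "label" is a placeholder column name
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Building the DMatrices is where memory usage spikes to 15-20 GB.
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
```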
Is there a better way to store/load my dataset and slice it that avoids this memory spike? My concern is that I will soon run out of memory entirely: I only have 25 GB of RAM available, and I am already seeing occasional crashes.