XGBoost4J-Spark fails with OOM errors

I am training an XGBoost model on Spark with xgboost4j-spark. The dataset is ~120 GB with a large number of features (~1500). Training fails with OOM errors in the foreachPartition stage, after the data has been repartitioned for each eval set.
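For context, the job is launched roughly like this. The class name, paths, and values below are illustrative placeholders, not my exact production settings; the point is the memory-related configuration. As I understand it, the native XGBoost booster allocates memory outside the JVM heap, so the off-heap overhead allowance matters here:

```shell
# Illustrative submit configuration (class name, jar, and values are placeholders).
# The native booster's memory is off-heap, so spark.executor.memoryOverhead
# is raised well above its default of max(384m, 0.1 * executor memory).
spark-submit \
  --class com.example.TrainXGBoost \
  --master yarn \
  --conf spark.executor.memory=24g \
  --conf spark.executor.memoryOverhead=12g \
  --conf spark.executor.cores=4 \
  --conf spark.task.cpus=4 \
  my-training-job.jar
```

(`spark.task.cpus` is set equal to `spark.executor.cores` so a single XGBoost task owns the whole executor, which is the usual recommendation for xgboost4j-spark.)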

I looked at this answer — [jvm-packages] xgboost4j external memory explain — and tried the approach the author posted there, but it didn't help.

Any ideas?