XGBoost4J-Spark fails with OOM errors

kumar · August 12, 2019, 4:58pm

Hello!

I am training XGBoost on Spark. My dataset is ~120GB with a large number of features (~1500). Training fails with OOM errors in the foreachPartition stage, after repartitioning for each eval set.

I looked at the thread "[jvm-packages] xgboost4j external memory explain" and tried the approach the author posted there, but it didn’t help.

Any ideas?

Thanks!