Hello!
I'm training with XGBoost on Spark. My dataset is ~120 GB with a large number of features (~1,500). Training fails in the foreachPartition stage, after the data is repartitioned for each eval set.
I looked at this answer: "[jvm-packages] xgboost4j external memory explain" and tried the approach the author posted there, but it didn't help.
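For context, what I tried boils down to enabling XGBoost's external-memory mode on the estimator. A minimal sketch, assuming the XGBoost4J-Spark `XGBoostClassifier` API; the parameter values, column names, and `trainDF` are illustrative placeholders, not my exact job:

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

// Placeholder parameter values for illustration only.
val xgbParams = Map(
  "objective" -> "binary:logistic",
  "num_round" -> 100,
  "num_workers" -> 32,
  // External-memory mode: spill DMatrix pages to local disk instead of
  // holding each partition's matrix fully in RAM.
  "use_external_memory" -> true
)

val classifier = new XGBoostClassifier(xgbParams)
  .setFeaturesCol("features") // assumed vector column name
  .setLabelCol("label")       // assumed label column name

// trainDF is an assumed DataFrame with "features" and "label" columns.
val model = classifier.fit(trainDF)
```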
Any ideas?
Thanks!