XGBoost on YARN: foreachPartition error



I’m trying to use XGBoost on quite a large dataset (~500 GB) on YARN. Stage 0 (repartition) completes successfully, but I keep getting an error when stage 1 (foreachPartition) runs, and the job then keeps retrying stage 0.

I’m wondering whether this is because the program doesn’t get enough resources from the cluster.


I had the same error and have no clue yet. See [java-scala] Is XGBoost-Spark training thread-safe?
Hi jacob,

I solved my problem by giving each task more memory. You can try either increasing the memory per executor and the memory overhead, or decreasing the number of tasks running in parallel.
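To see why both options help, here is a rough back-of-the-envelope sketch (mine, not from the thread). Tasks running concurrently in one executor share that executor's YARN container, whose size is the executor memory plus the memory overhead. XGBoost's native memory lives outside the JVM heap, so it generally counts against the overhead part. Raising memory, or running fewer tasks at once, raises each task's share. All numbers are hypothetical placeholders except the 500 GB from the question. The matching Spark settings are spark.executor.memory, spark.executor.memoryOverhead (spark.yarn.executor.memoryOverhead before Spark 2.3), spark.executor.cores and spark.task.cpus.

```python
# Rough memory-budget check for XGBoost4J-Spark on YARN (a sketch, not an exact model).
# All cluster numbers are hypothetical placeholders.
from typing import Optional


def concurrent_tasks(executor_cores: int, task_cpus: int = 1) -> int:
    """Tasks one executor runs at once: spark.executor.cores // spark.task.cpus."""
    return executor_cores // task_cpus


def container_gb(executor_memory_gb: float, overhead_gb: Optional[float] = None) -> float:
    """YARN container size = executor memory + memory overhead.

    Without an explicit overhead, use Spark's default of max(384 MiB, 10% of executor memory).
    """
    if overhead_gb is None:
        overhead_gb = max(384 / 1024, 0.10 * executor_memory_gb)
    return executor_memory_gb + overhead_gb


def memory_per_task_gb(executor_memory_gb: float, overhead_gb: Optional[float],
                       executor_cores: int, task_cpus: int = 1) -> float:
    """Approximate share of one container available to each concurrently running task."""
    return container_gb(executor_memory_gb, overhead_gb) / concurrent_tasks(executor_cores, task_cpus)


def data_per_worker_gb(dataset_gb: float, num_workers: int) -> float:
    """Raw input per XGBoost worker, assuming the repartition stage splits the data evenly."""
    return dataset_gb / num_workers


if __name__ == "__main__":
    share = data_per_worker_gb(500, num_workers=100)  # 500 GB from the question; 100 workers is hypothetical
    baseline = memory_per_task_gb(16, 4, executor_cores=4, task_cpus=1)     # 4 tasks share a 20 GB container
    more_memory = memory_per_task_gb(32, 8, executor_cores=4, task_cpus=1)  # option 1: bigger executors
    fewer_tasks = memory_per_task_gb(16, 4, executor_cores=4, task_cpus=4)  # option 2: 1 task per executor
    print(f"raw data per worker: {share:.1f} GB")
    print(f"per-task memory: baseline {baseline:.1f} GB, "
          f"more memory {more_memory:.1f} GB, fewer tasks {fewer_tasks:.1f} GB")
```

With these placeholder numbers, each task's share is about the same size as its raw partition (5 GB), which leaves little room for XGBoost's own data structures. Either change gives each task more headroom. If you reduce parallelism by raising spark.task.cpus, it is common to set XGBoost's nthread to the same value, so each training task still uses all the cores reserved for it.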