I was using xgboost4j on YARN with a fairly large DataFrame (~500 GB). I found that with a low executor-memory and a high external-memory setting, the program ran successfully, but not the other way around, i.e., with low external memory and high executor memory.
So I am wondering what the reason behind this is. I didn't find much material online explaining external memory. Could someone help with this?
I am kind of new to this, but I ran several experiments with the same total memory size. The program works fine with a large external-memory setting and relatively low executor memory, but fails the other way around. So I suppose the XGBoost program leverages external memory to do the training, since it's written in C++. But that's only my guess, so I hope an expert can explain it here.
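In case it helps others reproduce the setup, this is roughly how the external-memory flag can be passed to the trainer. This is only a sketch: the `use_external_memory` key is my assumption based on the 0.81-era parameter names, and it may differ in other versions.

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor

// Assumed parameter map for the 0.81-era API. "use_external_memory"
// asks the native C++ trainer to cache training pages on disk
// instead of holding the whole DMatrix in memory.
val xgbParams: Map[String, Any] = Map(
  "eta" -> 0.1,
  "max_depth" -> 6,
  "objective" -> "reg:linear",
  "use_external_memory" -> true // assumption: key name may vary by version
)

val regressor = new XGBoostRegressor(xgbParams)
  .setFeaturesCol("features")
  .setLabelCol("label")
```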
Yes, that works well.
I am using the Scala version of XGBoost4J-Spark 0.81, and I downloaded its Scala source code. I came across this parameter when reading XGBoostRegressor.scala, and the error log said the job was exceeding memory limits and suggested raising memoryOverhead. So I tried that, and luckily it works :)
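For anyone hitting the same error, this is the kind of setting I mean — a minimal sketch only, and the sizes here are made-up placeholders. Note the config key was `spark.yarn.executor.memoryOverhead` in older Spark releases and `spark.executor.memoryOverhead` from Spark 2.3 onward.

```scala
import org.apache.spark.sql.SparkSession

// The native XGBoost (C++) allocations live outside the JVM heap,
// which is why YARN "exceeding memory limits" errors often point at
// memoryOverhead rather than executor-memory.
val spark = SparkSession.builder()
  .appName("xgboost-external-memory")
  .config("spark.executor.memory", "4g")          // placeholder size
  // "spark.executor.memoryOverhead" since Spark 2.3;
  // older releases use "spark.yarn.executor.memoryOverhead"
  .config("spark.executor.memoryOverhead", "4g")  // placeholder size
  .getOrCreate()
```

Equivalently, these can be passed on the command line via `--conf` to spark-submit instead of being set in code.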