[jvm-packages] xgboost4j external memory explain

jackie930 · January 15, 2019, 3:34am

I was using xgboost4j on yarn on a quite large dataframe. ~500G. And I found that when I used quite low executor-memory, high external-memory setting, the program ran succesffuly. Which is not the case the other way, i.e., low external memory, high executor-memory setting.

Thus, I am wondering what’s the reason behind? I didn’t find much materials available on explain external memory online. Could someone help with this?

Thanks!

thvasilo · January 23, 2019, 11:44am

I’m kind of rusty on my Spark nomenclature? What are the differences between executor memory and external memory?

jackie930 · January 24, 2019, 2:31am

I am kind of new to this. But I did several experiments and with same total memory size. The program works fine with large external memory setting and relatively low executor memory, but failed the other way. Thus I suppose the XGboost program is leveraging external memory to do training since it’s written in C++. But that’s only my guessing. Thus I hope can get some expert explain here.

thvasilo · January 24, 2019, 3:39pm

Hello Jackie, looking at Spark’s configuration settings I cannot find a reference to an external memory setting.

AFAIK XGBoost will make use of the memory assigned to it, and that should be taken from the executor memory pool. This basically determines the size of the JVM that the process is launched in.

Could you link to your output? It’s likely that the error originates from Spark and not XGBoost. Perhaps @CodingCat knows more about this.

jackie930 · January 25, 2019, 5:54am

Hi thvasilo,

My setting below: I think the useExternalMemory setting with the spark.executor.memoryOverhead setting work together. My error message you can refer to

val xgbParam = Map(“eta” -> 0.1f,
“max_depth” -> 7,
“objective” -> “reg:linear”,
“eval_metric”->“rmse”,
“num_round” -> 100,
“num_workers” -> 300,
"useExternalMemory" -> true,
“nthread” -> 5,
“subsample” -> 0.8,
“seed” -> 100,
“timeout_request_workers” -> 60000L,
“colsample_bytree” -> 0.8
)

spark-submit --master yarn-cluster
–num-executors 300
–driver-memory 10g
–executor-memory 10g
–executor-cores 10
–conf spark.task.cpus=5
*–conf spark.executor.memoryOverhead=20480 *
–conf spark.debug.maxToStringFields=100
–conf spark.dynamicAllocation.enabled=true
–conf spark.dynamicAllocation.minExecutors=300
–conf spark.dynamicAllocation.executorIdleTimeout=3600s
–conf spark.shuffle.service.enabled=true
–class xgboost_test xgboost_test.jar

thvasilo · January 25, 2019, 9:52am

Ah OK so you’re using the external memory, or out-of-core, version of XGBoost (see here).

This tries to not load the complete dataset into memory but rather use a disk-backed cache, thereby enabling training with limited memory at the cost of runtime performance.

If this works well for you it’s all good, otherwise you can try disabling that and increasing the executor-memory to load the complete dataset into memory.

EDIT: Also, I’m not seeing the parameter "useExternalMemory" documented anywhere, AFAIK using external memory is done like this example. Could you point me to where you found that parameter?

jackie930 · January 25, 2019, 10:04am

Yes，that works well.
I am using scala version XGBoost4J-Spark 0.81 and I downloaded the source scala code with it. This parameter came to me when I saw the XGboostRegressor.scala, also the error log info was saying that the exceeding memory limits, I should try memoryoverhead. Thus I tried this way, luckily, it works:)

kumar · August 7, 2019, 6:32pm

did you figure out the reason why it didn’t work with high executor memory and low external memory?