Can I do cross-validation with XGBoost on YARN?


I'm using distributed XGBoost on YARN, not XGBoost on Spark.

According to the distributed-training demo, it seems that every time I submit a job, I can only train and test one set of parameters, specified in a config file. The script I'm running now:

$XGB_HOME/dmlc-core/tracker/dmlc-submit --cluster=yarn --num-workers=4 \
    --worker-cores=2 --worker-memory=80g --server-memory=40g \
    $XGB_HOME/xgboost $(pwd)/../conf/xgb.conf

and the content of xgb.conf:

booster = gbtree
eta = 0.1
objective = binary:logistic

Is there any way to do cross-validation with XGBoost on YARN?
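One workaround I can think of (a sketch, not something the distributed demo provides): pre-split the training data into k folds yourself, then submit one dmlc-submit job per fold, each pointing `data =` and `eval[...]` at that fold's files. A minimal pure-Python sketch of the fold splitting; the file-name prefix is a hypothetical placeholder:

```python
# Hedged sketch: split the rows of a training file into k folds so that each
# fold can be trained as its own dmlc-submit job. Paths/prefixes are
# placeholders, not anything from the distributed demo.

def kfold_indices(n_rows, k):
    """Return k (train_idx, valid_idx) pairs covering range(n_rows)."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]  # round-robin split
    splits = []
    for i in range(k):
        valid = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        splits.append((sorted(train), sorted(valid)))
    return splits

def write_fold_files(lines, splits, prefix="fold"):
    """Write one train/valid file pair per fold; each YARN job then sets
    data=<prefix><i>.train and eval[valid]=<prefix><i>.valid."""
    for i, (train, valid) in enumerate(splits):
        with open(f"{prefix}{i}.train", "w") as f:
            f.writelines(lines[idx] for idx in train)
        with open(f"{prefix}{i}.valid", "w") as f:
            f.writelines(lines[idx] for idx in valid)

if __name__ == "__main__":
    for train, valid in kfold_indices(10, 5):
        print(len(train), len(valid))
```

You would still have to average the eval results across the k jobs yourself, since each job reports its metrics independently.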


OK, I guess I confused cross-validation with grid search.
But I still have questions about distributed cross-validation.
When I run distributed cross-validation (on a cluster of 4 worker nodes), the output looks like

It seems every node is doing the same work, and it takes as long as running it on a single node. Am I doing something wrong?
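One thing worth checking (an assumption on my part, not a diagnosis): if every worker reads the same full input file, then each worker trains on the complete data set, which would explain identical output from all nodes and no speed-up. In the distributed demos the input lives on HDFS and is split row-wise across workers; a hedged config fragment along those lines, where the HDFS path is a placeholder:

```
# each worker should see only its shard of the data, not the whole file
data = "hdfs:///user/me/train"   # placeholder path on HDFS
dsplit = row                     # row-wise split across the 4 workers
```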


Also, I have another problem: the eval metric is printed with too many decimal digits.
I only want 6 or 7 decimal places. Where can I configure that?