Can I do cross-validation with XGBoost on YARN?


I'm using distributed XGBoost on YARN, not XGBoost on Spark.

According to the distributed-training demo, it seems that every time I submit a job, I can only train and test one set of parameters, specified in a config file. The script I'm running now:

$XGB_HOME/dmlc-core/tracker/dmlc-submit --cluster=yarn --num-workers=4 \
    --worker-cores=2 --worker-memory=80g --server-memory=40g \
    $XGB_HOME/xgboost $(pwd)/../conf/xgb.conf

and the content of xgb.conf:

booster = gbtree
eta = 0.1
objective = binary:logistic

Is there any way to do cross-validation with XGBoost on YARN?
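One workaround I can think of (a sketch, not something the distributed demo provides): pre-split the training data into k folds yourself, then submit one dmlc-submit job per fold, each pointing `data =` and `eval[...]` at that fold's files. A minimal pure-Python sketch of the fold splitting; the file-name prefix is a hypothetical placeholder:

```python
# Hedged sketch: split the rows of a training file into k folds so that each
# fold can be trained as its own dmlc-submit job. Paths/prefixes are
# placeholders, not anything from the distributed demo.

def kfold_indices(n_rows, k):
    """Return k (train_idx, valid_idx) pairs covering range(n_rows)."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]  # round-robin split
    splits = []
    for i in range(k):
        valid = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        splits.append((sorted(train), sorted(valid)))
    return splits

def write_fold_files(lines, splits, prefix="fold"):
    """Write one train/valid file pair per fold; each YARN job then sets
    data=<prefix><i>.train and eval[valid]=<prefix><i>.valid."""
    for i, (train, valid) in enumerate(splits):
        with open(f"{prefix}{i}.train", "w") as f:
            f.writelines(lines[idx] for idx in train)
        with open(f"{prefix}{i}.valid", "w") as f:
            f.writelines(lines[idx] for idx in valid)

if __name__ == "__main__":
    for train, valid in kfold_indices(10, 5):
        print(len(train), len(valid))
```

You would still have to average the eval results across the k jobs yourself, since each job reports its metrics independently.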


OK, I guess I confused cross-validation with grid search.
But I still have questions about distributed cross-validation.
When I run distributed cross-validation (on a cluster of 4 worker nodes), the output looks like

It seems every node is doing the same work, and it takes as long as running it on a single node. Am I doing something wrong?
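One thing worth checking (an assumption on my part, not a diagnosis): if every worker reads the same full input file, then each worker trains on the complete data set, which would explain identical output from all nodes and no speed-up. In the distributed demos the input lives on HDFS and is split row-wise across workers; a hedged config fragment along those lines, where the HDFS path is a placeholder:

```
# each worker should see only its shard of the data, not the whole file
data = "hdfs:///user/me/train"   # placeholder path on HDFS
dsplit = row                     # row-wise split across the 4 workers
```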


Also, I have another problem: the eval metric is printed with too many decimal digits.
I only want 6 or 7 decimal places. Where can I configure that?