[jvm-packages][spark] Get model dump with feature-map "in cluster-mode"

raskr · March 30, 2019, 9:14am

I’m trying to use Booster#getModelDump(String feaMapPath, ...) to get a model dump with a feature map, as described in https://xgboost.readthedocs.io/en/latest/jvm/java_intro.html and https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-example/src/main/scala/ml/dmlc/xgboost4j/scala/example/BasicWalkThrough.scala.
Those examples run successfully in local mode with an ad hoc relative path, but I want to know how to make this work in cluster mode.

private void dumpModel(XGBoostRegressionModel model, String path) throws IOException, XGBoostError {
    // Get model dump
    String[] dump = model.nativeBooster().getModelDump(/*featureMap=*/path, false, "json");
}

This results in:

Model dump failed: ml.dmlc.xgboost4j.java.XGBoostError: [17:31:52] /xgboost/dmlc-core/src/io/local_filesys.cc:196: Check failed: allow_null  LocalFileSystem::Open "/path/to/featmap.txt": No such file or directory

What should the feature-map path look like on the driver-side file system?
It would be useful if Booster could read the file from HDFS, but I don’t expect much…

raskr · April 1, 2019, 6:17am

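Found a solution: just use Booster#getModelDump(String[] featureNames, ...), which takes the feature names directly, so no feature-map file has to exist on the local file system.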
Thank you.
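
For anyone landing here later, a minimal sketch of how the feature names array might be built when the feature map already exists as text somewhere (for example, lines read from HDFS and collected to the driver). It assumes the usual featmap.txt layout of "<featureid> <featurename> <type>" per line. The FeatureNames class and its fromFeatureMapLines method are hypothetical helpers written for this post, not part of xgboost4j.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of xgboost4j): turns feature-map lines into
// the String[] expected by the featureNames overload of getModelDump.
final class FeatureNames {
    private FeatureNames() {}

    /**
     * Parses lines of the form "<featureid> <featurename> <type>" and returns
     * an array in which index i holds the name of feature i.
     */
    static String[] fromFeatureMapLines(List<String> lines) {
        Map<Integer, String> nameById = new HashMap<>();
        int maxId = -1;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Fields may be separated by tabs or spaces.
            String[] parts = trimmed.split("\\s+");
            if (parts.length < 2) {
                throw new IllegalArgumentException("Malformed feature-map line: " + line);
            }
            int id = Integer.parseInt(parts[0]);
            nameById.put(id, parts[1]);
            maxId = Math.max(maxId, id);
        }
        String[] names = new String[maxId + 1];
        for (int i = 0; i <= maxId; i++) {
            // Ids missing from the map fall back to the default "f<i>" style name.
            names[i] = nameById.getOrDefault(i, "f" + i);
        }
        return names;
    }
}
```

On the driver, the result could then be passed along the lines of model.nativeBooster().getModelDump(names, false, "json"). I am assuming here that the featureNames overload accepts the same withStats and format arguments as the path-based call in my first post, so check the signature in your xgboost4j version.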