Scala Spark XGBoost v0.81 checkpoint issues

To community users and developers,

I am using checkpointing during training with the distributed Scala Spark version, and I have run into three issues:

  1. According to the documentation, checkpointPath should be an HDFS path; however, it only accepts a LOCAL location;
  2. I can see checkpointed models in the specified LOCAL path, with names like 100.model, 200.model,…; however, my checkpoint interval is 50, so the numbers in the model names appear to advance in steps of 2*x, where x is the checkpoint interval;
  3. Training does not continue by picking up the latest *.model file; it always restarts from the beginning.
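
To make the expectation behind points 2 and 3 concrete, here is a minimal sketch in plain Scala. It is not XGBoost4J-Spark's implementation: the object and method names (CheckpointSketch, expectedCheckpoints, latestCheckpoint) are hypothetical, and the total of 200 rounds used in the usage note is an assumed value for illustration only.

```scala
// Hypothetical helpers for illustration; not part of XGBoost4J-Spark.
object CheckpointSketch {
  // Matches file names such as "100.model" and captures the round number.
  private val ModelFile = """(\d+)\.model""".r

  // Rounds at which a checkpoint would be expected:
  // interval, 2 * interval, ... up to numRound.
  def expectedCheckpoints(numRound: Int, interval: Int): Seq[Int] = {
    require(interval > 0, "interval must be positive")
    interval to numRound by interval
  }

  // Highest round number among the checkpoint files, or None if the
  // list contains no file named like "<digits>.model".
  def latestCheckpoint(fileNames: Seq[String]): Option[Int] = {
    val rounds = fileNames.collect { case ModelFile(n) => n.toInt }
    if (rounds.isEmpty) None else Some(rounds.max)
  }
}
```

With an interval of 50 and an assumed 200 rounds, CheckpointSketch.expectedCheckpoints(200, 50) yields the rounds 50, 100, 150, 200, whereas I only see 100.model and 200.model. I would also expect a restarted job to resume from the round given by CheckpointSketch.latestCheckpoint(Seq("100.model", "200.model")), that is, round 200, instead of starting over.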

Can anyone help me with these questions?

Thanks
Yao

  1. Did you compile XGBoost4J with HDFS support? The JAR files from Maven Central don’t have HDFS enabled. To use HDFS, you’ll need to compile the JAR from source yourself.

@hcho3,

Thanks. We didn’t compile against HDFS because our IT team’s attempts to compile for CDH failed many times (including errors in .cc files). We spent a lot of effort getting them to accept the pre-compiled version. However, whether the location is local or HDFS doesn’t matter much to us.

Yao

@yaozhang2016 #2 and #3 may be bugs. Can you create a new post at https://github.com/dmlc/xgboost/issues? Make sure to put [jvm-packages] at the beginning of the title.

@hcho3
A new post was created at https://github.com/dmlc/xgboost/issues/4103

Thanks
Yao