Scala Spark XGBoost v0.81 checkpoint issues


#1

To community users and developers,

I am using checkpointing during training with the distributed Scala Spark version, and I saw three issues:

  1. According to the documentation, checkpointPath should be an HDFS path; however, it only accepts a LOCAL location.
  2. I can see checkpointed models in the specified LOCAL path, with names like 100.model, 200.model,…; however, my checkpoint interval is 50, so the file names seem to always follow 2*x.model, where x = checkpoint interval.
  3. Training does not resume from the latest *.model file; it always restarts from the beginning.
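
To make items 2 and 3 easier to check, here is a minimal sketch of the setup and a small helper that lists which checkpoint rounds exist in a local directory. The parameter keys `checkpoint_path` and `checkpoint_interval` are assumed to match the names used in the XGBoost4J-Spark tutorial; the directory, the `num_round` value, and the object and function names are hypothetical and only for illustration.

```scala
import java.io.File
import scala.util.Try

object CheckpointInspect {
  // Training parameters as a plain Map. The checkpoint keys are an assumption
  // based on the XGBoost4J-Spark tutorial; the path and num_round are made up.
  val xgbParams: Map[String, Any] = Map(
    "num_round" -> 500,
    "checkpoint_path" -> "/tmp/xgb-checkpoints",
    "checkpoint_interval" -> 50
  )

  // Round numbers parsed from files named "<n>.model" in a local directory.
  // Returns an empty sequence if the directory does not exist.
  def checkpointRounds(dir: String): Seq[Int] = {
    val files = Option(new File(dir).listFiles()).getOrElse(Array.empty[File])
    files.toSeq
      .map(_.getName)
      .filter(_.endsWith(".model"))
      .flatMap(name => Try(name.stripSuffix(".model").toInt).toOption)
      .sorted
  }

  // The round a resumed run would be expected to continue from, if any.
  def latestCheckpoint(dir: String): Option[Int] =
    checkpointRounds(dir).lastOption

  def main(args: Array[String]): Unit = {
    val dir = xgbParams("checkpoint_path").toString
    println(s"rounds found: ${checkpointRounds(dir).mkString(", ")}")
    println(s"latest: ${latestCheckpoint(dir)}")
  }
}
```

Running this against the checkpoint directory before and after a restart shows whether the saved rounds line up with the configured interval and whether a later run adds to them or starts over.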

Can anyone help me with these questions?

Thanks
Yao


#2
  1. Did you compile XGBoost4J with HDFS? The JAR files from Maven Central don’t have HDFS enabled. To use HDFS, you’ll need to compile the JAR from source yourself.

#3

@hcho3,

Thanks. We didn’t compile against HDFS because our IT team’s attempts to compile for CDH failed many times (including errors in .cc files). It took a lot of effort to get them to accept the pre-compiled version. However, whether the location is local or HDFS doesn’t matter much to us.

Yao


#4

@yaozhang2016 Items 2 and 3 in your list may be bugs. Can you create a new post in https://github.com/dmlc/xgboost/issues? Make sure to put [jvm-packages] at the beginning of the title.


#5

@hcho3
A new issue was created at https://github.com/dmlc/xgboost/issues/4103

Thanks
Yao