Scala Spark XGBoost v0.81 checkpoint issues

yaozhang2016 · February 1, 2019, 9:43pm

To community users and developers,

I am using checkpointing during training with the distributed Scala Spark version, and I have seen three issues:

1. According to the documentation, checkpointPath should be an HDFS path; however, it only accepts a LOCAL location.
2. I can see checkpointed models in the specified LOCAL path, with names like 100.model, 200.model,…; however, my checkpoint interval is 50, so the numbers in the model names seem to step by 2*x rather than x, where x is the checkpoint interval.
3. Training does not continue by picking up the latest *.model file; it always restarts from the beginning.
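
To make the second and third points concrete, here is a minimal sketch of the behaviour I would have expected. It assumes a checkpoint is written every checkpoint-interval rounds and named <round>.model, as in the file names above. The object CheckpointNames and its methods are hypothetical helpers written for illustration; they are not part of XGBoost4J-Spark.

```scala
// Hypothetical helpers for illustration only; not an XGBoost4J-Spark API.
object CheckpointNames {
  // Rounds at which a checkpoint would be expected if one were
  // written every `interval` rounds, up to and including `numRound`.
  def expectedRounds(interval: Int, numRound: Int): Seq[Int] = {
    require(interval > 0, "interval must be positive")
    interval to numRound by interval
  }

  // Parse "<round>.model" into its round number; other names yield None.
  def roundOf(fileName: String): Option[Int] = {
    val suffix = ".model"
    if (fileName.endsWith(suffix))
      scala.util.Try(fileName.stripSuffix(suffix).toInt).toOption
    else None
  }

  // The file a resumed run would be expected to pick up:
  // the one with the highest round number, if any exists.
  def latest(fileNames: Seq[String]): Option[String] = {
    val rounds = fileNames.flatMap(roundOf)
    if (rounds.isEmpty) None else Some(s"${rounds.max}.model")
  }
}
```

With an interval of 50 and, say, 200 rounds, CheckpointNames.expectedRounds(50, 200) gives 50, 100, 150, 200, whereas I only see 100.model and 200.model. For those two files, CheckpointNames.latest would return 200.model, which is the file I expected a restarted job to continue from.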

Can anyone help me answer these questions?

Thanks
Yao

hcho3 · February 2, 2019, 12:06am

Did you compile XGBoost4J with HDFS support? The JAR files from Maven Central don’t have HDFS enabled. To use HDFS, you’ll need to compile the JAR from source yourself.
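
For reference, the two settings in question can be sketched as a plain Scala parameter map. This assumes the keys checkpoint_path and checkpoint_interval as I recall them from the XGBoost4J-Spark documentation, so please check them against the v0.81 docs. Both paths below are made-up placeholders, and the object name CheckpointParams is hypothetical.

```scala
// Sketch only: a plain Scala map, so it runs without Spark or XGBoost
// on the classpath. Key names are assumed from the XGBoost4J-Spark docs.
object CheckpointParams {
  // Hypothetical local directory; this is the kind of path that
  // reportedly works with the pre-built JAR.
  val localParams: Map[String, Any] = Map(
    "checkpoint_path" -> "/tmp/xgb-checkpoints",
    "checkpoint_interval" -> 50
  )

  // Hypothetical HDFS URI; this would need a JAR built with HDFS enabled.
  val hdfsParams: Map[String, Any] =
    localParams + ("checkpoint_path" -> "hdfs:///user/someuser/xgb-checkpoints")
}
```

Only the checkpoint_path value differs between the two maps, so switching from a local directory to HDFS should not require any other change on the Scala side once the JAR supports it.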

yaozhang2016 · February 5, 2019, 2:02pm

@hcho3,

Thanks. We didn’t compile against HDFS because our IT team failed to compile it for CDH many times (including .cc file errors). We spent a lot of effort getting them to accept the pre-compiled version. However, whether the location is local or HDFS doesn’t matter much to us.

Yao

hcho3 · February 5, 2019, 6:50pm

@yaozhang2016 #2 and #3 may be bugs. Can you create a new post in https://github.com/dmlc/xgboost/issues? Make sure to put [jvm-packages] at the beginning of the title.

yaozhang2016 · February 5, 2019, 7:49pm

@hcho3
A new post was created at https://github.com/dmlc/xgboost/issues/4103

Thanks
Yao