[jvm-packages] potential checkpoint issues

yaozhang2016 · February 5, 2019, 7:42pm

To community users and developers,

I am using checkpointing during training with the distributed Scala Spark version, and I saw two potential issues:

1. I can see checkpointed models in the specified LOCAL path, with names like 100.model, 200.model, …; however, my checkpoint interval is 50, which suggests the model names always follow the pattern 2*x.model, where x is a multiple of the checkpoint interval.
2. Training does not resume from the latest *.model file; it always restarts from the beginning, as if the *.model files in checkpointPath were ignored.

Version info: pre-compiled scala spark xgb version 0.81-criteo-20180821 on CDH 2.3.1
Since this version was not compiled with HDFS support, it only accepts a local file path. Everything above is based on .setCheckpointPath(localPath).
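
To make the second issue concrete, here is a minimal sketch of the lookup I expected to happen on resume: scan the local checkpoint directory, parse the number out of each N.model file name, and continue from the highest one. This is not the xgboost4j-spark implementation; CheckpointSketch and all of its methods are hypothetical names used only for illustration, and how the parsed number maps to a boosting round is an open question (see the first issue).

```scala
import java.io.File
import scala.util.Try

// Hypothetical helper, NOT part of xgboost4j-spark.
object CheckpointSketch {
  // Extract the number from a name such as "100.model".
  // Any name that does not match <integer>.model yields None.
  def versionOf(fileName: String): Option[Int] =
    if (fileName.endsWith(".model")) Try(fileName.stripSuffix(".model").toInt).toOption
    else None

  // Highest checkpoint number among the given file names, if any.
  def latestVersion(fileNames: Seq[String]): Option[Int] = {
    val versions = fileNames.flatMap(versionOf)
    if (versions.isEmpty) None else Some(versions.max)
  }

  // Same lookup against a local directory. listFiles returns null when the
  // path is missing or is not a directory, so wrap it in Option.
  def latestVersionInDir(localPath: String): Option[Int] = {
    val files = Option(new File(localPath).listFiles).getOrElse(Array.empty[File])
    latestVersion(files.map(_.getName).toSeq)
  }
}
```

With the files I see on disk, CheckpointSketch.latestVersion(Seq("100.model", "200.model")) returns Some(200), so I expected training to pick up from the state saved in 200.model rather than start over.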

Thanks
Yao

yaozhang2016 · February 5, 2019, 7:52pm

Already opened a ticket at https://github.com/dmlc/xgboost/issues/4103

hcho3 · February 5, 2019, 9:45pm

Closing this as duplicate