[jvm-packages] potential checkpoint issues


#1

To community users and developers,

I am using checkpointing during training with the distributed Scala Spark version, and I saw two potential issues:

  1. I can see checkpointed models in the specified LOCAL path, with names like 100.model, 200.model, …; however, my checkpoint interval is 50, so the numbers in the model names always look like 2*x, where x is the checkpoint interval.

  2. Training does not resume from the latest *.model file; it always restarts from the beginning, as if the *.model files in checkpointPath were ignored.

Version info: pre-compiled Scala Spark XGBoost version 0.81-criteo-20180821 on CDH 2.3.1.
Since this version is not compiled with HDFS support, it only accepts a local file path. Everything above is based on .setCheckpointPath(localPath).
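
To double-check which checkpoint files are actually present in the local directory, a small standalone helper along these lines could be used. This is only a sketch: `CheckpointInspector`, its methods, and the default path are hypothetical names, and it assumes checkpoints are named `<number>.model` as observed above. It does not call or reproduce any XGBoost API.

```scala
import java.io.File

// Hypothetical diagnostic helper; not part of XGBoost.
object CheckpointInspector {
  // Matches file names such as "100.model" and captures the numeric part.
  // The digit count is capped so that toInt cannot overflow.
  private val ModelFile = """(\d{1,9})\.model""".r

  /** Returns the checkpoint numbers found in `dir`, sorted ascending. */
  def checkpointVersions(dir: File): Seq[Int] = {
    // File.list() returns null if `dir` does not exist or is not a directory.
    val names = Option(dir.list()).map(_.toSeq).getOrElse(Seq.empty)
    names.collect { case ModelFile(n) => n.toInt }.sorted
  }

  /** Returns the highest checkpoint number, or None if there are no checkpoints. */
  def latestVersion(dir: File): Option[Int] =
    checkpointVersions(dir).lastOption

  def main(args: Array[String]): Unit = {
    // Assumed default path, for illustration only.
    val dir = new File(args.headOption.getOrElse("/tmp/xgb-checkpoints"))
    println(s"versions: ${checkpointVersions(dir).mkString(", ")}")
    println(s"latest: ${latestVersion(dir)}")
  }
}
```

Running it against the directory passed to .setCheckpointPath(localPath) before and after a restart would show whether the existing files are kept, overwritten, or left untouched.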

Thanks
Yao


#2

Already opened a ticket at https://github.com/dmlc/xgboost/issues/4103


#3

Closing this as a duplicate.