To community users and developers,
I am using checkpointing during training with the distributed Scala Spark version, and I have seen three issues:
- According to the documentation, checkpointPath should be an HDFS path; however, it only accepts a LOCAL location.
- I can see checkpointed models in the specified LOCAL path, with names like 100.model, 200.model, …; however, my checkpoint interval is 50, so the numbers in the model names always seem to follow the pattern 2*x.model, where x is the checkpoint interval.
- Training does not resume by picking up the latest *.model file; it always restarts from the beginning.
Can anyone help me with these questions?
Thanks
Yao