[XGBoost4J-Spark] Only the latest checkpoint is saved

iBalag · November 3, 2022, 1:48pm

Hi,

I tried to save intermediate models (checkpoints) to a local folder, but it seems that only the latest checkpoint is kept.

Environment:

Spark 2.4.1
Linux
xgboost4j-spark_2.11:1.1.2
Python wrapper (for PySpark): spark-xgboost

Code example (Python):

from sparkxgb import XGBoostClassifier


checkpoint_path = "path/to/local/folder"
xgb_params = dict(
    eta=0.1,
    colsampleBytree=0.3,
    gamma=0.0,
    maxDeltaStep=0.0,
    minChildWeight=0.0,
    subsample=0.5,
    maxDepth=15,
    missing=0.0,
    objective="binary:logistic",
    numRound=100,
    numWorkers=2,
    checkpointInterval=20,
    checkpointPath=checkpoint_path,
)

xgb = (
    XGBoostClassifier(**xgb_params)
        .setFeaturesCol("features_vector")
        .setLabelCol("label")
        .setSkipCleanCheckpoint(True)
)

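# train_df: a Spark DataFrame with a vector column "features_vector"
# and a numeric "label" column (defined elsewhere)
model = xgb.fit(train_df)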
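
The expectation of five checkpoints follows from the parameters above, assuming a checkpoint is due every `checkpointInterval` rounds up to and including `numRound`. A minimal sketch of that arithmetic; the helper name `expected_checkpoint_rounds` is hypothetical and not part of sparkxgb or XGBoost4J-Spark:

```python
from typing import List


def expected_checkpoint_rounds(num_round: int, checkpoint_interval: int) -> List[int]:
    """Rounds at which a checkpoint would be due if one is written every
    `checkpoint_interval` rounds, up to and including `num_round`.
    Hypothetical helper for illustration only."""
    if checkpoint_interval <= 0:
        # Treat a non-positive interval as "checkpointing disabled".
        return []
    return list(range(checkpoint_interval, num_round + 1, checkpoint_interval))


# numRound=100 and checkpointInterval=20, as in xgb_params above
rounds = expected_checkpoint_rounds(100, 20)
print(rounds)       # [20, 40, 60, 80, 100]
print(len(rounds))  # 5
```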

I expect to see 5 checkpoint files (one every 20 of the 100 rounds), but there is only one:

user1@batlaptop:~$ ls -la ./checkpoints/
total 4564
drwxr-xr-x. 2 user1 users    4096 Nov  3 13:38 .
drwxrwxrwx. 8 root   root    4096 Nov  3 13:37 ..
-rw-r--r--. 1 user1 users 4626215 Nov  3 13:38 160.model
-rw-r--r--. 1 user1 users   36152 Nov  3 13:38 .160.model.crc
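
To check which checkpoints exist after training without reading `ls` output by eye, the folder can be listed programmatically. A minimal sketch, assuming checkpoints are named `<number>.model` as in the listing above; hidden files such as `.160.model.crc` look like checksum sidecar files from Hadoop's local file system and are skipped. The function name `list_checkpoints` is hypothetical:

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches "<digits>.model" exactly; hidden ".<n>.model.crc" files do not match.
_CHECKPOINT_RE = re.compile(r"^(\d+)\.model$")


def list_checkpoints(folder: str) -> List[Tuple[int, Path]]:
    """Return (number, path) pairs for every "<number>.model" file in `folder`,
    sorted by the numeric prefix. Hypothetical helper for illustration only."""
    found = []
    for entry in Path(folder).iterdir():
        match = _CHECKPOINT_RE.match(entry.name)
        if match and entry.is_file():
            found.append((int(match.group(1)), entry))
    return sorted(found, key=lambda item: item[0])
```

For the folder shown above, `list_checkpoints("./checkpoints")` would return a single entry, for `160.model`.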

The name of the model file (160.model) confuses me as well, since I run only 100 rounds.
Could you please help me with it?
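
One possible reading of the `160.model` name, which is an assumption and not confirmed anywhere in this post: the number may be an internal booster version counter rather than a round count. If that counter advances by two per boosting round, and no checkpoint is written once training reaches `numRound`, then 160 corresponds to round 80, the last interval before round 100. A minimal sketch of that conversion; both helper names and the factor of two are hypothetical:

```python
from typing import Optional

# ASSUMPTION: the number in "<n>.model" is a booster version that grows by 2
# per completed boosting round. Not confirmed by the post above.
VERSIONS_PER_ROUND = 2


def version_to_round(version: int) -> int:
    """Convert an assumed booster version number into a boosting round."""
    return version // VERSIONS_PER_ROUND


def latest_intermediate_round(num_round: int, checkpoint_interval: int) -> Optional[int]:
    """Last checkpoint round strictly before `num_round`, assuming no
    checkpoint is written for the final round. None if there is none."""
    if checkpoint_interval <= 0:
        return None
    rounds = list(range(checkpoint_interval, num_round, checkpoint_interval))
    return rounds[-1] if rounds else None


print(version_to_round(160))               # 80
print(latest_intermediate_round(100, 20))  # 80
```

Under these assumptions the two numbers agree, which would make `160.model` the round-80 checkpoint rather than something beyond round 100.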