Dask XGBoost incremental learning

simone · September 27, 2023, 3:31pm

Hi,

I’m using the xgboost.dask.train API to train a classifier incrementally. In each new training step, I pass the model returned by the previous step through the xgb_model parameter.

Sometimes the last training steps show a higher validation loss, which leads to poor overall accuracy.

As far as I understand, this kind of training builds new trees on top of the existing ones instead of updating the previously built trees.
Online I see folks using these params: ‘updater’: ‘refresh’, ‘process_type’: ‘update’, ‘refresh_leaf’: True, but with xgboost.dask.train I’m running into problems, maybe because these parameters aren’t supported in distributed training.

Any suggestion?