The feature is still experimental and not yet ready for production use

parth-lth · November 2, 2023, 7:54pm

https://xgboost.readthedocs.io/en/stable/tutorials/spark_estimator.html

The documentation here says that “The feature is still experimental and not yet ready for production use.”
It has been a while since this note was added. Is it still valid, or can I use the feature? My dataset is very large, and I need to train XGBoost in a distributed fashion.

hcho3 · November 2, 2023, 10:48pm

The Scala Spark interface has been around for a while, but the PySpark interface (Python) is newer, hence the remark.

parth-lth · November 3, 2023, 5:36am

We need to use this for training, not in production. We want to get cross-validated predicted probabilities with a distributed XGBoost estimator. What are the risks involved in using this?

hcho3 · November 3, 2023, 6:43am

There can be unknown bugs that affect, for example, model accuracy. Any new code will have issues that are only discovered later.
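
For reference, the cross-validated predicted probabilities asked about above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not an official recipe: it assumes a Spark DataFrame with a vector column named `features` and a column named `label`, and it uses `SparkXGBClassifier` from `xgboost.spark`. The helper names `fold_splits` and `out_of_fold_probabilities`, the `fold` column, and the default values are hypothetical and chosen only for illustration. The Spark imports are done inside the function so that the fold helper can be checked without a cluster.

```python
from typing import List, Tuple


def fold_splits(k: int) -> List[Tuple[List[int], int]]:
    """Return (training_fold_ids, held_out_fold_id) for each of k folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return [([j for j in range(k) if j != i], i) for i in range(k)]


def out_of_fold_probabilities(df, k: int = 5, num_workers: int = 2, seed: int = 0):
    """Score every row with a model that was trained without that row's fold.

    Assumes `df` has a vector column "features" and a column "label"
    (hypothetical names; adjust to your schema).
    """
    # Imported lazily so that fold_splits works without Spark installed.
    from functools import reduce

    from pyspark.sql import functions as F
    from xgboost.spark import SparkXGBClassifier

    # Assign each row a fold id in [0, k). Caching keeps the random
    # assignment from being recomputed between the fit and transform steps,
    # although Spark may still evict cached data under memory pressure.
    folded = df.withColumn(
        "fold", F.floor(F.rand(seed) * k).cast("int")
    ).cache()

    scored = []
    for train_ids, held_out in fold_splits(k):
        clf = SparkXGBClassifier(
            features_col="features",
            label_col="label",
            num_workers=num_workers,
        )
        # Train on all folds except the held-out one.
        model = clf.fit(folded.filter(F.col("fold").isin(train_ids)))
        # Predict only on the held-out fold.
        scored.append(model.transform(folded.filter(F.col("fold") == held_out)))

    # Stack the per-fold predictions back into a single DataFrame.
    return reduce(lambda a, b: a.unionByName(b), scored)
```

Given the caveat about undiscovered bugs, it may be worth comparing these out-of-fold probabilities against a single-node XGBoost run on a small sample before relying on them.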