Distributed XGBoost behavior with a fixed seed


Sharing a known trick after debugging with an internal customer.

If you are using PySpark or xgboost4j-spark and have trouble reproducing results despite a fixed seed, it is most likely because each job partitions the dataset differently and we run the approximate tree method, whose split candidates depend on how the rows are distributed across partitions.
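To make the effect concrete, here is a toy sketch in plain Python. It is not XGBoost's actual quantile sketch. The helpers `local_summary` and `split_candidates` are made up for illustration and only assume that each partition builds its own summary before the summaries are merged. It shows that the same rows, partitioned two different ways, can produce different split candidates even though nothing random happens after the data is generated.

```python
import random

def local_summary(values, k):
    """Keep k evenly spaced order statistics of one partition (a crude quantile summary)."""
    s = sorted(values)
    return [s[(i * (len(s) - 1)) // (k - 1)] for i in range(k)]

def split_candidates(partitions, k=8):
    """Merge the per-partition summaries, then reduce them to k split candidates."""
    merged = []
    for part in partitions:
        merged.extend(local_summary(part, k))
    return local_summary(merged, k)

# The data is identical for both "jobs"; only the partitioning differs.
rng = random.Random(42)
data = [rng.random() for _ in range(1000)]

contiguous = [data[i * 250:(i + 1) * 250] for i in range(4)]  # job A: 4 contiguous chunks
round_robin = [data[i::4] for i in range(4)]                  # job B: 4 round-robin slices

cands_a = split_candidates(contiguous)
cands_a_again = split_candidates(contiguous)
cands_b = split_candidates(round_robin)

print("same partitioning, same candidates:", cands_a == cands_a_again)
print("different partitioning, same candidates:", cands_a == cands_b)
```

If this is indeed the cause, keeping the number of partitions and the assignment of rows to partitions identical across runs should bring the results back in line. Treat that as a suggestion to verify on your own pipeline, not a guarantee.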

More details are tracked here.