[XGBoost4J-Spark] Number of columns does not match number of features in booster

kusumakarb · September 17, 2020, 11:40am

While training an XGBoost model on Spark with XGBoost4J-Spark, I see the following warning in the Spark executor logs for some of my datasets:

WARNING: /xgboost/src/learner.cc:979: Number of columns does not match number of features in booster. Columns: 7531 Features: 7535

Most of the time, training gets stuck after this warning and makes no further progress. Any idea what I could be doing wrong here, or is it the dataset?
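For context, here is a minimal, hypothetical sketch of one way a column/feature mismatch like this can arise. It assumes, without confirming it for XGBoost4J-Spark, that the column count of a matrix built from sparse rows is inferred as the largest feature index present plus one. Under that assumption, a partition whose rows never contain the last few features would report fewer columns than the booster, for example 7531 instead of 7535. The helper names `inferred_num_columns` and `to_dense` are made up for illustration and are not XGBoost4J-Spark APIs.

```python
# Hypothetical illustration only: sparse rows are {feature_index: value} dicts.
# Assumption: a matrix built from sparse rows gets width = (max index seen) + 1,
# so trailing features that never appear in a partition silently disappear.


def inferred_num_columns(rows):
    """Return the largest feature index across all rows plus one (0 if no features)."""
    max_index = -1
    for row in rows:
        if row:
            max_index = max(max_index, max(row))
    return max_index + 1


def to_dense(row, num_features):
    """Densify a sparse row to a fixed, explicitly declared width."""
    dense = [0.0] * num_features
    for index, value in row.items():
        if index >= num_features:
            raise ValueError(f"feature index {index} out of range")
        dense[index] = value
    return dense


# Full feature space: 7535 features (indices 0..7534), matching the booster.
partition_a = [{0: 1.0, 7534: 2.5}, {10: 0.3}]
# This partition never has a value in features 7531..7534.
partition_b = [{0: 0.7, 7530: 1.1}, {42: 4.0}]

print(inferred_num_columns(partition_a))  # 7535
print(inferred_num_columns(partition_b))  # 7531

# Declaring the width explicitly gives every partition the same column count.
print(len(to_dense(partition_b[0], 7535)))  # 7535
```

If this is what is happening, fixing the feature width up front, rather than letting it be inferred per partition, would keep the column count consistent everywhere. Whether that applies to a particular XGBoost4J-Spark version is an assumption, not something the warning alone confirms.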