Error While Training XGBoostClassifier

Hi All,
I am struggling to run XGBoost on our Spark cluster and would appreciate any help.

I have uploaded these jars:
'xgboost4j-spark_2.12-1.0.0.jar', 'xgboost4j_2.12-1.6.2.jar'
and set 'spark.submit.pyFiles' to the following zip:
'/tmp/spark-ac8c39c2-0a5c-4e86-8919-534bcde30ae4/pyspark-xgboost_0.90.zip'
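
The file names above carry three different version strings (1.0.0, 1.6.2 and 0.90). Whether that mismatch is what makes training fail is an assumption, not something the traceback confirms, but it is cheap to check. A minimal sketch that reads the versions out of the file names; the helper names and the regular expression are hypothetical and only handle names shaped like the ones in this post:

```python
import re

# Artifact names copied from the post above.
ARTIFACTS = [
    "xgboost4j-spark_2.12-1.0.0.jar",
    "xgboost4j_2.12-1.6.2.jar",
    "pyspark-xgboost_0.90.zip",
]

# Hypothetical pattern: the dotted number that sits directly before the
# .jar/.zip extension (this skips the Scala version such as 2.12).
VERSION_RE = re.compile(r"[-_](\d+(?:\.\d+)+)\.(?:jar|zip)$")


def artifact_version(name):
    """Return the trailing version string of an artifact name, or None."""
    match = VERSION_RE.search(name)
    return match.group(1) if match else None


def versions_by_artifact(names):
    """Map each artifact name to the version parsed from it."""
    return {name: artifact_version(name) for name in names}


def all_same_version(names):
    """True only if every artifact name carries the same version string."""
    return len(set(versions_by_artifact(names).values())) == 1


for name, version in versions_by_artifact(ARTIFACTS).items():
    print(f"{name}: {version}")
print("consistent:", all_same_version(ARTIFACTS))
```

If the check reports inconsistent versions, one thing to try is a set of jars and a Python wrapper built for the same XGBoost release, then running the fit again.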

I get the following error when I try to train my model:

from sparkxgb import XGBoostClassifier

# xgbInput is the training DataFrame with "features" and "DecisionStatus" columns
xgbClassifier = XGBoostClassifier(featuresCol="features", labelCol="DecisionStatus")
xgbClassificationModel = xgbClassifier.fit(xgbInput)

The error:
> Py4JJavaError Traceback (most recent call last)
> Input In [40], in <cell line: 1>()
> ----> 1 xgbClassificationModel = xgbClassifier.fit(xgbInput)
>
> File /usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/base.py:161, in Estimator.fit(self, dataset, params)
> 159 return self.copy(params)._fit(dataset)
> 160 else:
> --> 161 return self._fit(dataset)
> 162 else:
> 163 raise ValueError("Params must be either a param map or a list/tuple of param maps, "
> 164 "but got %s." % type(params))
>
> File /usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py:335, in JavaEstimator._fit(self, dataset)
> 334 def _fit(self, dataset):
> --> 335 java_model = self._fit_java(dataset)
> 336 model = self._create_model(java_model)
> 337 return self._copyValues(model)
>
> File /usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/wrapper.py:332, in JavaEstimator._fit_java(self, dataset)
> 318 """
> 319 Fits a Java model to the input dataset.
> 320
> (…)
> 329 fitted Java model
> 330 """
> 331 self._transfer_params_to_java()
> --> 332 return self._java_obj.fit(dataset._jdf)
>
> File /usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304, in JavaMember.__call__(self, *args)
> 1298 command = proto.CALL_COMMAND_NAME +\
> 1299 self.command_header +\
> 1300 args_command +\
> 1301 proto.END_COMMAND_PART
> 1303 answer = self.gateway_client.send_command(command)
> -> 1304 return_value = get_return_value(
> 1305 answer, self.gateway_client, self.target_id, self.name)
> 1307 for temp_arg in temp_args:
> 1308 temp_arg._detach()
>
> File /usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py:111, in capture_sql_exception.<locals>.deco(*a, **kw)
> 109 def deco(*a, **kw):
> 110 try:
> --> 111 return f(*a, **kw)
> 112 except py4j.protocol.Py4JJavaError as e:
> 113 converted = convert_exception(e.java_exception)
>
> File /usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
> 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
> 325 if answer[1] == REFERENCE_TYPE:
> --> 326 raise Py4JJavaError(
> 327 "An error occurred while calling {0}{1}{2}.\n".
> 328 format(target_id, ".", name), value)
> 329 else:
> 330 raise Py4JError(
> 331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
> 332 format(target_id, ".", name, value))
>
> Py4JJavaError: An error occurred while calling o6595.fit.
> : ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
> at ml.dmlc.xgboost4j.scala.spark.XGBoost$.postTrackerReturnProcessing(XGBoost.scala:697)
> at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributed(XGBoost.scala:573)
> at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:191)
> at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:40)
> at org.apache.spark.ml.Predictor.fit(Predictor.scala:151)
> at org.apache.spark.ml.Predictor.fit(Predictor.scala:115)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:282)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:238)
> at java.lang.Thread.run(Thread.java:748)