[jvm-packages] v0.90 sparse vector prediction issue on missing values


#1

Hi,

I am using sparse vectors with XGBoost4J-Spark. fit() threw “java.lang.RuntimeException: you can only specify missing value as 0.0 (the currently set value NaN) when you have SparseVector or Empty vector as your feature format”. After adding ‘missing’ 0.0 to the parameters, fit() is now happy.

transform() still throws “ERROR ml.dmlc.xgboost4j.java.DataBatch - java.lang.RuntimeException: you can only specify missing value as 0.0 (the currently set value NaN) when you have SparseVector or Empty vector as your feature format”, and model.getMissing() returns NaN. Looking at the code, both XGBoostClassifier and XGBoostRegressor have setMissing(). However, XGBoostRegressionModel and XGBoostClassificationModel do not.

Did I miss anything? Is there a way to set missing in model to make transform() happy? Thank you!


#2

Take a look at the tutorial on handling missing values: https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#dealing-with-missing-values


#3

@hcho3 Thanks! The tutorial solved the problem on the model training side: adding “missing 0.0” to the parameters fixed fit(). I am still seeing the problem on the prediction side.


#4

That’s weird. Can you explicitly call setMissing() on XGBoostRegressor before calling fit()?


#5

I am using XGBoostClassifier. xgboost_classifier.setMissing(0.0) helped fit(). However, model.transform() still failed with “java.lang.RuntimeException: you can only specify missing value as 0.0 (the currently set value NaN) when you have SparseVector or Empty vector as your feature format”. It looks like missing is set correctly in XGBoostClassifier, but it is not passed on to XGBoostClassificationModel.


#6

This looks like a bug. I filed a report at https://github.com/dmlc/xgboost/issues/4530


#7

As a workaround, you can call “xgbClassificationModel.set(xgbClassificationModel.missing, 0)” on the trained model before calling transform().
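
For anyone landing here, the workaround above might look roughly like the sketch below. It is written with the 0.90 XGBoost4J-Spark API in mind and is not tested against other versions. The object name, the toy data, and the training parameter values are made up for illustration. It passes 0.0f rather than 0 on the assumption that `missing` is a float-valued param.

```scala
import ml.dmlc.xgboost4j.scala.spark.{XGBoostClassificationModel, XGBoostClassifier}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Hypothetical example object; only the set(model.missing, ...) call comes from this thread.
object SparseMissingWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("sparse-missing-workaround")
      .getOrCreate()
    import spark.implicits._

    // Toy data (assumption): a label column and a SparseVector feature column.
    val df = Seq(
      (0.0, Vectors.sparse(3, Array(0), Array(1.0))),
      (1.0, Vectors.sparse(3, Array(1, 2), Array(2.0, 3.0))),
      (0.0, Vectors.sparse(3, Array(0, 2), Array(1.5, 0.5))),
      (1.0, Vectors.sparse(3, Array(1), Array(2.5)))
    ).toDF("label", "features")

    // Illustrative parameter values (assumption), not tuned for anything.
    val classifier = new XGBoostClassifier(Map(
        "objective" -> "binary:logistic",
        "num_round" -> 5,
        "num_workers" -> 1))
      .setFeaturesCol("features")
      .setLabelCol("label")
      .setMissing(0.0f) // needed so that fit() accepts sparse vectors

    val model: XGBoostClassificationModel = classifier.fit(df)

    // In 0.90 the model does not inherit `missing` from the classifier and has
    // no setMissing(), so set the param directly through the generic Params API.
    model.set(model.missing, 0.0f)

    model.transform(df).show()
    spark.stop()
  }
}
```

The same call should apply to XGBoostRegressionModel if it exposes the same `missing` param, but I have not checked that.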


#8

Cool, we should add it to the tutorial


#9

Actually, I should expose these methods on both the classifier/regressor and the models…


#10

Thanks, that would be great!