NaN Values and Scikit-Learn RFECV


I’ve been trying to use sklearn.feature_selection.RFECV to perform feature selection on an XGBoost classifier. Fitting the model with sklearn.model_selection.cross_val_score works, but when I use RFECV on the same data I get the error ‘Input contains NaN…’

I believe this is because RFECV validates its input based on the tags it gets from the estimator. It uses the ‘allow_nan’ tag to decide whether to check X for NaN values. It seems that XGBoost currently just inherits the default ‘allow_nan’ value from the scikit-learn base estimator class, which is False. Since XGBoost does in fact handle missing values in X, I believe this behavior is incorrect.
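To make the suspected mechanism concrete, here is a toy sketch in plain Python. Every class and function name in it (ToyBaseEstimator, ToyBooster, ToyBoosterWithTag, toy_selector_fit) is a hypothetical stand-in, not real scikit-learn or XGBoost code; it only models the idea of a wrapper that rejects NaN unless the estimator's tags say otherwise.

```python
import math


# Hypothetical stand-ins: none of these are real scikit-learn or XGBoost classes.
class ToyBaseEstimator:
    def tags(self):
        # Conservative default, mirroring the default described above.
        return {"allow_nan": False}


class ToyBooster(ToyBaseEstimator):
    """Copes with NaN itself, but inherits the default allow_nan=False tag."""

    def fit(self, X, y):
        self.fitted_ = True
        return self


class ToyBoosterWithTag(ToyBooster):
    """Same model, but it advertises its NaN support through the tag."""

    def tags(self):
        return {**super().tags(), "allow_nan": True}


def contains_nan(X):
    return any(math.isnan(value) for row in X for value in row)


def toy_selector_fit(estimator, X, y):
    """Validate X the way a tag-aware wrapper might, then fit the estimator."""
    if not estimator.tags().get("allow_nan", False) and contains_nan(X):
        raise ValueError("Input contains NaN")
    return estimator.fit(X, y)


X = [[1.0, float("nan")], [0.5, 2.0]]
y = [0, 1]

ToyBooster().fit(X, y)                       # direct fit works
toy_selector_fit(ToyBoosterWithTag(), X, y)  # wrapper works once the tag is set
# toy_selector_fit(ToyBooster(), X, y)       # would raise ValueError
```

If the real behavior matches this sketch, one possible workaround (an assumption on my part, not something I have verified across versions) would be a small XGBClassifier subclass that overrides the tag, for example through `_more_tags` in scikit-learn releases that expose tags that way.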

Thoughts on how to get around this? Should this be raised as a bug on GitHub?


Yes, please open an issue on the GitHub repo.