We are extracting feature importances from an ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel by calling model.nativeBooster.getScore(featureNames, "gain"), where featureNames is a list of the names of all the numerical features and one-hot encoded vectors (created with org.apache.spark.ml.feature.OneHotEncoderEstimator) used in model training.
With this approach, we get a single feature importance value for each one-hot encoded vector. Is there a way to obtain a separate feature importance value for each category of a one-hot encoded variable? For example, we have a variable called Co_Applicant with 3 categories (such as Yes-Same Address), and currently we get only one feature importance value for it. Is there a way to get a feature importance for each of the 3 categories of this variable? In the Python API, per-category feature importances are the default behaviour when we access model.feature_importances_. How can we achieve the same in XGBoost4J-Spark?
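
To make the desired output concrete, here is a minimal sketch of the per-category name list we would like importances for, with one name per slot of the encoded vector instead of one name per column. Everything in it is illustrative: FeatureColumn, ExpandedFeatureNames, the Loan_Amount column and the Category-2 / Category-3 labels are hypothetical placeholders, and whether getScore would report a separate value for each of these names is exactly what we are unsure about.

```scala
// Hypothetical description of one input column.
// Numeric columns have no categories; one-hot encoded columns list theirs.
final case class FeatureColumn(name: String, categories: Seq[String] = Seq.empty)

object ExpandedFeatureNames {
  // Produce one name per slot of the assembled feature vector.
  // dropLast mirrors the OneHotEncoderEstimator parameter of the same name
  // (default true), which leaves the last category out of the encoded vector.
  def expand(columns: Seq[FeatureColumn], dropLast: Boolean = true): Seq[String] =
    columns.flatMap { col =>
      if (col.categories.isEmpty) {
        Seq(col.name)
      } else {
        val kept = if (dropLast) col.categories.dropRight(1) else col.categories
        kept.map(category => s"${col.name}=$category")
      }
    }

  def main(args: Array[String]): Unit = {
    val columns = Seq(
      FeatureColumn("Loan_Amount"), // hypothetical numeric feature
      // Only "Yes-Same Address" is a real label; the other two are placeholders.
      FeatureColumn("Co_Applicant", Seq("Yes-Same Address", "Category-2", "Category-3"))
    )
    // Print one name per encoded slot, keeping all three categories.
    expand(columns, dropLast = false).foreach(println)
  }
}
```

With dropLast = false this yields one entry for Loan_Amount and three entries for Co_Applicant. With the encoder's default of dropLast = true, the encoded vector has only two slots for a 3-category variable, so the list would contain two Co_Applicant entries.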