We are extracting the feature importances for an ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel by calling model.nativeBooster.getScore(featureNames, "gain"), where featureNames is a list of the variable names of all the numerical features and one-hot encoded vectors (created with org.apache.spark.ml.feature.OneHotEncoderEstimator) used in model training.
With this approach, we get a single feature importance value for each one-hot encoded vector. Is there a way to obtain a separate feature importance value for each category of a one-hot encoded variable? For example, if we have a variable called Co_Applicant with 3 categories (No, Yes-Different Address, Yes-Same Address), we currently get only one feature importance value for the whole variable, whereas we would like one value for each of the 3 categories. Per-category importances are the default behaviour in the Python API when we call model.feature_importances_. How can we achieve the same in XGBoost4J-Spark?
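To make the goal concrete, here is a minimal sketch of what we imagine might work. It assumes that the booster sees one slot per one-hot category in the assembled feature vector, so that passing one name per slot (instead of one name per encoded variable) to getScore would yield per-category values. InputFeature, ExpandedFeatureNames and the Loan_Amount column are hypothetical names introduced only for illustration. The category order is assumed to match the index order used by the encoder, and dropLast mirrors the OneHotEncoderEstimator parameter of the same name (true by default, in which case the last category has no slot of its own).

```scala
// Hypothetical description of one model input: a plain numeric column has no
// categories; a one-hot encoded column lists its categories in encoder order.
final case class InputFeature(name: String, categories: Seq[String] = Nil)

object ExpandedFeatureNames {
  // Build one name per slot of the assembled feature vector.
  // When dropLast is true, the last category of each encoded column is skipped,
  // because the encoder does not emit a slot for it.
  def expand(inputs: Seq[InputFeature], dropLast: Boolean = true): Array[String] =
    inputs.flatMap { f =>
      if (f.categories.isEmpty) Seq(f.name)
      else {
        val kept = if (dropLast) f.categories.dropRight(1) else f.categories
        // Whitespace is replaced as a precaution (an assumption, not a
        // documented requirement of the feature-name handling).
        kept.map(c => s"${f.name}_${c.replaceAll("\\s+", "_")}")
      }
    }.toArray
}

// Intended usage (Loan_Amount is a made-up numeric column):
//   val names = ExpandedFeatureNames.expand(Seq(
//     InputFeature("Loan_Amount"),
//     InputFeature("Co_Applicant",
//       Seq("No", "Yes-Different Address", "Yes-Same Address"))))
//   model.nativeBooster.getScore(names, "gain")
```

Whether getScore accepts the expanded list in this way is exactly what we are unsure about. If the assembled features column carries slot names in its ML attribute metadata, reading them with org.apache.spark.ml.attribute.AttributeGroup.fromStructField may be a more reliable way to obtain the same list than building it by hand.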