What is the feature importance method for the Sklearn XGBoost?

Core XGBoost offers three ways of measuring feature importance: weight, gain, and cover. The Sklearn API, however, exposes only a single attribute, feature_importances_. The code below outputs the feature importances from the Sklearn API. Which method does it use to determine them?

xgb.XGBClassifier(**xgb_params).fit(X_train, y_train).feature_importances_

The feature_importances_ attribute is based on weight.

XGBRegressor.get_booster().get_score(importance_type='weight') returns the number of times each feature occurs in a split: integers greater than 0 (features that never participate in a split are omitted). See the docs.

XGBRegressor.feature_importances_ is the same, but divided by the total sum of occurrences, so the values sum to one.
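
For example, a minimal sketch of that relationship (the dataset and hyperparameters here are just placeholders, not from the original post):

import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = xgb.XGBRegressor(n_estimators=20).fit(X, y)

# Raw split counts per feature; features never used in a split are omitted.
counts = model.get_booster().get_score(importance_type='weight')

# Divide by the total number of occurrences so the values sum to one.
total = sum(counts.values())
normalized = {feature: count / total for feature, count in counts.items()}
print(normalized)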


How could I find the total sum of occurrences?

Hi, thanks for your insights.
I tried to reproduce XGBRegressor.feature_importances_ from the results of XGBRegressor.get_booster().get_score(importance_type='weight') by dividing each feature's count by the sum of occurrences across all features, but the values did not match feature_importances_. Any idea why this happens?

I haven’t followed the changes in XGBRegressor since my previous message, so maybe the relevant code has changed.

What are the numbers you’re getting?

EDIT:

I found the following change made two years ago: https://github.com/dmlc/xgboost/pull/3876

This pull request changes the default importance criterion used by feature_importances_ from weight to gain. If you want the original behavior, set importance_type='weight' on your XGBRegressor/XGBClassifier.
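
Roughly, a minimal sketch of that fix (the dataset and hyperparameters are placeholders; only importance_type is the relevant part):

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Ask the sklearn wrapper to report weight-based importances again.
model = xgb.XGBRegressor(n_estimators=20, importance_type='weight').fit(X, y)

# Normalized split counts, computed by hand from the booster.
counts = model.get_booster().get_score(importance_type='weight')
total = sum(counts.values())
manual = np.array([counts.get(f'f{i}', 0) / total for i in range(X.shape[1])])

# With importance_type='weight', this should match feature_importances_.
print(np.allclose(manual, model.feature_importances_))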

You can sum up the number of occurrences across all features.
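
For example, assuming a fitted estimator called model (the name is just for illustration):

# Sum the per-feature split counts returned by get_score.
counts = model.get_booster().get_score(importance_type='weight')
total_occurrences = sum(counts.values())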