What is the feature importance method for the Sklearn XGBoost?

Core XGBoost offers three ways of measuring feature importance: weight, gain, and cover. The Sklearn API, however, exposes only a single attribute, feature_importances_. The code below outputs the feature importances from the Sklearn API. Which method does it use to determine them?

xgb.XGBClassifier(**xgb_params).fit(X_train, y_train).feature_importances_

The feature_importances_ attribute is based on weight.

XGBRegressor.get_booster().get_score(importance_type='weight') returns the number of times each feature occurs in a split: integers greater than 0 (features that never participate in a split are omitted). See the docs.

XGBRegressor.feature_importances_ is the same, but divided by the total sum of occurrences, so the values sum to one.
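
For example, a minimal sketch of that relationship (the dataset and hyperparameters here are just placeholders, not from the original post):

import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = xgb.XGBRegressor(n_estimators=20).fit(X, y)

# Raw split counts per feature; features never used in a split are omitted.
counts = model.get_booster().get_score(importance_type='weight')

# Divide by the total number of occurrences so the values sum to one.
total = sum(counts.values())
normalized = {feature: count / total for feature, count in counts.items()}
print(normalized)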


How could I find the total sum of occurrences?

Hi, thanks for your insights.
I tried to reproduce XGBRegressor.feature_importances_ from the results of XGBRegressor.get_booster().get_score(importance_type='weight') by dividing each feature's count by the sum of occurrences across all features, but the values did not match feature_importances_. Any idea why this happens?

I haven’t followed the changes in XGBRegressor since my previous message, so maybe the relevant code has changed.

What are the numbers you’re getting?

EDIT:

I found the following change made two years ago: https://github.com/dmlc/xgboost/pull/3876

This pull request changes the default importance criterion used by feature_importances_ from weight to gain. If you want the original behavior, set importance_type='weight' on your XGBRegressor/XGBClassifier.
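
Roughly, a minimal sketch of that fix (the dataset and hyperparameters are placeholders; only importance_type is the relevant part):

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Ask the sklearn wrapper to report weight-based importances again.
model = xgb.XGBRegressor(n_estimators=20, importance_type='weight').fit(X, y)

# Normalized split counts, computed by hand from the booster.
counts = model.get_booster().get_score(importance_type='weight')
total = sum(counts.values())
manual = np.array([counts.get(f'f{i}', 0) / total for i in range(X.shape[1])])

# With importance_type='weight', this should match feature_importances_.
print(np.allclose(manual, model.feature_importances_))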

You can sum up the number of occurrences across all features.
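
For example, assuming a fitted estimator called model (the name is just for illustration):

# Sum the per-feature split counts returned by get_score.
counts = model.get_booster().get_score(importance_type='weight')
total_occurrences = sum(counts.values())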