How to get feature importance for each iteration during tree building?

I am planning to add some functionality to the xgboost libraries, but I need to know the feature importance at each iteration while the trees are being built, not just the final feature importance. How can I get it? I am OK with modifying the C++ code. Can someone point me to where this happens in the code base?

Thanks

You can use the callback interface to compute the feature importance for each iteration.

import xgboost as xgb

from sklearn.datasets import load_boston  # removed in scikit-learn >= 1.2; any regression dataset works
from sklearn.model_selection import train_test_split

def MyCallback():
    # Closure-style callback: xgboost calls it at the end of every
    # boosting round with a CallbackEnv object.
    def callback(env):
        # env.model is the Booster; get_score reflects all trees
        # built so far, so this prints the running importance.
        print(env.model.get_score(importance_type='weight'))
    return callback

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {'objective': 'reg:squarederror', 'eval_metric': 'rmse'}

bst = xgb.train(params, dtrain, num_boost_round=100,
                evals=[(dtrain, 'train'), (dtest, 'test')],
                callbacks=[MyCallback()])
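
Note that on newer xgboost releases (1.3+), the closure-style callback above was replaced by the xgboost.callback.TrainingCallback class. A minimal sketch, assuming a recent version (the class name ImportanceLogger is just for illustration):

import xgboost as xgb

class ImportanceLogger(xgb.callback.TrainingCallback):
    # Records the importance dict after every boosting round.
    def __init__(self, importance_type='weight'):
        self.importance_type = importance_type
        self.history = []

    def after_iteration(self, model, epoch, evals_log):
        # model is the Booster being trained; the score covers all
        # trees built so far, not just the tree added this round.
        self.history.append(model.get_score(importance_type=self.importance_type))
        return False  # returning True would stop training early

logger = ImportanceLogger()
bst = xgb.train(params, dtrain, num_boost_round=100, callbacks=[logger])
# logger.history[i] holds the importance dict after round i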

Thanks for the response. It seems like importance_type='weight' counts how many times each feature is used in a split, rather than the usual feature importance. Here is one dumped tree together with the resulting score dict; each count matches the number of split nodes that use that feature:

['0:[f12<9.72500038] yes=1,no=2,missing=1,gain=13821.3594,cover=404\n'
 '\t1:[f5<6.94099998] yes=3,no=4,missing=3,gain=4945.9375,cover=173\n'
 '\t\t3:[f7<1.48494995] yes=7,no=8,missing=7,gain=54.140625,cover=117\n'
 '\t\t\t7:leaf=11.1375008,cover=3\n'
 '\t\t\t8:[f5<6.54300022] yes=15,no=16,missing=15,gain=13.8671875,cover=114\n'
 '\t\t\t\t15:leaf=6.63821936,cover=72\n'
 '\t\t\t\t16:leaf=8.03651237,cover=42\n'
 '\t\t4:[f5<7.43700027] yes=9,no=10,missing=9,gain=20.40625,cover=56\n'
 '\t\t\t9:leaf=9.67676544,cover=33\n'
 '\t\t\t10:[f0<2.65402508] yes=17,no=18,missing=17,gain=218.140625,cover=23\n'
 '\t\t\t\t17:leaf=12.9000006,cover=22\n'
 '\t\t\t\t18:leaf=3.21000004,cover=1\n'
 '\t2:[f12<16.0849991] yes=5,no=6,missing=5,gain=2029.88281,cover=231\n'
 '\t\t5:[f11<47.7250023] yes=11,no=12,missing=11,gain=58.078125,cover=117\n'
 '\t\t\t11:leaf=1.45500004,cover=1\n'
 '\t\t\t12:leaf=5.9989748,cover=116\n'
 '\t\t6:[f0<10.4524002] yes=13,no=14,missing=13,gain=497.976562,cover=114\n'
 '\t\t\t13:leaf=4.49571419,cover=83\n'
 '\t\t\t14:[f4<0.675000012] yes=19,no=20,missing=19,gain=48.0822754,cover=31\n'
 '\t\t\t\t19:leaf=3.70799994,cover=9\n'
 '\t\t\t\t20:[f7<1.42575002] yes=21,no=22,missing=21,gain=2.17785645,cover=22\n'
 '\t\t\t\t\t21:leaf=0.675000012,cover=1\n'
 '\t\t\t\t\t22:leaf=2.42590928,cover=21\n']
{'f12': 2, 'f5': 3, 'f7': 2, 'f0': 2, 'f11': 1, 'f4': 1}
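
For reference, a quick sketch that reproduces those 'weight' numbers by counting split occurrences in the dump (assuming the booster bst from the example above):

import re
from collections import Counter

# Each split node is printed as "[f<id><threshold>", so counting those
# substrings across all dumped trees reproduces the 'weight' scores.
counts = Counter(re.findall(r'\[(f\d+)<', ''.join(bst.get_dump())))
print(counts)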

You can choose other values for importance_type: https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.get_score
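
In recent versions, Booster.get_score supports 'weight', 'gain', 'cover', 'total_gain', and 'total_cover'; a quick way to compare them side by side:

for t in ('weight', 'gain', 'cover', 'total_gain', 'total_cover'):
    print(t, bst.get_score(importance_type=t))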

Thanks for the quick response.