Feature Importance Based on optimal number of trees?


#1

Let’s say we fit a model with early stopping on a validation set and find that best_ntree_limit is 1,000, but early_stopping_rounds was set to 500. Our model object therefore has 1,500 trees encoded.

We would like to get feature importances back from this model, but only for the first 1,000 trees - the optimal model - and not the overfit model with 1,500 trees. Is that possible in either the Python or R API without having to calculate them ourselves, e.g. something like the manual aggregation sketched below?
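For reference, this is roughly the manual route I’d like to avoid (a minimal sketch, assuming a fitted Booster called `model` - hypothetical name - and xgboost’s `trees_to_dataframe()`):

```python
import pandas as pd

# Hypothetical: `model` is a fitted xgboost.Booster
# (or clf.get_booster() from a fitted sklearn-style estimator).
trees = model.trees_to_dataframe()

# Keep only the first 1,000 trees (the early-stopping optimum)
# and drop leaf rows, which carry no split gain.
best_trees = trees[(trees["Tree"] < 1000) & (trees["Feature"] != "Leaf")]

# Aggregate total gain per feature across those trees.
manual_importance = (
    best_trees.groupby("Feature")["Gain"]
    .sum()
    .sort_values(ascending=False)
)
print(manual_importance)
```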

Thanks.


#2

XGBoost only saves the last model, not the best one (best ntree). So we have to run it again with exactly 1,000 trees, roughly as sketched below.
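A minimal sketch of that re-run, assuming the same `params` dict and training `DMatrix` (`dtrain`, hypothetical names) used for the early-stopped fit:

```python
import xgboost as xgb

# Hypothetical: `params` and `dtrain` are the same parameters and
# training DMatrix used for the original early-stopped run.
# Re-fit with exactly 1,000 rounds and no early stopping, then read
# the importances off that model.
best_model = xgb.train(params, dtrain, num_boost_round=1000)

importance = best_model.get_score(importance_type="gain")
print(importance)
```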