Retrieve feature_names from pickled model

I have a large number of models trained with previous versions of xgboost (mainly 1.2.x) that are saved as pickled objects. When I load them with 1.4.2, the model_features list is completely empty. Reverting to 1.2 brings that list back, so I know it’s still available in the pickled model.

I understand JSON is the standard going forward, so I’d like to know: is there a way to load the pickled model without losing feature_names and then re-save it as JSON?

I’ve tried all of the following, to no avail:

  • Load with xgboost 1.3 (feature_names is populated) and save using bst.save_model() to binary format (feature_names are lost when I re-load using bst.load_model())
  • Load pickled model and save to JSON using 1.4 (feature_names are lost when I load the pickled model)
  • Load pickled model (feature_names is populated) and save to JSON using 1.2 and 1.3 (feature_names are lost when I re-load using bst.load_model())

Is there any easy way to achieve what I’m trying, other than manually keeping track of the feature_names for each model and repopulating it before saving it in JSON format with the latest version of xgboost?

I understand your frustration at having to migrate a large number of models. Unfortunately, there isn’t an easy way to migrate the feature name information.

other than manually keeping track of the feature_names for each model and repopulating it before saving it in JSON format with the latest version of xgboost?

This is essentially what you’ll have to do. For this, you’d need two Python virtual environments: one with XGBoost 1.3 to read the feature names from the pickle, and one with 1.4 to save the migrated model.

In general, backward compatibility is difficult when it comes to Python pickles. That is, it is hard to guarantee that a pickle produced with a previous version of XGBoost can be read into a new version of XGBoost.

Thanks for the quick response. I fully understand pickles are unreliable, which is why I tried to save them as the default xgboost binary format as well, which also failed.

Let’s assume I started with XGBoost 1.2. How should I save the model to preserve all attributes, including feature_names and best_ntree_limit, for use with future versions of XGBoost? The reason I ask is that I have access to XGBoost 1.2, so I can easily load the pickled models in that version, save them in an approved format, and then use them in the latest version of XGBoost.

EDIT. best_ntree_limit is already saved as part of the model file.

You can’t. More precisely, neither binary format nor JSON format will save feature_names.

We are trying to move away from storing important information in Python attributes. The only guarantee we provide is that, if a piece of information is already available in a saved JSON file, it will be preserved when the JSON file is read in a future version of XGBoost. So we are in a difficult situation, since XGBoost 1.2 does not yet store feature_names in the JSON file.

You have two alternatives:

  • Manually export important attributes (like feature_names) as a separate JSON file, and then re-populate them after migrating the model to the latest version.
  • Keep a separate environment with an old version of XGBoost.
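
For the first alternative, a minimal sketch could look like the following. The helper names export_feature_names and restore_feature_names and the sidecar file layout are made up for illustration; they are not part of XGBoost. The sketch only assumes that the loaded model object exposes a feature_names attribute that can be read in the old environment and assigned in the new one.

```python
import json
from pathlib import Path


def export_feature_names(booster, path):
    """Run in the OLD environment: write booster.feature_names to a JSON sidecar file.

    `booster` is any object with a `feature_names` attribute (hypothetical helper,
    not an XGBoost API).
    """
    names = list(booster.feature_names or [])
    Path(path).write_text(json.dumps({"feature_names": names}, indent=2))
    return names


def restore_feature_names(booster, path):
    """Run in the NEW environment: read the sidecar file and re-populate the attribute."""
    names = json.loads(Path(path).read_text())["feature_names"]
    booster.feature_names = names
    return booster
```

In the old environment you would unpickle the model, call export_feature_names, and save the model itself with bst.save_model(). In the new environment you would load that model file with bst.load_model(), call restore_feature_names, and save again as JSON.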

Awesome, thanks for the advice. I think I’ve already resigned myself to needing to keep a log of the feature_names somewhere; I was just hoping there was an easier way.

In that case, where/how should we store important information?

XGBoost developers (including me) are moving important information like feature_names into the saved JSON file. This way, it can be accessed portably in later versions of XGBoost.
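
To check whether a given saved JSON model already carries the names, something like the sketch below may help. It assumes the names live under a "learner" object with a "feature_names" key; that layout is an assumption about recent versions and should be verified against a file written by your own XGBoost version. The function name saved_feature_names is hypothetical.

```python
import json


def saved_feature_names(model_json_path):
    """Return the feature names stored in a saved JSON model, or [] if none are found.

    Assumes the layout {"learner": {"feature_names": [...]}}; adjust the keys
    if your XGBoost version writes them elsewhere.
    """
    with open(model_json_path) as f:
        doc = json.load(f)
    return doc.get("learner", {}).get("feature_names", [])
```

An empty result means the file was written without feature names, so they would still have to be re-populated from a separate record.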
