Avoid calculating unused features

johan · January 26, 2023, 3:38pm

Hi!
I train an XGBoost model in Python with about 2000 features calculated by TSFresh. Checking feature_importances_, I see that about 400 are non-zero, so I assume those are the only features used by the model. When I deploy the model, I would like to calculate only the features it actually uses to gain speed, but if I don’t provide all the features it was trained on, it complains. I therefore create a DataFrame of all zeros, except for the 400 features I assume are used by the model, which I calculate. However, this does not produce the same prediction as providing all features.
What am I missing? Why is the result not the same? Can this be done in a better way?
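
A minimal sketch of the padding step described above, assuming pandas. The column names and the `build_full_frame` helper are hypothetical placeholders, not the real TSFresh feature names or deployment code, and the zero fill value simply mirrors the description.

```python
import pandas as pd

# Hypothetical stand-ins: the real lists would hold ~2000 and ~400 TSFresh names.
train_columns = ["f0", "f1", "f2", "f3", "f4"]  # column order used at training time
used_columns = ["f3", "f1"]  # e.g. columns where feature_importances_ > 0


def build_full_frame(computed, train_columns, fill_value=0.0):
    """Expand a frame holding only the computed features to the training layout."""
    unknown = set(computed.columns) - set(train_columns)
    if unknown:
        raise ValueError(f"columns not seen in training: {sorted(unknown)}")
    # reindex restores the training column order and fills the
    # columns that were not computed with fill_value.
    return computed.reindex(columns=train_columns, fill_value=fill_value)


# Only the "used" features are calculated at deployment time.
computed = pd.DataFrame({"f3": [0.5, 1.5], "f1": [2.0, 3.0]})
full = build_full_frame(computed, train_columns)
```

Reindexing against the training column list also keeps the column order identical to training, which is one thing worth ruling out when the predictions differ.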

Thanks!