Loaded Model Predict Proba Output Different From Initially Trained Model

tsj6qq · January 13, 2023, 4:20pm

I’m trying to import a previously trained and saved model in order to save time. For some reason, predict_proba gives different output when I train the model on the data than when I import it from a JSON file. This only seems to happen for multiclass target variables. Any ideas on how I can get the two to match without modifying the actual predict_proba output?

import os
import pandas as pd
from xgboost import XGBClassifier,Booster

header = ['x1','x2','y']
data = [
    [1,2,1],
    [2,2,2],
    [2,1,0],
    [0,1,0],
    [2,2,2],
    [1,2,1],
]
train_df = pd.DataFrame(data,columns=header)
test_df = pd.DataFrame(data=[[0,1]],columns=['x1','x2'])

model = XGBClassifier().fit(train_df.loc[:,train_df.columns!='y'],train_df.y)
print('Initial model predict_proba output:',model.predict_proba(test_df))
# output: Initial model predict_proba output: [[0.33333328 0.46852115 0.19814554]]

model_fpath = os.path.join(os.getcwd(),'test_model.json')
model.save_model(model_fpath)

imported_model = XGBClassifier()
booster = Booster()
booster.load_model(model_fpath)
imported_model._Booster = booster

print('Imported model predict_proba output:',imported_model.predict_proba(test_df))
# output: Imported model predict_proba output: [[0.66666675 0.33333328] [0.5314789  0.46852115] [0.8018545  0.19814554]]

jiamingy · January 20, 2023, 2:34am

You should use the sklearn estimator to load a model that was saved from the sklearn estimator:

imported_model = XGBClassifier()
imported_model.load_model(model_fpath)

The sklearn estimator interface does some data manipulation to conform to the sklearn estimator standard.
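
The numbers in the question hint at what that manipulation looks like here. This is a reading of the printed output, not a statement about XGBoost internals: each row of the imported model's output equals [1 - p, p] for one of the three class probabilities of the original model, as if the output had been handled as binary rather than multiclass. A minimal sketch in plain Python reproduces those rows; the helper name as_binary_pairs is made up for illustration and is not part of any library.

```python
# Class probabilities printed by the originally trained model in the question.
multiclass_probs = [0.33333328, 0.46852115, 0.19814554]


def as_binary_pairs(probs):
    """Treat each value as a positive-class probability p and return [1 - p, p].

    Hypothetical helper for illustration only; it mimics what the printed
    output of the imported model looks like, nothing more.
    """
    return [[1.0 - p, p] for p in probs]


pairs = as_binary_pairs(multiclass_probs)
for row in pairs:
    # Round for display; compare with the rows printed by the imported model.
    print([round(v, 7) for v in row])
```

If this reading is right, the booster itself produces the same probabilities in both cases and only the wrapper's post-processing differs, which would explain why loading through XGBClassifier.load_model makes the outputs match.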

vfill · April 10, 2024, 1:44pm

Thank you - this is a very useful answer. Let me extend it:
I was facing a similar issue (output different from the initially trained model) with XGBRegressor, which was trained with categorical values:

import xgboost as xgb

reg = xgb.XGBRegressor(enable_categorical=True, ...)
reg.fit(...)

Reloading the model with your code and scoring the sample makes it possible to use the categorical variables without explicitly requesting support for them. This approach is not only simpler, it also returns the same predictions as the original model:

imported_model = xgb.XGBRegressor()
imported_model.load_model(model_fpath)
# works even with categorical variables and returns identical predictions
replicated_preds = imported_model.predict(X_test)
# also works with categorical variables, but returns different predictions:
# imported_model.predict(xgb.DMatrix(X_test, enable_categorical=True))