I’m trying to import a previously trained and saved model in order to save time. For some reason, predict_proba gives a different output when I train the model on data than when I import it from a JSON file: the freshly trained model returns one row of three class probabilities, while the imported model returns three rows of two values each. This only seems to happen for multiclass target variables. Any ideas on how I can get the two outputs to match without modifying the actual predict_proba output?
import os
import pandas as pd
from xgboost import XGBClassifier, Booster
header = ['x1','x2','y']
data = [
[1,2,1],
[2,2,2],
[2,1,0],
[0,1,0],
[2,2,2],
[1,2,1],
]
train_df = pd.DataFrame(data,columns=header)
test_df = pd.DataFrame(data=[[0,1]],columns=['x1','x2'])
# Fit on every column except the target 'y'
model = XGBClassifier().fit(train_df.loc[:, train_df.columns != 'y'], train_df.y)
print('Initial model predict_proba output:',model.predict_proba(test_df))
# output: Initial model predict_proba output: [[0.33333328 0.46852115 0.19814554]]
model_fpath = os.path.join(os.getcwd(),'test_model.json')
model.save_model(model_fpath)
imported_model = XGBClassifier()
booster = Booster()
booster.load_model(model_fpath)
imported_model._Booster = booster
print('Imported model predict_proba output:',imported_model.predict_proba(test_df))
# output: Imported model predict_proba output: [[0.66666675 0.33333328] [0.5314789 0.46852115] [0.8018545 0.19814554]]
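
Looking at the numbers, the second column of the imported output appears to be the original three class probabilities, and the first column looks like 1 minus that value, as if the imported classifier were treating the problem as binary. The check below is a minimal sketch of that observation in plain Python, using only the values printed above; the variable names are mine, and the tolerance is an assumption to allow for float32 rounding.

```python
import math

# Values copied from the two printed outputs above.
initial = [0.33333328, 0.46852115, 0.19814554]
imported = [
    [0.66666675, 0.33333328],
    [0.5314789, 0.46852115],
    [0.8018545, 0.19814554],
]

# Hypothesis: each imported row is [1 - p, p], where p is one of the
# original class probabilities.
second_column = [row[1] for row in imported]
rows_match = all(
    math.isclose(row[0], 1.0 - row[1], abs_tol=1e-6)  # assumed tolerance
    for row in imported
)

print("second column equals initial output:", second_column == initial)
print("each row is [1 - p, p]:", rows_match)
```

If the hypothesis holds, the class count is presumably lost when the booster is assigned to `_Booster` by hand. Loading through the wrapper's own `load_model` method (`imported_model.load_model(model_fpath)`) may behave differently, though whether it restores the class information depends on the installed xgboost version.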