I have trained an XGBoost model using the scikit-learn API, pickled it, then unpickled it and calculated the SHAP values (using .predict(… pred_contribs=True)).
However, the sum of these SHAP values per individual does not always add up to the margin; this happens sporadically. Therefore, I decided to refit my model using the hyperparameter-tuned parameters. Here is my code; do you see any glaring errors?
Thanks in advance
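import pickle
import numpy as np
import pandas as pd
import xgboost as xgb
from xgboost import XGBClassifier
# load
input_df = pd.read_csv(input_file, sep="\t")
# use a context manager so the file handle is closed after unpickling
with open(model_file, 'rb') as f:
    best_model = pickle.load(f)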
# train and test
xtrain = input_df.loc[input_df['partition']=="grid_cv", input_df.columns[1:-2]].copy()
xtest = input_df.loc[input_df['partition']=="held_out", input_df.columns[1:-2]].copy()
ytrain = input_df.loc[input_df['partition']=="grid_cv", input_df.columns[-2]].copy()
ytest = input_df.loc[input_df['partition']=="held_out", input_df.columns[-2]].copy()
# get settings from the hyperparameter-tuned model
bst_params = best_model.get_params()
best_xgb_rf = XGBClassifier(**bst_params)
best_xgb_rf.fit(xtrain, ytrain)
# get shap and compare to output
boost_ = best_xgb_rf.get_booster()
# build the DMatrix once so both predictions see identical data and feature names
dtrain = xgb.DMatrix(xtrain, label=ytrain)
shap_matrix = boost_.predict(dtrain, pred_contribs=True)
margin = boost_.predict(dtrain, output_margin=True)
# compare with a tolerance (1e-3 is a judgment call); rounding both sides can fail near a .5 boundary
assert np.allclose(margin, shap_matrix.sum(axis=1), atol=1e-3), "shap values don't add up to margins"
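
One possible cause of a sporadic failure is the rounded comparison itself, not the SHAP values: two float32 numbers that agree to several decimal places can still round to different integers when they sit on either side of a .5 boundary. The sketch below uses made-up numbers (not output from the model above) to show the difference between comparing rounded values and comparing with a tolerance via np.allclose; the atol value is an assumption and may need adjusting.

```python
import numpy as np

# Hypothetical values standing in for the margin and the per-row sum of
# contributions; they differ only by a small floating-point-sized amount.
margin = np.array([0.49999, -1.25, 2.0], dtype=np.float32)
contrib_sum = np.array([0.50001, -1.25, 2.0], dtype=np.float32)

# Rounding first: 0.49999 rounds to 0 and 0.50001 rounds to 1, so this check fails.
rounded_equal = bool(np.all(np.round(margin) == np.round(contrib_sum)))

# Tolerance-based check: passes because the values differ by about 2e-5.
close = bool(np.allclose(margin, contrib_sum, atol=1e-4))

print("rounded comparison passes:", rounded_equal)
print("allclose comparison passes:", close)
```

If the tolerance-based check still fails on real data, the mismatch is probably genuine and worth investigating further.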