SHAP values not adding up to margin values


#1

I trained an XGBoost model using the scikit-learn API, pickled it, then unpickled it and calculated the SHAP values (using .predict(… pred_contribs=True)).

However, the per-row sum of these SHAP values does not add up to the margin, and this seems to happen only sporadically. So I decided to refit my model using the tuned hyperparameters. Here is my code; do you see any glaring errors?

Thanks in advance

    import pickle

    import numpy as np
    import pandas as pd
    import xgboost as xgb
    from xgboost import XGBClassifier

    # load the data and the pickled, hyperparameter-tuned model
    input_df = pd.read_csv(input_file, sep="\t")
    with open(model_file, "rb") as fh:
        best_model = pickle.load(fh)

    # train/test split on the 'partition' column;
    # features are columns[1:-2], the label is columns[-2]
    feature_cols = input_df.columns[1:-2]
    label_col = input_df.columns[-2]
    is_train = input_df["partition"] == "grid_cv"
    is_test = input_df["partition"] == "held_out"
    xtrain = input_df.loc[is_train, feature_cols].copy()
    xtest = input_df.loc[is_test, feature_cols].copy()
    ytrain = input_df.loc[is_train, label_col].copy()
    ytest = input_df.loc[is_test, label_col].copy()

    # refit a fresh model with the settings of the tuned model
    bst_params = best_model.get_params()
    best_xgb_rf = XGBClassifier(**bst_params)
    best_xgb_rf.fit(xtrain, ytrain)

    # compute SHAP contributions and compare their row sums to the margin;
    # reuse one DMatrix so feature names and order are identical for both calls
    booster = best_xgb_rf.get_booster()
    dtrain = xgb.DMatrix(xtrain)
    shap_matrix = booster.predict(dtrain, pred_contribs=True)  # last column is the bias term
    margin = booster.predict(dtrain, output_margin=True)

    # compare with a tolerance instead of exact equality of rounded values
    assert np.allclose(margin, shap_matrix.sum(axis=1), atol=1e-3), "SHAP values don't add up to margins"

#2

Notably, if I do not pass in the parameters from the previous model, this error does not occur…
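Since the default parameters work and the tuned ones do not, one way to narrow this down is to list which parameters actually differ and re-add them one at a time. A minimal pure-Python sketch; `diff_params` is a hypothetical helper, and NaN values (e.g. the `missing` parameter) are treated as equal so they are not reported spuriously:

```python
import math
from typing import Any, Dict, Tuple


def _same(a: Any, b: Any) -> bool:
    # NaN != NaN in Python, so treat two float NaNs as equal explicitly
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def diff_params(baseline: Dict[str, Any], tuned: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {name: (baseline_value, tuned_value)} for every parameter that differs."""
    keys = sorted(set(baseline) | set(tuned))
    return {
        k: (baseline.get(k), tuned.get(k))
        for k in keys
        if not _same(baseline.get(k), tuned.get(k))
    }
```

In the code from the first post this could be called as diff_params(XGBClassifier().get_params(), best_model.get_params()); refitting with only a subset of the differing parameters should then show which one triggers the mismatch.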


#3

See the thread "XGBoost learning-to-rank model to predictions core function?"


#4

Hi @hcho3,

Thanks for that link. I think you are suggesting that the base score might be different?

However, when I add up the SHAP values including the bias term, the sum often differs from the margin output by ~1000 or more. So with or without the bias term, the SHAP values are nowhere near the output_margin.
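To measure both cases (with and without the bias column) in one pass and see which rows are affected, a small numpy sketch could look like the following. It assumes binary classification, so pred_contribs returns one row of n_features + 1 values per sample with the bias last; the name `additivity_gaps` and the 1e-3 tolerance are hypothetical choices:

```python
import numpy as np


def additivity_gaps(contribs: np.ndarray, margin: np.ndarray, atol: float = 1e-3) -> dict:
    """Compare row sums of pred_contribs output with output_margin predictions.

    contribs: (n_rows, n_features + 1) array; the last column is the bias term.
    margin:   (n_rows,) array of raw margin predictions.
    """
    contribs = np.asarray(contribs, dtype=np.float64)
    margin = np.asarray(margin, dtype=np.float64)
    with_bias = contribs.sum(axis=1) - margin            # should be ~0 for every row
    without_bias = contribs[:, :-1].sum(axis=1) - margin  # drops the bias column
    bad_rows = np.flatnonzero(np.abs(with_bias) > atol)   # rows violating additivity
    return {
        "max_gap_with_bias": float(np.max(np.abs(with_bias))),
        "max_gap_without_bias": float(np.max(np.abs(without_bias))),
        "bad_rows": bad_rows,
    }
```

With the variables from the first post this would be additivity_gaps(shap_matrix, margin); a non-empty bad_rows gives specific samples to inspect.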

I also do not get this error if I train on only the first five rows, but training on the full dataset or on half of it reproduces the error.

I am not sure where a good place to start debugging would be.
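Because five rows work but half the data fails, one possible starting point is to bisect on the number of training rows to find the smallest subset that still reproduces the mismatch. A sketch, assuming the failure is monotone in the row count (not guaranteed) and a hypothetical fails(k) callback that refits on the first k rows and returns True when the SHAP sums do not match the margin:

```python
from typing import Callable


def smallest_failing_prefix(n_rows: int, fails: Callable[[int], bool], lo: int = 1) -> int:
    """Return the smallest k in [lo, n_rows] for which fails(k) is True.

    Assumes fails is monotone: once the first k rows reproduce the
    mismatch, any larger prefix does too.
    """
    if not fails(n_rows):
        raise ValueError("mismatch not reproduced with all rows")
    hi = n_rows
    # binary search for the first k where fails(k) becomes True
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For example, fails could refit XGBClassifier(**bst_params) on xtrain.iloc[:k] and ytrain.iloc[:k] and run the additivity check from the first post; the returned k gives a much smaller dataset to attach to a bug report.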


#5

I’m not really familiar with SHAP. Is it expected that SHAP values add up to the margin?
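For reference, the XGBoost documentation for Booster.predict states that with pred_contribs=True the sum of all feature contributions equals the raw, untransformed margin of the prediction, and that the final column is the bias term. Written out for one row $x$ with $M$ features, where $\phi_j(x)$ is column $j$ of the output and $\phi_0$ is the bias column:

$$
\sum_{j=1}^{M} \phi_j(x) + \phi_0 = f_{\text{margin}}(x)
$$

For binary:logistic, $f_{\text{margin}}$ is the log-odds, so the predicted probability is $p(x) = 1 / (1 + e^{-f_{\text{margin}}(x)})$; the identity holds before this sigmoid is applied, not after.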


#6

Also, try posting at https://github.com/slundberg/shap/issues