I was wondering whether anyone here has a good understanding of how SHAP is applied to XGBoost and could help me?
I have created an XGBoost model to predict sales based on a number of variables (different marketing spends etc.) and now want an explainer that gives the absolute contribution of each variable to sales. Is this something that the SHAP values will give?
I understand the basic principles of SHAP values (comparing model output using all of the permutations of variables to try and derive the impact of each individual variable). What I don’t understand is:
- Given that the SHAP package uses an estimation of the actual Shapley values (for computational feasibility reasons), it does not retrain the model for each permutation of variables. How, then, do we ‘take variables out’ of the model to determine their impact?
My current understanding is that you run your XGBoost model, from which you get your final output as a combination of many trees. Then, for each permutation of the variables, you include or exclude the trees that make up your prediction depending on whether they include certain variables, hence recreating the Shapley methodology without doing multiple model runs.
That may be totally wrong so more than happy to be proven otherwise.
- When you finally get your SHAP values, what do they actually mean? I originally assumed that they were the absolute contribution of each variable to total sales, such that a value of zero for a variable leads to a contribution of zero. I am now not so sure. Is someone able to clarify this?
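To make both questions concrete, here is a minimal toy sketch of how I currently picture the exact Shapley calculation. It does not use the SHAP package or XGBoost at all: `toy_model`, the background rows, and the input `x` are all made up, and I am assuming that "taking a variable out" means replacing it with values from a background dataset and averaging the predictions. I do not know whether this is what the package actually does for trees.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical stand-in for a trained model: sales from [tv, radio] spend."""
    tv, radio = x
    return 10.0 + 2.0 * tv + 3.0 * radio


def value(model, x, subset, background):
    """Average prediction when only the features in `subset` are fixed to x.

    Assumption: features outside `subset` are "removed" by filling them in
    from each background row, then averaging over the rows.
    """
    total = 0.0
    for row in background:
        mixed = [x[i] if i in subset else row[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every subset (tiny n only)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Standard Shapley weight for a coalition of this size.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = value(model, x, set(subset) | {i}, background)
                without_i = value(model, x, set(subset), background)
                phi[i] += weight * (with_i - without_i)
    return phi


background = [[1.0, 2.0], [3.0, 4.0]]  # made-up historical spends
x = [0.0, 5.0]                         # row to explain: zero TV spend

base_value = value(toy_model, x, set(), background)  # average prediction
phi = shapley_values(toy_model, x, background)
prediction = toy_model(x)

print("base value:", base_value)
print("contributions [tv, radio]:", phi)
print("base + sum(contributions):", base_value + sum(phi), "prediction:", prediction)
```

In this sketch the contributions add up to the prediction minus the average prediction over the background rows, and the zero TV spend gets a negative contribution rather than zero, because it is below the background average. If the SHAP values for XGBoost behave the same way, then they are contributions relative to a baseline rather than absolute contributions to sales, which is the part I would like confirmed.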
Apologies if either of my queries is unclear; I can clarify if needed.
Thanks in advance,