ValueError: Must pass 2-d input when setting pred_contribs to True

Dola · April 27, 2020, 1:36pm

Dear All,

I am trying to use pred_contribs=True with my XGBoost model. However, whenever I set it to True, I no longer get probabilities out of the predict function, and I get the following error.

Dola · April 28, 2020, 11:15pm

@hcho3 Hi Philip, any idea how to overcome this error, and why it occurs only when I set pred_contribs to True?

hcho3 · April 29, 2020, 1:28am

I don't think this is an issue with XGBoost at all. You are passing the result of the prediction function Booster.predict() to the pandas.DataFrame() constructor. This is an error because the prediction result is a 1D array but DataFrame expects a 2D array.

You should store the result of the prediction function in an intermediate variable and use reshape() to reshape it to 2D.
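A minimal sketch of the reshape suggestion, using NumPy only. The arrays are made-up stand-ins for Booster.predict() output, and to_2d is a hypothetical helper name. As far as I know, pandas.DataFrame() already turns a 1-D array into a single column on its own and raises "Must pass 2-d input" for arrays with three or more dimensions, so reshaping to (n_rows, -1) covers both cases:

```python
import numpy as np

# Made-up stand-ins for prediction results (not real model output).
preds_1d = np.array([0.2, 0.7, 0.1])                    # shape (3,)
preds_3d = np.arange(24, dtype=float).reshape(3, 2, 4)  # shape (3, 2, 4)


def to_2d(preds):
    """Reshape a prediction array to (n_rows, -1), keeping one row per sample."""
    preds = np.asarray(preds)
    return preds.reshape(preds.shape[0], -1)


print(to_2d(preds_1d).shape)  # (3, 1)
print(to_2d(preds_3d).shape)  # (3, 8)
```

The resulting 2-D array can then be passed to pandas.DataFrame().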

Dola · April 29, 2020, 5:43pm

@hcho3 Well, what I can say right now is that if the objective is multi:softprob, as in my case, the output of the predict function is a 2D array. But when I set pred_contribs to True, the output is 1D, not 2D (I think).

I got results when I set pred_contribs to True, but I do not really understand the values. I do not think they are probabilities, because many of them are negative.

Can you briefly explain what exactly I get when I set pred_contribs to True?

Moreover, the output is extremely large: I get 19490872 rows, while the data passed to the predict function had 40845 rows × 322 columns.

When pred_contribs is set to True, here's the output:

It is worth noting that I converted the output of model.predict() to a list to get around the 2D issue. But should I get a 2D output from model.predict() when pred_contribs is set to True, or not?

hcho3 · April 29, 2020, 8:21pm

The SHAP prediction should be of dimension (nsample, nfeats + 1). I don’t know if SHAP works with multi-class classifier. In multi-class classifier, you get multiple outputs for each data row, one per class.
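For a single-output model, the XGBoost documentation describes the pred_contribs output as one column per feature plus a final bias column, with each row summing to the raw (untransformed) margin. Below is a sketch of that layout with made-up numbers, assuming a binary:logistic objective for the final sigmoid step. It also shows why negative values are normal: contributions live in margin space, not probability space.

```python
import numpy as np

# Made-up contributions for 3 rows and 2 features, laid out like
# pred_contribs output for a single-output model:
# columns 0..nfeats-1 are per-feature contributions, the last column is the bias.
contribs = np.array([
    [0.8, -0.3, -0.5],
    [-1.2, 0.4, -0.5],
    [0.1, 0.9, -0.5],
])

feature_contribs = contribs[:, :-1]  # shape (nsample, nfeats)
bias = contribs[:, -1]               # shape (nsample,)

# Row sums give the raw margin (log-odds for binary:logistic).
margin = contribs.sum(axis=1)

# Sigmoid maps the margin to a probability, assuming binary:logistic.
prob = 1.0 / (1.0 + np.exp(-margin))
```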

Dola · April 29, 2020, 9:00pm

I got some results. I do not know if you can help me with that or not.

I have a 2-class classification problem. The data passed to the predict function had shape (28166, 345).
The result I got from the predict function had shape (28166, 2, 346). I do not know whether this is correct because I have a 2-class problem, or whether it is actually an error and the shape should be (28166, 346).

I now understand that the 19,490,872 rows I was getting before came from applying flatten, so that number is the product 28166 × 2 × 346.
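The arithmetic checks out, and reshape() with the original shape undoes flatten(). The sizes below are the ones reported in this thread; a small stand-in array is used for the round trip:

```python
import numpy as np

nsample, num_class, ncol = 28166, 2, 346  # shapes reported in this thread

# Flattening a (nsample, num_class, ncol) array yields one long 1-D vector.
print(nsample * num_class * ncol)  # 19490872

# Small stand-in array to show that reshape() undoes flatten() exactly.
contribs = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
flat = contribs.flatten()
restored = flat.reshape(4, 2, 3)
print(flat.shape, np.array_equal(restored, contribs))  # (24,) True
```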

hcho3 · April 29, 2020, 9:11pm

Ah so for the multi-class setting the SHAP output is (nsample, num_class, nfeats+1).
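A sketch of how the per-class outputs relate to probabilities, under the assumption (not checked in this thread) that each class's contributions plus its bias sum to that class's raw margin, and that multi:softprob applies a softmax across classes. The contribution values are invented for illustration:

```python
import numpy as np

# Invented multi-class contributions: 2 samples, 3 classes, 2 features + bias.
# Layout (nsample, num_class, nfeats + 1), as described above.
contribs = np.array([
    [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.1], [0.0, -0.1, 0.1]],
    [[-0.6, 0.3, 0.1], [0.2, 0.2, 0.1], [0.4, -0.5, 0.1]],
])

# Per-class raw margin: sum over the feature columns and the bias column.
margins = contribs.sum(axis=2)  # shape (nsample, num_class)

# Numerically stable softmax across classes, assuming multi:softprob.
shifted = margins - margins.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
```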

Dola · April 29, 2020, 9:14pm

That's how I interpret it as well. However, I was not sure, so I wanted to confirm it with you.

hcho3 · April 29, 2020, 9:17pm

Well, I learned something today. Even though I am a maintainer, I don't know all parts of the codebase equally.

Dola · April 29, 2020, 9:21pm

But, do you think that the results are correct?

hcho3 · April 29, 2020, 9:22pm

I think so, based on this line of code:

github.com/dmlc/xgboost/blob/b9649e7b8eb73a9679816e8f3986f335cdf850a2/python-package/xgboost/core.py#L1603

```python
# Excerpt from Booster.predict() (starts mid-statement; earlier lines not shown)
                                      data.num_col() + 1)
            else:
                preds = preds.reshape(nrow, ngroup,
                                      data.num_col() + 1,
                                      data.num_col() + 1)
        elif pred_contribs:
            ngroup = int(chunk_size / (data.num_col() + 1))
            if ngroup == 1:
                preds = preds.reshape(nrow, data.num_col() + 1)
            else:
                preds = preds.reshape(nrow, ngroup, data.num_col() + 1)
        else:
            preds = preds.reshape(nrow, chunk_size)
    return preds
```
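The pred_contribs branch above can be mirrored in a few lines of NumPy. reshape_contribs is a hypothetical helper, and it assumes chunk_size is the number of output values per input row (the excerpt does not show how chunk_size is computed):

```python
import numpy as np


def reshape_contribs(flat_preds, nrow, num_col):
    """Hypothetical mirror of the pred_contribs branch quoted above.

    flat_preds: 1-D buffer of length nrow * ngroup * (num_col + 1).
    Returns shape (nrow, num_col + 1) when there is one output group,
    otherwise (nrow, ngroup, num_col + 1).
    """
    flat_preds = np.asarray(flat_preds)
    chunk_size = flat_preds.size // nrow   # values per input row (assumed)
    ngroup = chunk_size // (num_col + 1)   # number of output groups
    if ngroup == 1:
        return flat_preds.reshape(nrow, num_col + 1)
    return flat_preds.reshape(nrow, ngroup, num_col + 1)


print(reshape_contribs(np.zeros(5 * 3), nrow=5, num_col=2).shape)      # (5, 3)
print(reshape_contribs(np.zeros(5 * 2 * 3), nrow=5, num_col=2).shape)  # (5, 2, 3)
```

With a 2-class multi:softprob model and 345 features, chunk_size is 2 × 346 = 692 and ngroup works out to 2, which matches the (28166, 2, 346) shape seen in this thread.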

Dola · April 29, 2020, 9:28pm

Yup, it's clear now. Thanks a lot, Philip.

I have one more question, but I do not really know if you can help me with it or not.

I see now that the shap_values have to be split by class, and that the SHAP expected_value is also computed separately for every class.

For example:

```python
shap_output = model.predict(X, pred_contribs=True)
shap_values = shap_output[:, :-1]    # shape (28166, 1, 346)
expected_value = shap_output[0, -1]  # shape (346,)
```
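Because shap_output is 3-D here, shap_output[:, :-1] slices the class axis, which explains the (28166, 1, 346) shape. Below is a sketch that indexes the last axis instead, on a small random stand-in array. Whether SHAP's plotting functions accept a list of per-class matrices is an assumption to check against the SHAP documentation:

```python
import numpy as np

# Random stand-in with the thread's layout (nsample, num_class, nfeats + 1),
# using small sizes instead of (28166, 2, 346).
shap_output = np.random.default_rng(0).normal(size=(5, 2, 4))

# Slice the LAST axis to separate feature contributions from the bias column.
shap_values = shap_output[:, :, :-1]    # (nsample, num_class, nfeats)
expected_value = shap_output[0, :, -1]  # (num_class,), one bias per class

# One 2-D (nsample, nfeats) matrix per class.
per_class = [shap_values[:, c, :] for c in range(shap_output.shape[1])]
```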

I wonder, then, how I can pass these values to SHAP's plot functions so that each feature shows its contribution for both classes.

hcho3 · April 29, 2020, 9:36pm

I think shap_output[:, 0, :] captures the features' contributions to the prediction for the first class, and shap_output[:, 1, :] captures the features' contributions to the prediction for the second class.

As for SHAP plotting, I suggest that you raise questions at https://github.com/slundberg/shap/issues.

Dola · April 30, 2020, 12:54am

Yup, I totally agree, but my question remains: how do I combine the contributions for each class?
So I think I will ask on SHAP's GitHub. I will let you know if I figure out how to solve this.
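One possible way to combine the two classes, offered as an assumption rather than something settled in this thread. With a 2-class softmax, P(class 1) = sigmoid(m1 - m0), where m0 and m1 are the class margins. Each margin is the sum of its class's contributions plus bias, so the difference contribs[:, 1, :] - contribs[:, 0, :] is an additive explanation of m1 - m0. Here is a sketch on random stand-in values that checks this identity:

```python
import numpy as np

rng = np.random.default_rng(1)
# Random stand-in 2-class output: (nsample, 2, nfeats + 1), last column = bias.
contribs = rng.normal(size=(4, 2, 5))

# Class-1 minus class-0 contributions (bias column included).
diff = contribs[:, 1, :] - contribs[:, 0, :]  # (nsample, nfeats + 1)
log_odds = diff.sum(axis=1)                   # m1 - m0 per sample

# Softmax probability of class 1, computed from the two class margins.
margins = contribs.sum(axis=2)
p1_softmax = np.exp(margins[:, 1]) / np.exp(margins).sum(axis=1)

# Sigmoid of the margin difference gives the same probability.
p1_sigmoid = 1.0 / (1.0 + np.exp(-log_odds))
print(np.allclose(p1_softmax, p1_sigmoid))  # True
```

Training the same 2-class problem with binary:logistic would instead give a single (nsample, nfeats + 1) contribution matrix.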

Thanks a lot, Philip.