Softmax vs softprob - difference?

shashi-netra · September 29, 2019, 8:46am

I am still confused about the difference. When I run the predict_proba method on multiple rows (say ndata rows) with a multi-class classifier, I also get an ndata * nclass matrix as output.

From what I know, softmax computes a probability distribution over a vector of values, so I'm not sure what softprob does differently.

Can someone clarify the difference? It's probably very subtle, but it escapes me.
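For reference, here is a minimal sketch of the softmax function itself in plain Python (no XGBoost involved). It turns a vector of raw scores into probabilities that sum to 1. The function name and the sample scores are just for illustration.

```python
import math

def softmax(scores):
    """Turn a vector of raw scores into a probability distribution."""
    # Subtracting the max score avoids overflow in exp(); it does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 0.5])
print(probs)  # three probabilities that sum to 1; the largest is at index 1
```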

hcho3 · September 30, 2019, 2:10am

softprob outputs a vector of probabilities (one per class) for each row, whereas softmax outputs only the predicted class for each row.
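A rough sketch of that difference, assuming you already have the raw per-class scores (margins) for each row. The helper `predict_from_margins` is hypothetical and not part of the XGBoost API; it only mimics what each objective hands back. Both objectives apply softmax to the same scores; softmax then keeps just the index of the largest one.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of class scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def predict_from_margins(margins, objective):
    """Hypothetical helper: mimic the per-row output of each objective."""
    if objective == "multi:softprob":
        # One probability vector per row -> ndata x nclass
        return [softmax(row) for row in margins]
    if objective == "multi:softmax":
        # Only the index of the most likely class per row -> ndata values.
        # softmax is monotonic, so the argmax of the raw scores is the argmax of the probabilities.
        return [max(range(len(row)), key=row.__getitem__) for row in margins]
    raise ValueError(f"unsupported objective: {objective}")

margins = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]]
print(predict_from_margins(margins, "multi:softprob"))  # 2 rows x 3 probabilities
print(predict_from_margins(margins, "multi:softmax"))   # [1, 0]
```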

shashi-netra · October 1, 2019, 9:32am

Hi, thanks for responding.

What confuses me is this: doesn't predict_proba (with softmax) also output a vector of probabilities over the classes?

clf.predict_proba([[...]])  # -> [[0.2, ..., 0.8], [0.1, ..., 0.4], ...]

Isn't this also outputting a matrix of probabilities? How is that output different from the softprob output?

hcho3 · October 1, 2019, 3:52pm

The difference only applies if you are using xgboost.train(), which gives you a Booster object.

shashi-netra · October 1, 2019, 9:02pm

My apologies, but I am even more confused now.

Are you saying there is no difference between softprob and softmax if I don't use the xgboost.train() method?
And if I do use xgboost.train(), what is the difference in the output?

hcho3 · October 1, 2019, 9:30pm

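The difference shows up when you call the predict() method on the Booster object: with softprob you get an ndata * nclass matrix of probabilities, while with softmax you get ndata predicted class labels.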
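If you train with softprob, you can still get softmax-style labels by taking the per-row argmax of the probability matrix, so the probability output carries strictly more information. A small sketch in plain Python; the probability matrix is made up and stands in for the output of Booster.predict() or predict_proba, and `labels_from_probabilities` is a hypothetical helper name.

```python
def labels_from_probabilities(prob_matrix):
    """Hypothetical helper: pick the most likely class index in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_matrix]

# Made-up softprob-style output: ndata = 2 rows, nclass = 3 columns
probs = [[0.2, 0.1, 0.7], [0.6, 0.3, 0.1]]
labels = labels_from_probabilities(probs)
print(labels)  # [2, 0]
```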