I am still confused about the difference. When I run the predict_proba
method on multiple rows (say ndata) with a multi-class classifier, I also get an ndata * nclass
matrix as output.
From what I know, softmax
calculates a probability distribution over a vector of values. So I am not sure what softprob
is doing differently.
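To make my understanding concrete, here is a minimal sketch in plain Python of what I take softmax to do when applied to each row of a score matrix. This is not the library's code: the softmax helper and the raw_scores values are made up for illustration.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores: ndata = 2 rows, nclass = 3 columns.
raw_scores = [
    [2.0, 1.0, 0.1],
    [0.5, 0.5, 3.0],
]

# Applying softmax row by row gives an ndata * nclass matrix,
# where each row is a probability distribution over the classes.
probs = [softmax(row) for row in raw_scores]

for row in probs:
    print(row, "sum =", sum(row))
```

So under this reading, softmax on its own would already give me the ndata * nclass matrix of probabilities.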
Can someone clarify the difference? It’s probably very subtle, but it escapes me.