Discrepancy between predict_proba and predict (using output_margin=True) for multi:softprob

It's my understanding that for an XGBoost classifier with objective='multi:softprob', model.predict(data, output_margin=True) returns the class probabilities for each row in data. It's also my understanding that model.predict_proba returns the class probabilities.

This understanding is based on the code here:


However, when I try the following, the resulting plot is not at all 1:1.

import matplotlib.pyplot as plt
import xgboost as xgb

model = xgb.XGBClassifier(objective='multi:softprob')
model.fit(X_train, y_train)

# compare the first class's margin score against its predicted probability
plt.plot(
    [x[0] for x in model.predict(X_all, output_margin=True)],
    [y[0] for y in model.predict_proba(X_all)],
    '.',
)

[figure "discrepancy": scatter plot of margin scores vs. predicted probabilities, clearly not 1:1]

What causes this discrepancy? Thanks!

Not true. The margin scores from model.predict(data, output_margin=True) are raw, untransformed scores; they need to be passed through the softmax function to become class probabilities. Note that the X axis in your plot ranges from -15 to 5, so the margin scores cannot be proper probabilities.
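To illustrate, here is a small sketch (using made-up margin values, not output from an actual model) showing that a row-wise softmax turns margin scores into values that behave like the probabilities predict_proba returns:

```python
import numpy as np
from scipy.special import softmax

# Hypothetical margin scores for 2 rows and 3 classes, i.e. the kind of
# array model.predict(data, output_margin=True) would return.
margins = np.array([
    [-15.0,  2.0, 5.0],
    [  0.5, -1.0, 1.5],
])

# Apply softmax along the class axis to recover probabilities.
probs = softmax(margins, axis=1)

# Each row now sums to 1, as class probabilities must.
print(probs.sum(axis=1))  # → [1. 1.]
```

Equivalently, probs equals np.exp(margins) normalized by each row's sum, which is exactly the transformation multi:softprob applies internally before predict_proba returns its result.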