It's my understanding that for an XGBoost classifier with objective='multi:softprob', the output of model.predict(data, output_margin=True) should be the class probabilities for each row in data. It's also my understanding that model.predict_proba returns the class probabilities.
This understanding is based on the code here:
However, when I attempt the following, the resulting plot is not at all 1:1.
import matplotlib.pyplot as plt
import xgboost as xgb

# X_train, y_train, and X_all are defined elsewhere
model = xgb.XGBClassifier(
    objective='multi:softprob',
)
model.fit(X_train, y_train)

# Plot the first column of each output against the other
plt.plot(
    [x[0] for x in model.predict(X_all, output_margin=True)],
    [y[0] for y in model.predict_proba(X_all)],
    '.',
)
Not quite. The margin scores returned by model.predict(data, output_margin=True) are raw, pre-transformation scores; they must be passed through the softmax function to become class probabilities. Note that the x axis of your plot ranges from -15 to 5, so the margin scores cannot be probabilities, which would lie in [0, 1].
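To illustrate, here is a minimal NumPy sketch of applying softmax to margin scores. The `margins` array below uses hypothetical values; in practice it would be the array returned by model.predict(X, output_margin=True) for a 3-class problem.

```python
import numpy as np

def softmax(margins):
    # Subtract each row's max before exponentiating, for numerical stability
    z = margins - margins.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical margin scores for three rows of a 3-class model
margins = np.array([
    [-15.0,  2.0, 5.0],
    [  0.5, -1.0, 1.5],
    [  3.0,  3.0, 3.0],
])

probs = softmax(margins)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1.0
```

After this transformation, the values should match model.predict_proba(X) row for row, and a plot of one against the other should be 1:1.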