Multiclass classification training process


Hi all, I would like to ask for a more detailed explanation of the training process for multiclass target variables (similar to what exists for the binary case).
If the target variable has 3 levels, I understand that it trains 3 binary “one vs. rest” boosters in each iteration, but I would like to understand how the training of these boosters interacts, both within the same iteration and between iterations.
Within round X, once one of the 3 is trained, are its leaf scores used when training the other 2 (in which case the label training order would matter a lot), or not? Are the objective functions of those trees binary (not mlogloss, but logloss for that class only)? When the next round begins, does each booster receive scores from every tree trained so far, or only from the trees corresponding to its own class? (This is related to whether the objective function is binary or not.)

I couldn’t find any detailed documentation on this, so if any is available, please share.

Thanks a lot in advance.


Multiclass training optimizes a single objective function: the cross-entropy loss of the softmax prediction. The leaf scores strictly follow the gradient boosting rule, using the first- and second-order gradients with respect to that loss function.
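
To make that concrete, here is a minimal sketch in plain Python of the per-class gradient and Hessian of the softmax cross-entropy loss for a single training row. The function names `softmax` and `softmax_grad_hess` are made up for illustration and are not part of any library API. The sketch uses the textbook diagonal Hessian `p_k * (1 - p_k)`; an actual implementation may scale or clip this value. It also assumes that the gradients for all classes are computed once at the start of a round, from the summed scores of all earlier trees, before any tree in that round is built.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of per-class raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_grad_hess(scores, label):
    """Derivatives of the cross-entropy loss w.r.t. each class score.

    grad_k = p_k - 1[label == k]
    hess_k = p_k * (1 - p_k)   (diagonal of the Hessian only; an assumption
                                of this sketch, real libraries may rescale it)
    """
    probs = softmax(scores)
    grad = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    hess = [p * (1.0 - p) for p in probs]
    return grad, hess

# One training row, 3 classes, true class is 2, all raw scores start at 0.
scores = [0.0, 0.0, 0.0]
grad, hess = softmax_grad_hess(scores, label=2)

# Each class's tree in the round would be fit to its own (grad[k], hess[k])
# column. Every grad[k] depends on the scores of ALL classes through the
# softmax, so the per-class trees are not independent binary problems.
print(grad, hess)
```

Under these assumptions, the order in which the per-class trees are built within a round does not matter, because all of them use gradients computed from the same scores. Between rounds, the probability for each class depends on the accumulated scores of every class, so each new tree is indirectly affected by all trees trained before it, not only those of its own class.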