Multi-class classification training process

[migrated from https://github.com/dmlc/xgboost/issues/2313]

Hi all, I would like to ask for a more detailed explanation of the training process for multi-class target variables (analogous to http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf for the binary case).
If the target variable has 3 levels, I understand that 3 binary "one vs. rest" boosters are trained in each iteration, but I would like to understand how these boosters interact, both within the same iteration and across iterations. Specifically:

- During round X, once one of the 3 boosters is trained, are its leaf scores used when training the other 2 (in which case the training order of the labels would matter a lot), or not?
- Are the objective functions of those trees binary, i.e. logloss for that one class only rather than mlogloss?
- When the next round begins, does each booster receive scores from every single tree trained so far, or only from the trees corresponding to its own class? (This is related to whether the objective is binary or not.)

I couldn't find any detailed documentation on this, so if any is available, please share.

Thanks a lot in advance.

Best.

The multi-class training optimizes a single objective function: the cross-entropy loss of the softmax prediction. The leaf scores strictly follow the gradient boosting rule, using the first- and second-order gradients with respect to that loss function.
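To make that concrete, here is a minimal NumPy sketch of the math being described, not XGBoost's actual implementation (the function names here are just for illustration, and the Hessian shown is the standard diagonal approximation; I believe the built-in softmax objective uses essentially these same quantities, up to a constant scaling of the Hessian):

```python
import numpy as np

def softmax(margins):
    """Row-wise softmax over the per-class margins (raw tree-ensemble sums)."""
    z = margins - margins.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_grad_hess(margins, labels, num_class):
    """First- and second-order terms of softmax cross-entropy, one pair per
    (row, class); in each round, the tree for class j is fit to column j."""
    p = softmax(margins)                # shape (n_rows, num_class)
    onehot = np.eye(num_class)[labels]  # one-hot encoding of the labels
    grad = p - onehot                   # dL/d(margin of class k)
    hess = p * (1.0 - p)                # diagonal Hessian approximation
    return grad, hess

# Toy round: 5 rows, 3 classes. margins[i, j] is the sum of the leaf scores
# of all class-j trees built in previous rounds for row i, so each class's
# trees contribute only to their own column, while the grad/hess for any one
# column depend on *all* columns through the softmax.
rng = np.random.default_rng(0)
margins = rng.normal(size=(5, 3))
labels = np.array([0, 2, 1, 1, 0])
grad, hess = softmax_grad_hess(margins, labels, num_class=3)
print(grad)  # the class-j tree of this round is fit to grad[:, j], hess[:, j]
```

As far as I understand, this is also what resolves the ordering question above: within one round, all k trees are fit to gradients computed from the margins as they stood at the start of the round, so their training order does not matter, and when the next round begins, each class's margin column has accumulated only its own trees' scores, while the gradients again couple all columns through the softmax.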

I still don't understand. What is the relationship between the trees within the same iteration and across iterations? (Here, one iteration means generating k trees for a k-class classification problem, based on a one-vs-rest strategy.) When the next round begins, does each booster receive scores from every single tree trained so far, or only from the trees corresponding to its own class?

I really hope you can explain this in detail.

Thanks a lot in advance.

Best.