In the XGBoost paper, the regularization term Ω(f) penalizes the L2-norm of w, the vector of scores assigned to the leaves (in a regression setting). But traditionally, penalization is applied to the norm of the model's coefficients, not to the score itself. If the score is artificially lowered because of that regularization term, isn't the model likely to underfit? Am I wrong about that?
The idea here is to temper the magnitude of the leaf outputs. We want to ensure that no single tree casts too large a vote, so that the final prediction is a combination of many trees rather than being dominated by any one of them.
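You can see this directly in the closed-form leaf weight from the XGBoost paper: for a leaf with gradient sum G and hessian sum H, the optimal score is w* = -G / (H + λ). A minimal sketch (the G, H values below are made-up for illustration) showing how increasing λ shrinks the leaf score toward zero:

```python
def optimal_leaf_weight(G, H, lam):
    """Closed-form leaf score from the XGBoost paper: w* = -G / (H + lambda),
    where G and H are the sums of first- and second-order gradients
    of the examples falling in the leaf."""
    return -G / (H + lam)

# For squared-error loss, gradient = pred - y and hessian = 1 per example.
# Hypothetical leaf: 5 examples whose residuals sum to G = -10, so H = 5.
G, H = -10.0, 5.0
for lam in [0.0, 1.0, 5.0]:
    print(f"lambda={lam}: w* = {optimal_leaf_weight(G, H, lam):.3f}")
# With lambda = 0 the leaf outputs the raw mean residual (2.0);
# larger lambda shrinks the output, so each tree corrects less per round.
```

So the penalty does bias each individual tree's output toward zero, but boosting adds many such trees, and later trees pick up whatever signal the shrunken leaves left behind.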
That’s right, I hadn’t kept in mind that many trees were voting.