I would like a better understanding of what XGBoost actually does. I understand the basic gradient boosted tree algorithm, but I assume that, as a state-of-the-art model, XGBoost has to do something extra, such as regularization, gradient clipping, interesting modifications to the weak learners, or sampling methods. I've tried to find a description of those improvements, ideally some high-level pseudocode or a diagram, but I only found this article https://arxiv.org/pdf/1603.02754.pdf, which is somewhat old and mentions a subsampling method only to say that it is not implemented. I know that XGBoost is open source, so I could read the code, but that would take quite some time. I'm also more interested in things that influence the results than in performance or scalability improvements, though I know there is sometimes a trade-off and you need to sacrifice a little accuracy to get better performance.
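For context on what I mean by "something extra": the paper I linked derives a regularized, second-order objective, where each leaf gets a closed-form weight w* = -G / (H + lambda) and splits are scored by a gain that includes the L2 penalty lambda and a per-leaf penalty gamma (G and H being sums of first- and second-order gradients of the loss over the node's instances). A minimal sketch of those two formulas as I understand them from the paper (parameter names are mine, not XGBoost's internals):

```python
def leaf_weight(G, H, lam=1.0):
    """Optimal leaf weight w* = -G / (H + lambda) for a node with
    gradient sum G and Hessian sum H (from the XGBoost paper)."""
    return -G / (H + lam)

def split_gain(G_left, H_left, G_right, H_right, lam=1.0, gamma=0.0):
    """Gain of splitting a node into left/right children.

    gamma penalizes each added leaf; the split is only kept if the
    gain is positive, which acts as built-in pruning.
    """
    def score(G, H):
        # Contribution of a leaf to the (negated) objective.
        return G * G / (H + lam)

    G_parent = G_left + G_right
    H_parent = H_left + H_right
    return 0.5 * (score(G_left, H_left)
                  + score(G_right, H_right)
                  - score(G_parent, H_parent)) - gamma
```

So at least the lambda/gamma regularization is clear from the paper; what I'm missing is what else the current implementation adds on top of this.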
What optimizations and modifications does XGBoost make?