Hi everyone. I am working on the xgboost algorithm and noticed that in updater_histmaker, when split points were proposed in ResetPosAndPropose, UpdateSketchCol used to be called on a row batch of a column (i.e. the data was divided into batches and one particular column of a batch was passed to UpdateSketchCol). In xgboost 0.9, however, we no longer fetch row batches before calling UpdateSketchCol. Instead, we call GetSortedColumnBatches and pass the complete column to UpdateSketchCol.
Earlier: a particular column was passed in batches to UpdateSketchCol
Now: complete column passed in one go to UpdateSketchCol
This difference causes different feature values to be added to the sketchs array, which in turn leads to a different number of candidate split points being proposed.
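To illustrate the kind of effect I mean, here is a toy Python sketch. It is not xgboost's actual quantile sketch: `prune`, `full_column_candidates`, and `batched_candidates` are hypothetical helpers I made up, and the "sketch" is just a size-limited list of evenly spaced values. It only shows that summarising a column batch by batch and then merging can retain different values than summarising the whole column at once.

```python
# Toy illustration only; this is NOT the xgboost sketch implementation.
# All function names here are hypothetical.

def prune(sorted_vals, k):
    """Keep at most k values, evenly spaced by rank (min and max included)."""
    n = len(sorted_vals)
    if n <= k:
        return list(sorted_vals)
    return [sorted_vals[i * (n - 1) // (k - 1)] for i in range(k)]


def full_column_candidates(column, k):
    """Summarise the complete column in one go."""
    return prune(sorted(column), k)


def batched_candidates(column, k, batch_size):
    """Summarise each row batch separately, then merge and prune again."""
    merged = []
    for start in range(0, len(column), batch_size):
        batch = column[start:start + batch_size]
        merged.extend(prune(sorted(batch), k))
    return prune(sorted(merged), k)


# Feature values of one column, in row order (made-up data).
column = [1, 12, 2, 11, 3, 10, 4, 9, 5, 8, 6, 7]

print(full_column_candidates(column, k=4))            # [1, 4, 8, 12]
print(batched_candidates(column, k=4, batch_size=6))  # [1, 4, 7, 12]
```

In this toy the two paths keep the same number of points but different values. The real sketch also tracks weights and ranks, so the actual differences will not look exactly like this.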
I understand that the DMatrix structure was completely refactored. But why is the data now processed in one go rather than in row batches?
I would like to understand why this change was made in updater_histmaker.