Hi Team,
I have a few binary features that are very important for my learning task, along with more than 20 continuous features. When I look at the feature importances, the binary features do not make it into the top-k. On debugging, I found that XGBoost splits on the continuous features far more often than on the binary ones.
ref: https://www.youtube.com/watch?v=NLrhmn-EZ88&t=633s
In this video, the presenter claims that the TreeExtra repo (from Amazon) addresses this by normalising the split gain by entropy when evaluating splits.
code: https://github.com/dariasor/TreeExtra/commit/2be1601657d01ebe4017f43ac957d84dbf901f20
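If I understand the idea correctly, it resembles C4.5's gain ratio: divide the raw split gain by the entropy of the left/right split proportions, so the score depends on how the split partitions the data, not just on the impurity reduction. A rough sketch of what I mean (my own names and simplification, not TreeExtra's actual code):

```python
import math

def split_info(n_left, n_right):
    # Entropy (in bits) of the left/right proportions of a binary split.
    # Maximal (1 bit) for a 50/50 split, approaching 0 for very lopsided splits.
    n = n_left + n_right
    info = 0.0
    for k in (n_left, n_right):
        p = k / n
        if p > 0:
            info -= p * math.log2(p)
    return info

def normalised_gain(gain, n_left, n_right):
    # Raw split gain divided by the split's entropy (gain-ratio-style score).
    return gain / split_info(n_left, n_right)

# Same raw gain, different split shapes -> different normalised scores:
balanced = normalised_gain(0.30, n_left=50, n_right=50)   # divides by 1.0
lopsided = normalised_gain(0.30, n_left=90, n_right=10)   # divides by ~0.47
```

Is this roughly the normalisation the video is describing, or does TreeExtra do something different here?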
I would like to know whether this is already part of XGBoost. If not, where in the XGBoost repo should I make these changes to help the binary features rank higher? Thank you.