How to fix the initial trees without features in get_booster().get_dump()?


#1

The output of xgb.get_booster().get_dump() shows that some of the individual trees are empty (no variable is printed, only constants):

['0:leaf=-0.0190534852\n',
 '0:leaf=-0.0190102123\n',
 '0:leaf=-0.0187628437\n',
 '0:leaf=-0.0185498875\n',
 '0:[sump_any_any_any_any_any<196911] yes=1,no=2,missing=1\n\t1:leaf=-0.0185494889\n\t2:leaf=-0.0176288597\n',
 '0:[sumd_per_any_any_any_trg<2115.5] yes=1,no=2,missing=1\n\t1:leaf=-0.0183512494\n\t2:leaf=-0.0171739999\n',
 '0:leaf=-0.0180586521\n',
 '0:[sumd_per_any_any_any_trg<2908] yes=1,no=2,missing=1\n\t1:leaf=-0.0179917756\n\t2:leaf=-0.0167234316\n',
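A quick way to list which trees in such a dump have no split is to scan the strings returned by get_dump(). The sketch below assumes the default text dump format, where a split-free tree is printed as a single root line beginning with `0:leaf=`; the helper name `leaf_only_trees` is hypothetical and not part of XGBoost.

```python
def leaf_only_trees(dump):
    """Return the indices of trees in a get_dump() list that are a single leaf.

    Assumes XGBoost's default text dump: a tree without any split is one
    root line '0:leaf=<value>', while a tree with a split starts with
    '0:[<feature><<threshold>] yes=...,no=...,missing=...'.
    """
    return [i for i, tree in enumerate(dump) if tree.lstrip().startswith("0:leaf=")]


# The first trees from the dump shown above.
dump = [
    '0:leaf=-0.0190534852\n',
    '0:leaf=-0.0190102123\n',
    '0:leaf=-0.0187628437\n',
    '0:leaf=-0.0185498875\n',
    '0:[sump_any_any_any_any_any<196911] yes=1,no=2,missing=1\n\t1:leaf=-0.0185494889\n\t2:leaf=-0.0176288597\n',
    '0:[sumd_per_any_any_any_trg<2115.5] yes=1,no=2,missing=1\n\t1:leaf=-0.0183512494\n\t2:leaf=-0.0171739999\n',
    '0:leaf=-0.0180586521\n',
    '0:[sumd_per_any_any_any_trg<2908] yes=1,no=2,missing=1\n\t1:leaf=-0.0179917756\n\t2:leaf=-0.0167234316\n',
]

print(leaf_only_trees(dump))  # [0, 1, 2, 3, 6]
```

On a fitted model it would be called as `leaf_only_trees(model.get_booster().get_dump())`.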

#2

The mean of your response variable appears to be a lot lower than 0.5 (which is the default base_score in XGBoost), so the algorithm has determined that the best thing to do for the first four trees is not to make any splits, but simply to add a single constant leaf weight that moves every prediction toward the mean of all the observations. This is not a problem with XGBoost. Try running it again, but this time set base_score equal to the (weighted) mean of your training response variable.
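A minimal sketch of computing that value, using made-up toy labels. The helper names `weighted_base_score` and `logit` are hypothetical, and the `logit` part assumes a logistic objective such as binary:logistic, where base_score is given as a probability and every prediction starts from the margin logit(base_score). It shows why the default 0.5 (margin 0.0) is far from where a rare-positive target needs to start, which the early constant trees then have to make up for.

```python
import math


def weighted_base_score(y, sample_weight=None):
    """(Weighted) mean of the training labels, intended for base_score."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    total = sum(sample_weight)
    return sum(w * yi for w, yi in zip(sample_weight, y)) / total


def logit(p):
    """Log-odds of p: the starting margin implied by base_score=p
    under a logistic objective (assumption: binary:logistic)."""
    return math.log(p / (1.0 - p))


# Toy, made-up labels: 1 positive out of 20 observations.
y_train = [0] * 19 + [1]

base = weighted_base_score(y_train)
print(base)                    # 0.05
print(logit(0.5))              # 0.0, margin implied by the default base_score
print(round(logit(base), 3))   # -2.944
```

With the scikit-learn wrapper the value would then be passed as the constructor argument, e.g. `xgb.XGBClassifier(base_score=weighted_base_score(y_train, w_train))`, followed by `fit(X_train, y_train, sample_weight=w_train)` with the same weights.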


#3

Thanks a lot!
Now it all makes sense.