How to fix initial trees without features in get_booster().get_dump()?

roskovlad · June 14, 2019, 8:45am

The output of xgb.get_booster().get_dump() shows that some of the individual trees are empty (no variable is printed, only a constant leaf value).

['0:leaf=-0.0190534852\n',
 '0:leaf=-0.0190102123\n',
 '0:leaf=-0.0187628437\n',
 '0:leaf=-0.0185498875\n',
 '0:[sump_any_any_any_any_any<196911] yes=1,no=2,missing=1\n\t1:leaf=-0.0185494889\n\t2:leaf=-0.0176288597\n',
 '0:[sumd_per_any_any_any_trg<2115.5] yes=1,no=2,missing=1\n\t1:leaf=-0.0183512494\n\t2:leaf=-0.0171739999\n',
 '0:leaf=-0.0180586521\n',
 '0:[sumd_per_any_any_any_trg<2908] yes=1,no=2,missing=1\n\t1:leaf=-0.0179917756\n\t2:leaf=-0.0167234316\n',

pandas_pandas · June 16, 2019, 4:03pm

The mean of your response variable appears to be a lot lower than 0.5 (which is the default base_score in XGBoost), so the algorithm has determined that the best thing to do for the first four trees is not to make any splits but just take the mean of all the observations as the single leaf weight. This is not a problem with XGBoost. Try to run again but this time set base_score equal to the (weighted) mean of your training response variable.
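
A minimal sketch of that suggestion, in case it helps. The thread does not say which objective or wrapper is in use, so XGBRegressor is an assumption here, and the helper names weighted_mean and fit_with_mean_base_score are made up for illustration. If you use a logistic objective, base_score is interpreted as a probability, so the mean should lie strictly between 0 and 1.

```python
def weighted_mean(y, sample_weight=None):
    """Return the (weighted) mean of the response values."""
    y = list(y)
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    total = float(sum(sample_weight))
    return sum(w * v for w, v in zip(sample_weight, y)) / total


def fit_with_mean_base_score(X, y, sample_weight=None, **params):
    """Fit an XGBRegressor whose base_score is the training-response mean.

    Assumption: a regression setup; swap in XGBClassifier if that matches
    your problem.
    """
    import xgboost as xgb  # imported lazily so weighted_mean works without xgboost

    model = xgb.XGBRegressor(base_score=weighted_mean(y, sample_weight), **params)
    model.fit(X, y, sample_weight=sample_weight)
    return model


# Toy response with a mean well below the 0.5 default, as in the question.
y_train = [0.0, 0.0, 0.0, 1.0]
print(weighted_mean(y_train))  # 0.25
```

After refitting this way, calling get_booster().get_dump() on the returned model again should show whether the first trees still consist of a single leaf.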

roskovlad · June 17, 2019, 8:27am

Thanks a lot!
Now I understand everything.