Unexpected Short Training Time When Removing Feature Dimensions

When removing (unimportant) features down to a certain level, I got a surprisingly short training time, which made us worry that the implementation is faulty. Could you please comment on how to debug this, or whether it is expected behavior?

The plot below shows the sudden change in training time when some features are removed. The pattern appears to be stable.

[Figure: training time by input size, for a fixed 10 labels]

Training Parameters

import xgboost as xgb

BOW_XGB_init = xgb.XGBClassifier(
    n_estimators=100, max_depth=1, learning_rate=0.1, silent=False,
    objective='binary:logistic', booster='gbtree', n_jobs=32, nthread=None,
    gamma=0, min_child_weight=1, max_delta_step=0, subsample=1,
    colsample_bytree=1, colsample_bylevel=1, reg_alpha=0, reg_lambda=1)

System:
128 physical cores; 256 GB memory; Ubuntu 22.04.2; Python 3.10.12
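In case it is useful, here is a minimal sketch of how such a sweep could be reproduced; X, y, and the column ordering are random placeholders, not our actual data or feature ranking.

import time
import numpy as np
import xgboost as xgb

# Sketch of the sweep: time fit() while keeping fewer and fewer feature columns.
rng = np.random.default_rng(0)
X = rng.random((5000, 2000))
y = rng.integers(0, 10, size=5000)

for n_features in (2000, 1000, 500, 250, 100):
    clf = xgb.XGBClassifier(n_estimators=100, max_depth=1,
                            learning_rate=0.1, n_jobs=32)
    start = time.perf_counter()
    clf.fit(X[:, :n_features], y)
    print(n_features, f"{time.perf_counter() - start:.1f}s")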

Thanks,
Zongshun

Has the model size changed considerably? Try counting the average number of nodes and see if that number differs.
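Something along these lines should give those numbers (a minimal sketch; clf stands in for your fitted classifier):

# Count trees and average nodes per tree from the booster dump.
df = clf.get_booster().trees_to_dataframe()   # one row per node
n_trees = df["Tree"].nunique()
print("trees:", n_trees, "average nodes per tree:", len(df) / n_trees)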

Thank you for the input. Each of my models is 470 KB, with 1000 trees of depth 1.

I also wonder why my model ended up with 1000 trees, given that I set n_estimators=100. Is it because I have 10 labels, so it tries to build 100 trees for each label?
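For reference, a minimal sketch of how I understand both counts can be checked (clf stands in for the fitted classifier above):

booster = clf.get_booster()
print(booster.num_boosted_rounds())   # boosting rounds, should equal n_estimators (100)
print(len(booster.get_dump()))        # individual trees: rounds x number of classes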

Best,
Zongshun

Yes, by default one tree is built per class for each boosting round, so 100 rounds with 10 classes gives 1000 trees. You can change this behavior by setting multi_strategy="multi_output_tree". See https://xgboost.readthedocs.io/en/latest/tutorials/multioutput.html for more details.
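For illustration, a minimal sketch of that setting (it needs XGBoost >= 2.0 and the hist tree method; the other parameter values are carried over from the question):

import xgboost as xgb

# One multi-output tree per boosting round instead of one tree per class,
# so n_estimators=100 yields ~100 trees for 10 labels rather than 1000.
clf = xgb.XGBClassifier(
    n_estimators=100,
    max_depth=1,
    learning_rate=0.1,
    tree_method="hist",                  # multi_output_tree requires hist
    multi_strategy="multi_output_tree",
    n_jobs=32,
)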