XGBoost, XGBoost hist and LightGBM benchmark: XGBoost hist performance drop

Hi,

I’m benchmarking XGBoost (exact tree method), XGBoost with the histogram tree method grown leaf-wise, and LightGBM. These are the results I’m getting:

| Airline subsample size | Lib | Training time (s) | Test time (s) | AUC | F1 |
|---:|---|---:|---:|---:|---:|
| 10,000 | xgb | 2.7050 | 0.0146 | 0.8346 | 0.8010 |
| 10,000 | xgb_hist | 0.3963 | 0.0075 | 0.7009 | 0.7478 |
| 10,000 | lgb | 0.2183 | 0.0161 | 0.8264 | 0.7971 |
| 100,000 | xgb | 5.6744 | 0.0432 | 0.9015 | 0.8357 |
| 100,000 | xgb_hist | 0.4495 | 0.0265 | 0.7059 | 0.7351 |
| 100,000 | lgb | 0.8666 | 0.1032 | 0.9090 | 0.8564 |
| 1,000,000 | xgb | 106.1552 | 0.2917 | 0.9107 | 0.8401 |
| 1,000,000 | xgb_hist | 2.9730 | 0.0463 | 0.7061 | 0.7545 |
| 1,000,000 | lgb | 9.7474 | 1.1421 | 0.9262 | 0.8750 |
| 10,000,000 | xgb | 2579.3413 | 3.1140 | 0.8630 | 0.7904 |
| 10,000,000 | xgb_hist | 25.7753 | 0.4432 | 0.6845 | 0.7124 |
| 10,000,000 | lgb | 100.1800 | 13.6676 | 0.8867 | 0.8199 |

You can see the code here.

Both XGBoost and LightGBM produce competitive results; XGBoost hist, however, shows a considerable performance drop. For example, on the smallest dataset the AUC is 0.83 for XGB and 0.82 for LGB, but only 0.70 for XGB hist.
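For concreteness, the AUC and F1 columns are computed from the regressors’ continuous scores; a minimal sketch, assuming sklearn.metrics and a 0.5 threshold for F1 (the helper name and threshold are illustrative, not fixed by the linked code):

from sklearn.metrics import roc_auc_score, f1_score

def evaluate(model, X_test, y_test):
    """Score a fitted model: AUC on raw scores, F1 on thresholded labels."""
    y_prob = model.predict(X_test)                      # continuous scores
    auc = roc_auc_score(y_test, y_prob)                 # threshold-free ranking metric
    f1 = f1_score(y_test, (y_prob >= 0.5).astype(int))  # binarize at 0.5
    return auc, f1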

These are the parameters I’m using for each:

import xgboost as xgb
import lightgbm as lgb

num_rounds = 200

# Exact (default) tree method, depth-wise growth
xgb_clf_pipeline = xgb.XGBRegressor(max_depth=8,
                                    n_estimators=num_rounds,
                                    min_child_weight=30,
                                    learning_rate=0.1,
                                    scale_pos_weight=2,
                                    gamma=0.1,
                                    reg_lambda=1,
                                    subsample=1,
                                    n_jobs=-1,
                                    random_state=77)

# Histogram tree method, grown leaf-wise: max_depth=0 removes the depth
# limit so max_leaves/grow_policy control growth, mirroring LightGBM
xgb_hist_clf_pipeline = xgb.XGBRegressor(max_depth=0,
                                         max_leaves=255,
                                         n_estimators=num_rounds,
                                         min_child_weight=30,
                                         learning_rate=0.1,
                                         scale_pos_weight=2,
                                         gamma=0.1,
                                         reg_lambda=1,
                                         subsample=1,
                                         grow_policy='lossguide',
                                         tree_method='hist',
                                         n_jobs=-1,
                                         random_state=77)

# LightGBM, which grows trees leaf-wise by default
lgbm_clf_pipeline = lgb.LGBMRegressor(num_leaves=255,
                                      n_estimators=num_rounds,
                                      min_child_weight=30,
                                      learning_rate=0.1,
                                      scale_pos_weight=2,
                                      min_split_gain=0.1,
                                      reg_lambda=1,
                                      subsample=1,
                                      n_jobs=-1,
                                      seed=77)

I ran this experiment 5 years ago with the same parameters I’m using now, and the three algorithms had similar performance. The drop could be because the behavior of the parameters has changed since then, or because there is a bug.

Any idea why this could be happening?

Which XGBoost version were you using?

Fixed in https://github.com/dmlc/xgboost/pull/7551.
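After installing the release candidate (assuming it is published as a pre-release on PyPI), the running version can be checked with:

import xgboost
print(xgboost.__version__)  # the fix lands in 1.6.0; expect 1.6.0rc1 or later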

@hcho3 @jiamingy I was using 1.5.2. After the fix, I can confirm that with 1.6.0rc1 the hist version works:

{
    "lgbm": {
        "performance": {
            "AUC": 0.9107592746802494,
            "Accuracy": 0.83225,
            "F1": 0.8560147633148792,
            "Precision": 0.8479000170039109,
            "Recall": 0.8642863333044458
        },
        "test_time": 0.0745178820000092,
        "train_time": 0.9469629170000644
    },
    "xgb": {
        "performance": {
            "AUC": 0.9033527393352302,
            "Accuracy": 0.7873,
            "F1": 0.8353460287970275,
            "Precision": 0.7547737287542842,
            "Recall": 0.9351763584366063
        },
        "test_time": 0.020666641000048003,
        "train_time": 7.129535925999903
    },
    "xgb_hist": {
        "performance": {
            "AUC": 0.9079546106230758,
            "Accuracy": 0.79705,
            "F1": 0.8405922318658446,
            "Precision": 0.7686009767308245,
            "Recall": 0.9274633850420314
        },
        "test_time": 0.027357233000088854,
        "train_time": 1.9496805300000233
    }
}

Thanks
