[replicate code notebook attached] XGBoost custom_metric vs disable_default_eval_metric not working

I am referring to the xgboost issues https://github.com/dmlc/xgboost/issues/9782 and https://github.com/dmlc/xgboost/issues/3598.

My current env: xgboost 2.0.2

From both issues, my understanding is as follows (correct me if I am wrong):
a. the current xgboost version does NOT support composite metrics, that is, my custom_metric function must return a single (name, value) pair like below:

def custom_metric_customized(predt: np.ndarray, dtrain: xgb.DMatrix):
    return "score_name", score  # <---- works

but there is no way to return multiple metrics like

def custom_metric_customized(predt: np.ndarray, dtrain: xgb.DMatrix):
    return [("score_name_1", score_1), ("score_name_2", score_2)]  # <--- does NOT work

b. the user can keep tracking (but not optimize on) the default xgboost metric, and the disable_default_eval_metric parameter controls whether that default metric is reported during evaluation.

c. the default metric is supposed to be disabled when either eval_metric or feval is specified.

However, I experimented with some combinations, and the results conflict with what I understand.

Here is what I did:
I made up a customized metric function called eval_metric_accuracy_customized. It only measures prediction accuracy for samples whose y_true lies within the [15%, 85%] percentile range of y_true (that is, I care about prediction accuracy inside [15%, 85%]; if y_true is an outlier, I care less about its prediction error). Obviously, larger is better for this metric, so I should maximize it during evaluation (https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.train).
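
To make the intended behavior concrete, here is a tiny toy example of the accuracy calculation (made-up numbers, not the housing data; the real function is defined in the code further down):

import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 10.0])   # 10.0 plays the role of an outlier outside [LB, UB]
y_pred = np.array([1.1, 2.6, 2.9, 5.0])
LB, UB, threshold = 0.5, 5.0, 0.25

cond = (y_pred >= (1 - threshold) * y_true) & \
       (y_pred <= (1 + threshold) * y_true) & \
       (y_true > LB) & (y_true < UB)
print(cond.astype(float))   # [1. 0. 1. 0.]
print(np.mean(cond))        # 0.5 -> larger is better, so the metric should be maximized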

The conflicting observations are:
a. for the train API, the maximize parameter does not appear to do anything. In my example below, regardless of whether I set maximize to True or False, the eval_metric_accuracy_xgboost printed for each num_boost_round does not change.

b. regardless of whether I set disable_default_eval_metric, the only difference is that when it is True the printed evaluation metrics no longer include rmse. It does not affect the training procedure, and the final prediction score is still the same.

Combining `disable_default_eval_metric` with the customized metric `eval_metric_accuracy_xgboost` (and setting `maximize` to True/False), all four experiments produce the same prediction score …
Can you help me identify where the error might be? Is it because my customized metric function is not properly defined?

Below is the code example. If you can access Google Colab, here is the link to replicate the result: https://colab.research.google.com/drive/1KxzOT25AVUgcDW6GubvmjFsytfhJQeh0?usp=sharing

# Imports needed to run the example
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the data
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    pd.DataFrame(housing.data, columns=housing.feature_names),
    housing.target,
    test_size=0.25,
    random_state=131,
)

import xgboost as xgb
from typing import List, Tuple

# data
dtrain = xgb.DMatrix(data=X_train, label=y_train, missing=-999999999)
dvalid = xgb.DMatrix(data=X_test, label=y_test, missing=-999999999)

# made up xgboost params 
param = {'objective': 'reg:squarederror', 
          'disable_default_eval_metric': True, # <--------- Note: disable_default_eval_metric is True
          'tree_method': 'hist',
          'booster': 'dart', 
          'lambda': 0.5022935723779454, 'alpha': 0.0010591193559734626, 
          'subsample': 0.7443155004860621, 'colsample_bytree': 0.8049514766470095, 
          'base_score': 2.0729023397932815, 
          'eta': 1.146933698699281, 'gamma': 3.1491135525631537, 'max_depth': 8, 'min_child_weight': 60, 
          'grow_policy': 'lossguide', 'sample_type': 'uniform', 
          'normalize_type': 'forest', 'rate_drop': 0.0001559101552440383, 'skip_drop': 0.006540833684514242}

# customized metric
LB = np.percentile(y_train, 15)
UB = np.percentile(y_train, 85)
def eval_metric_accuracy_customized(y_pred, y_true, sample_weight=None, LB=0, UB=np.inf, threshold=0.25):
    # cond checks whether y_pred is within the tolerance band around y_true, for y_true lying inside [LB, UB]
    # for y_true outside of [LB, UB], I care less about its corresponding y_pred value.
    cond = (y_pred >= (1-threshold)*y_true) & \
            (y_pred <= (1+threshold)*y_true) & \
            (y_true > LB) & (y_true < UB)
    df_true_score_weight_1 = np.where(cond, 1.0, 0.0)
    acc = np.mean(df_true_score_weight_1)
    # acc is the proportion of samples that satisfy cond
    return acc
    
def eval_metric_accuracy_xgboost(sample_weight=None, LB=0, UB=1500, threshold=0.25,  **kwargs):
    def eval_metric_accuracy_xgboost_internal(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[str, float]:
        y_true = dtrain.get_label()
        score = eval_metric_accuracy_customized(y_pred=predt, y_true=y_true, 
                                                sample_weight=sample_weight, LB=LB, UB=UB, threshold=threshold)
        return "eval_metric_accuracy_xgboost", score
    return eval_metric_accuracy_xgboost_internal
eval_metric_accuracy_xgboost_custom = eval_metric_accuracy_xgboost(LB=LB, UB=UB)
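
# Quick sanity check of the wrapper before training (not part of the four experiments below):
# feeding the training labels back in as "predictions" should return the metric name plus a
# value close to 0.70, i.e. the fraction of y_train that lies strictly inside (LB, UB).
name, value = eval_metric_accuracy_xgboost_custom(dtrain.get_label(), dtrain)
print(name, value)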

# Experiments starts
# test 1:
# param disable_default_eval_metric = True
# customized accuracy metric with maximize=True
output = xgb.train(params=param, 
                   dtrain=dtrain, 
                   num_boost_round=200,  # set a high number
                   evals=[(dtrain, "train"),(dvalid, "validation")], 
                   custom_metric=eval_metric_accuracy_xgboost_custom,
                   maximize=True,   # <--------- Note: `custom_metric` specified and maximize set to True
                #    early_stopping_rounds=50,
                   verbose_eval=True
                   )

preds = output.predict(data=dvalid)
print("Test RMSE:", mean_squared_error(y_test, preds, squared=False)) # Test RMSE: 0.5912299076469258


# test 2:
# param disable_default_eval_metric = True
# customized accuracy metric with maximize=False
output_maximize_accuracy_false = xgb.train(params=param, 
                   dtrain=dtrain, 
                   num_boost_round=200,
                   evals=[(dtrain, "train"),(dvalid, "validation")], 
                   custom_metric=eval_metric_accuracy_xgboost_custom,
                   maximize=False, # <--------- Note: `custom_metric` specified and maximize set to False
                   verbose_eval=True
                   )

preds_output_maximize_accuracy_false = output_maximize_accuracy_false.predict(data=dvalid)
print("Test RMSE:", mean_squared_error(y_test, preds_output_maximize_accuracy_false, squared=False)) # Test RMSE: 0.5912299076469258

# test 3:
# param disable_default_eval_metric = False
# customized accuracy metric with maximize=True
param_disable_default_eval_false = {'objective': 'reg:squarederror', 
                                    'disable_default_eval_metric': False,  # <--------- Note: disable_default_eval_metric was True before, now it is False
          'tree_method': 'hist',
          'booster': 'dart', 
          'lambda': 0.5022935723779454, 'alpha': 0.0010591193559734626, 
          'subsample': 0.7443155004860621, 'colsample_bytree': 0.8049514766470095, 
          'base_score': 2.0729023397932815, 
          'eta': 1.146933698699281, 'gamma': 3.1491135525631537, 'max_depth': 8, 'min_child_weight': 60, 
          'grow_policy': 'lossguide', 'sample_type': 'uniform', 
          'normalize_type': 'forest', 'rate_drop': 0.0001559101552440383, 'skip_drop': 0.006540833684514242}

output_maximize_accuracy_true_disable_default_false = xgb.train(params=param_disable_default_eval_false, 
                   dtrain=dtrain, 
                   num_boost_round=200,
                   evals=[(dtrain, "train"),(dvalid, "validation")], 
                   custom_metric=eval_metric_accuracy_xgboost_custom,
                   maximize=True,  # <--------- Note: `custom_metric` specified and maximize set to True
                   verbose_eval=True
                   )

preds_output_maximize_accuracy_true_disable_default_false = output_maximize_accuracy_true_disable_default_false.predict(data=dvalid)
print("Test RMSE:", mean_squared_error(y_test, preds_output_maximize_accuracy_true_disable_default_false, squared=False)) # Test RMSE: 0.5912299076469258

# test 4:
# param disable_default_eval_metric = False
# customized accuracy metric with maximize=False
output_maximize_accuracy_false_disable_default_false = xgb.train(params=param_disable_default_eval_false, 
                   dtrain=dtrain, 
                   num_boost_round=200, 
                   evals=[(dtrain, "train"),(dvalid, "validation")], 
                   custom_metric=eval_metric_accuracy_xgboost_custom,
                   maximize=False,  # <--------- Note: `custom_metric` specified and maximize set to False
                   verbose_eval=True
                   )

preds_output_maximize_accuracy_false_disable_default_false = output_maximize_accuracy_false_disable_default_false.predict(data=dvalid)
print("Test RMSE:", mean_squared_error(y_test, preds_output_maximize_accuracy_false_disable_default_false, squared=False)) # Test RMSE: 0.5912299076469258