Hi,
I am training a BDT for binary signal/background classification (I work in particle physics). My model (implemented in Python) looks like:
import numpy as np
import xgboost as xgb

# Weight = label * -0.99 + 1, i.e. background (label 0) gets weight 1
# and signal (label 1) gets weight 0.01
train = xgb.DMatrix(data=train_df[features], label=train_df["label"],
                    missing=np.inf, feature_names=features,
                    weight=(train_df["label"].to_numpy() * -0.99 + 1))
test = xgb.DMatrix(data=test_df[features], label=test_df["label"],
                   missing=np.inf, feature_names=features,
                   weight=(test_df["label"].to_numpy() * -0.99 + 1))

param = {}
# Booster parameters
param['eta'] = 0.1               # learning rate
param['max_depth'] = 10          # maximum depth of a tree
param['subsample'] = 0.5         # fraction of events to train each tree on
param['colsample_bytree'] = 0.5  # fraction of features to train each tree on
# Learning task parameters
param['objective'] = 'binary:logistic'  # objective function
param['eval_metric'] = 'error'          # evaluation metric for cross-validation
# xgb.train also accepts a list of pairs, which allows repeated 'eval_metric' keys
param = list(param.items()) + [('eval_metric', 'logloss'), ('eval_metric', 'rmse')]

num_trees = 50  # number of boosting rounds
booster = xgb.train(param, train, num_boost_round=num_trees)
The model is doing well, but I'd like to slightly modify the cost/loss function so that false positives are penalized more heavily than false negatives. I'm looking for a selection with as few background events as possible, even if that means sacrificing a significant portion of the signal.
From the documentation on custom objective and evaluation functions, I managed to reproduce the tutorial example.
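For reference, the tutorial case I got working is (as I understand it) the squared log error objective. A minimal sketch of my adaptation, assuming raw predictions need to be clipped so that log1p stays defined:

```python
import numpy as np

# Squared log error, the custom-objective example from the XGBoost tutorial
# (as I understand it): loss = 1/2 * (log1p(pred) - log1p(label))^2
def squared_log_obj(preds, dtrain):
    labels = dtrain.get_label()
    preds = np.maximum(preds, -1 + 1e-6)  # keep log1p(pred) defined
    grad = (np.log1p(preds) - np.log1p(labels)) / (preds + 1)
    hess = (-np.log1p(preds) + np.log1p(labels) + 1) / np.power(preds + 1, 2)
    return grad, hess

# booster = xgb.train(param, train, num_boost_round=num_trees, obj=squared_log_obj)
```

(When passing a custom obj like this, my understanding is that the 'objective' entry should be dropped from param.)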
My question is this: what are the standard objective and evaluation functions, written out explicitly? (Or rather, the ones my model already uses implicitly; I'm still learning the basics.)
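For binary:logistic specifically, my current understanding (please correct me if wrong) is that the gradient and Hessian of the log loss with respect to the raw margin are p - y and p(1 - p), and that 'error' is simply the misclassified fraction at a 0.5 cut. A sketch of both as custom functions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# binary:logistic rewritten as a custom objective: for per-event log loss,
# grad = p - y and hess = p * (1 - p), with p = sigmoid(raw margin)
def logistic_obj(preds, dtrain):
    labels = dtrain.get_label()
    p = sigmoid(preds)
    return p - labels, p * (1.0 - p)

# the 'error' metric: fraction of events misclassified at a 0.5 threshold
# (with a custom objective, preds arrive as raw margins, hence the sigmoid)
def error_metric(preds, dtrain):
    labels = dtrain.get_label()
    p = sigmoid(preds)
    return 'error', float(np.mean((p > 0.5).astype(float) != labels))
```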
Since the model is already performing well, I just want to add an extra term to the functions it already uses.
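What I have in mind, roughly, is re-weighting the logistic gradient and Hessian for background events so that scoring a background event as signal costs more. A sketch, where fp_weight is a made-up factor I would still have to tune:

```python
import numpy as np

# Extra penalty on background (label 0) events: scaling each event's log loss
# by a factor w scales both its gradient and its Hessian by the same w.
def asymmetric_logistic_obj(preds, dtrain, fp_weight=5.0):  # fp_weight: hypothetical value
    labels = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))
    w = np.where(labels == 0, fp_weight, 1.0)
    return w * (p - labels), w * p * (1.0 - p)

# usage sketch, fixing fp_weight via a lambda:
# booster = xgb.train(param, train, num_boost_round=num_trees,
#                     obj=lambda preds, d: asymmetric_logistic_obj(preds, d, fp_weight=5.0))
```

(I suspect the same effect could also be had without a custom objective, by adjusting the per-event weights I already pass to the DMatrix, but I'd like to understand the objective itself.)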
For reference, versions of the packages I’m using:
python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
XGBoost version: 2.0.2
Pandas version: 2.1.4
Numpy version: 1.26.2