Hi,
I am training a BDT for binary signal/background classification (I work in particle physics). My model (implemented in Python) looks like:
import numpy as np
import xgboost as xgb

# Weight = label * -0.99 + 1, i.e. background (label 0) gets weight 1
# and signal (label 1) gets weight 0.01
train = xgb.DMatrix(data=train_df[features], label=train_df["label"],
                    missing=np.inf, feature_names=features,
                    weight=(train_df["label"].to_numpy() * -0.99 + 1))
test = xgb.DMatrix(data=test_df[features], label=test_df["label"],
                   missing=np.inf, feature_names=features,
                   weight=(test_df["label"].to_numpy() * -0.99 + 1))

param = {}
# Booster parameters
param['eta'] = 0.1               # learning rate
param['max_depth'] = 10          # maximum depth of a tree
param['subsample'] = 0.5         # fraction of events to train each tree on
param['colsample_bytree'] = 0.5  # fraction of features to train each tree on
# Learning task parameters
param['objective'] = 'binary:logistic'  # objective function
param['eval_metric'] = 'error'          # evaluation metric for cross-validation
# xgb.train also accepts a list of pairs, which allows repeated 'eval_metric' keys
param = list(param.items()) + [('eval_metric', 'logloss'), ('eval_metric', 'rmse')]

num_trees = 50  # number of boosting rounds
booster = xgb.train(param, train, num_boost_round=num_trees)
The model is doing well, but I'd like to slightly modify the cost/loss function so that false positives are penalized more heavily than false negatives. I'm looking for a selection with as few background events as possible, even if that means sacrificing a significant portion of the signal.
From the documentation on custom objective and evaluation functions, I managed to reproduce the tutorial example.
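For reference, the tutorial case I got working is (as I understand it) the squared log error objective. A minimal sketch of my adaptation, assuming raw predictions need to be clipped so that log1p stays defined:

```python
import numpy as np

# Squared log error, the custom-objective example from the XGBoost tutorial
# (as I understand it): loss = 1/2 * (log1p(pred) - log1p(label))^2
def squared_log_obj(preds, dtrain):
    labels = dtrain.get_label()
    preds = np.maximum(preds, -1 + 1e-6)  # keep log1p(pred) defined
    grad = (np.log1p(preds) - np.log1p(labels)) / (preds + 1)
    hess = (-np.log1p(preds) + np.log1p(labels) + 1) / np.power(preds + 1, 2)
    return grad, hess

# booster = xgb.train(param, train, num_boost_round=num_trees, obj=squared_log_obj)
```

(When passing a custom obj like this, my understanding is that the 'objective' entry should be dropped from param.)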
My question is this: what are the standard objective and evaluation functions, written out explicitly? (Or rather, the ones my model already uses implicitly; I'm still learning the basics.)
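For binary:logistic specifically, my current understanding (please correct me if wrong) is that the gradient and Hessian of the log loss with respect to the raw margin are p - y and p(1 - p), and that 'error' is simply the misclassified fraction at a 0.5 cut. A sketch of both as custom functions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# binary:logistic rewritten as a custom objective: for per-event log loss,
# grad = p - y and hess = p * (1 - p), with p = sigmoid(raw margin)
def logistic_obj(preds, dtrain):
    labels = dtrain.get_label()
    p = sigmoid(preds)
    return p - labels, p * (1.0 - p)

# the 'error' metric: fraction of events misclassified at a 0.5 threshold
# (with a custom objective, preds arrive as raw margins, hence the sigmoid)
def error_metric(preds, dtrain):
    labels = dtrain.get_label()
    p = sigmoid(preds)
    return 'error', float(np.mean((p > 0.5).astype(float) != labels))
```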
Since the model is already performing well, I just want to add an extra term to the functions it already uses.
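What I have in mind, roughly, is re-weighting the logistic gradient and Hessian for background events so that scoring a background event as signal costs more. A sketch, where fp_weight is a made-up factor I would still have to tune:

```python
import numpy as np

# Extra penalty on background (label 0) events: scaling each event's log loss
# by a factor w scales both its gradient and its Hessian by the same w.
def asymmetric_logistic_obj(preds, dtrain, fp_weight=5.0):  # fp_weight: hypothetical value
    labels = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))
    w = np.where(labels == 0, fp_weight, 1.0)
    return w * (p - labels), w * p * (1.0 - p)

# usage sketch, fixing fp_weight via a lambda:
# booster = xgb.train(param, train, num_boost_round=num_trees,
#                     obj=lambda preds, d: asymmetric_logistic_obj(preds, d, fp_weight=5.0))
```

(I suspect the same effect could also be had without a custom objective, by adjusting the per-event weights I already pass to the DMatrix, but I'd like to understand the objective itself.)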
For reference, versions of the packages I’m using:
python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
XGBoost version: 2.0.2
Pandas version: 2.1.4
Numpy version: 1.26.2