I am working on a mortality prediction (binary outcome) problem with “base mortality probability” as my offset in the XGboost problem.
I have used gbtree booster and binary:logistic objective function. In my data data I have multiple observations/records having same X values but different offset values.
As per my understanding (please correct me, if wrong) the XGBoost under binary:logistic setup tries to fit a model of below representation. log(p/1-p) = offset + F(x). Where F(x) is optimized (for a specific loss function) using splits with various X values.
Thus, when the X values are exactly same, to get the F(x), I can use the predicted output (with outputmargin = True option) and subtract the offset from here. However, when I got the output, it turned out in the above mentioned approach, I am getting different values F(X) for a same set X. I believe the way offset is handled internally in the XGBoost is different from my understanding. Can anyone explain me this method/mathematical formulation of handlng offset.
I am specifically interested in extracting the value of F(x) (as this is additional information the model is providing) by adjusting the model prediction from the offset values.
Here are the sample codes:
x1 = runif(1000)
y1 = as.numeric(runif(1000)>.8)
y2 = as.numeric(runif(1000)>.8)
off1 = runif(1000)
off2 = runif(1000)
#stacking the data to have same X values
y = c(y1,y2)
off = c(off1,off2)
length(unique(off)) # shows unique 2000 values
length(unique(x)) # shows unique 1000 values, i.e. each X is repeated once (as expected)
fulldata = cbind.data.frame(x,y,off)
train_dMtrix = xgb.DMatrix(data = as.matrix(x),
label = y,
base_margin = off)
params_list=list(booster = “gblinear”, objective = “binary:logistic”, eta = 0.05,
max_depth= 4, min_child_weight = 10, eval_metric = ‘logloss’)
xgbmodel = xgb.train(params = params_list, data = train_dMtrix, nrounds=100)
Getting the prediction in link format
fulldata$Predicted_link = predict(xgbmodel, train_dMtrix, outputmargin = TRUE)
Assuming Predicted_link = offset + F(x), calculating F(x) for each values of X
fulldata$F_x = fulldata$Predicted_link - fulldata$off
As per my understanding, since the F(X) in purely independent of offset, the model predictions of F_x (not the predicted probability) should be exactly same for same values of x, irrespective of the corresponding offsets. Given I have 1000 distinct X values, I’m expecting 1000 distinct F_x values