Calculating probabilities with XGBoost - binary:logistic vs custom logloss give different results

I’m getting started with XGBoost in R and am trying to match up the predictions from the binary:logistic model with those generated using a custom log loss function. I’d expect the following two calls to predict to generate the same results:

require(xgboost)

loglossobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1/(1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

model <- xgboost(data = train$data, label = train$label, nrounds = 2, objective = "binary:logistic")
preds <- predict(model, test$data)
print(head(preds))

model <- xgboost(data = train$data, label = train$label, nrounds = 2, objective = loglossobj, eval_metric = "error")
preds <- predict(model, test$data)
x <- 1 / (1 + exp(-preds))
print(head(x))

The model output from a custom log loss function does not have the logistic transformation 1/(1+exp(-x)) applied. However, even after applying it, the resulting probabilities differ between the two calls to predict:

[1] 0.2582498 0.7433221 0.2582498 0.2582498 0.2576509 0.2750908
versus

[1] 0.3076240 0.7995583 0.3076240 0.3076240 0.3079328 0.3231709

Any suggestions?

(Cross-posted from Stack Overflow)

It turns out this behaviour is due to the initial conditions. xgboost implicitly assumes base_score = 0.5 when using binary:logistic or binary:logitraw, but base_score must be set to 0.0 to replicate their output when using a custom loss function. Here, base_score is the initial prediction score of all instances.

To illustrate, the following R code generates the same predictions in all three cases:

require(xgboost)

loglossobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1/(1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test

model <- xgboost(data = train$data, label = train$label, objective = "binary:logistic", nrounds = 10, eta = 0.1, verbose = 0)
preds <- predict(model, test$data)
print(head(preds))

model <- xgboost(data = train$data, label = train$label, objective = "binary:logitraw", nrounds = 10, eta = 0.1, verbose = 0)
preds <- predict(model, test$data)
x <- 1 / (1 + exp(-preds))
print(head(x))

model <- xgboost(data = train$data, label = train$label, objective = loglossobj, base_score = 0.0, nrounds = 10, eta = 0.1, verbose = 0)
preds <- predict(model, test$data)
x <- 1 / (1 + exp(-preds))
print(head(x))

which outputs

[1] 0.1814032 0.8204284 0.1814032 0.1814032 0.1837782 0.1952717
[1] 0.1814032 0.8204284 0.1814032 0.1814032 0.1837782 0.1952717
[1] 0.1814032 0.8204284 0.1814032 0.1814032 0.1837782 0.1952717

I’m having a similar problem which is not readily fixed by setting the base score to zero for the model with the custom loss function. I create the same loss function, generate some dummy data and train on it. The results from the built-in objectives “binary:logistic” and “reg:logistic” are materially different from those using the custom objective, no matter how I set the base score. Did I misunderstand something?

The script below reproduces the problem. I am aware that RMSE is not really the right metric, but it shows the differences in behaviour very neatly. The differences are also noticeable in other metrics (e.g. AUC).

# Attempt to reproduce log-loss objective 

library(data.table)
library(xgboost)

# custom objective function
logloss <- function(preds, dtrain){
  
  # Get labels
  labels <- getinfo(dtrain, "label")
  
  # Apply logistic transform to predictions
  preds <- 1/(1 + exp(-preds))
  
  # Find gradient and hessian
  grad <- (preds - labels)
  hess <- preds * (1-preds)
  
  return(list("grad" = grad, "hess" = hess))
}

# Generate test data
generate_test_data <- function(n_rows = 1e5, feature_count = 5){
  
  # Make targets
  test_data <- data.table(
    target = sign(runif(n = n_rows, min=-1, max=1))
  )
  
  # Add feature columns.These are normally distributed and shifted by the target
  # in order to create a noisy signal
  for(feature in 1:feature_count){
    
    # Randomly create features of the noise
    mu <- runif(1, min=-1, max=1)
    sdev <- runif(1, min=5, max=10)
    
    # Create noisy signal
    test_data[, paste0("feature_", feature) := rnorm(
      n=n_rows, mean = mu, sd = sdev)*target + target]
  }

  # Make vector of feature names
  feature_names <- paste0("feature_", 1:feature_count)
  
  # Make training matrix and labels
  split_data[["train_trix"]] <- as.matrix(split_data$train[, feature_names, with=FALSE])
  split_data[["train_labels"]] <- as.logical(split_data$train$target + 1)
  
  return(split_data)
}

# Build the model
build_model <- function(split_data, objective, params = list()){
  
  # Make the training matrix (also used as the evaluation set in the watchlist)
  train_dtrix <-
    xgb.DMatrix(
      data = split_data$train_trix, label = split_data$train_labels)
  
  # Train the model
  model <- xgb.train(
    data = train_dtrix,
    watchlist = list(
      train = train_dtrix),
    nrounds = 5,
    objective = objective,
    eval_metric = "rmse",
    params = params
  )

  return(model)
}

split_data <- generate_test_data()
cat("\nUsing built-in binary:logistic objective.\n")
test_1 <- build_model(split_data, "binary:logistic")
cat("\nUsing built-in reg:logistic objective.\n")
test_2 <- build_model(split_data, "reg:logistic")
cat("\n\nUsing custom objective\n")
test_3 <- build_model(split_data, logloss, params = list(base_score = 0.0))

This produces the following output:

Using built-in binary:logistic objective.
[1]	train-rmse:0.476833 
[2]	train-rmse:0.463433 
[3]	train-rmse:0.455049 
[4]	train-rmse:0.449588 
[5]	train-rmse:0.446047 

Using built-in reg:logistic objective.
[1]	train-rmse:0.476833 
[2]	train-rmse:0.463433 
[3]	train-rmse:0.455049 
[4]	train-rmse:0.449588 
[5]	train-rmse:0.446047 


Using custom objective
[1]	train-rmse:0.481920 
[2]	train-rmse:0.554571 
[3]	train-rmse:0.641242 
[4]	train-rmse:0.719437 
[5]	train-rmse:0.784012

I would have assumed that the custom objective would produce output pretty close to that observed for reg:logistic and binary:logistic.

Seeing that your RMSE is increasing over subsequent iterations, my guess would be that there’s something wrong with your gradient calculation.

As you can see in the XGB code, there are safeguards against exploding gradients.

Try printing out your gradient and see what you observe.
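
For what it’s worth, here is a minimal sketch of what that debugging could look like, reusing the logloss objective from the script above (the wrapper name logloss_debug and the printed summary are illustrative, not part of the original script):

# Wrapper around the custom objective that prints a gradient summary each round
# (illustrative only; logloss_debug is not part of the original script)
logloss_debug <- function(preds, dtrain){
  labels <- getinfo(dtrain, "label")
  probs <- 1 / (1 + exp(-preds))
  grad <- probs - labels
  hess <- probs * (1 - probs)
  cat(sprintf("raw preds in [%.3f, %.3f]; grad in [%.3f, %.3f]\n",
              min(preds), max(preds), min(grad), max(grad)))
  return(list("grad" = grad, "hess" = hess))
}

# Drop-in replacement for the original call:
# test_3 <- build_model(split_data, logloss_debug, params = list(base_score = 0.0))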

Hi, I am also having a similar problem. Can you tell me if you found a way to overcome it? Thanks

Hi. Yes, I think I figured it out. When using a custom objective, there is no logistic transformation applied after the scores from all trees in the ensemble have been accumulated (which makes sense, since XGBoost has no idea that we are using log odds).

So in the case of the built-in objectives the output was a probability (in the range 0 to 1), while in the case of the custom objective the output was the raw logit (in the range -Inf to +Inf). This is the source of the discrepancy in the results (as I pointed out, I was quite aware that RMSE is a poor measure for a classification problem; I just wanted to understand why the results were different).
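
To make that concrete, here is a rough sketch of how to check it, reusing the objects (test_1, test_3, split_data) from the script earlier in the thread; once the custom-objective predictions are pushed through the logistic transform, both models are on the same probability scale and can be compared directly:

# binary:logistic already returns probabilities
p_builtin <- predict(test_1, split_data$train_trix)

# the custom objective returns raw scores, so apply the sigmoid before comparing
p_custom <- 1 / (1 + exp(-predict(test_3, split_data$train_trix)))

# RMSE against the 0/1 labels, computed on the same scale for both models
labels <- as.numeric(split_data$train_labels)
sqrt(mean((p_builtin - labels)^2))
sqrt(mean((p_custom - labels)^2))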