I was using a custom log-cosh objective function for a study in which I'm interested in absolute error rather than squared error. Despite what I believe was good tuning, the mean absolute error (MAE) of a log-cosh-fit model always seems to be slightly worse than the MAE of a model fit with reg:squarederror. Experimenting, I found that the built-in reg:pseudohubererror (which, like a log-cosh objective, is motivated as a differentiable substitute for absolute error) seems to behave similarly. I was able to construct a toy example in which XGBoost does particularly badly with pseudo-Huber loss. Things did not improve with tuning or when using a separate test set (omitted for simplicity). Here's a reproducible version in R:
library(xgboost)

rmse = function(x, y) sqrt(mean((x - y)^2))
mae = function(x, y) mean(abs(x - y))

set.seed(5)
N = 1000
x = rep(c(0L, 1L, 10L), len = N)
y = x^2 + rnorm(N)^2

for (loss in c("reg:squarederror", "reg:pseudohubererror"))
   {m = xgboost::xgboost(
        verbose = 0,
        params = list(objective = loss),
        data = matrix(x),
        label = y,
        nrounds = 50)
    p = predict(m, newdata = matrix(x))
    message("RMSE ", loss, " - ", rmse(y, p))
    message("MAE ", loss, " - ", mae(y, p))}
The result is:
RMSE reg:squarederror - 1.46276476010163
MAE reg:squarederror - 0.999394763853888
RMSE reg:pseudohubererror - 51.5824812474803
MAE reg:pseudohubererror - 33.1663704182267
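For context, the log-cosh objective I described can be implemented as a custom objective along these lines (a minimal sketch of my approach; the residual convention preds − labels, the helper names, and the absence of any slope parameter are choices made for illustration):

```r
# Log-cosh loss: L(r) = log(cosh(r)), where r = prediction - label.
# Its gradient is tanh(r) and its Hessian is 1 - tanh(r)^2 = sech(r)^2,
# so it is smooth everywhere and approximates |r| for large |r|.
logcosh_grad = function(r) tanh(r)
logcosh_hess = function(r) 1 - tanh(r)^2

# Custom objective in the form XGBoost's R interface expects:
# a function of (predictions, DMatrix) returning gradient and Hessian.
# Assumes getinfo() from the xgboost package is available at call time.
logcosh_obj = function(preds, dtrain)
   {r = preds - getinfo(dtrain, "label")
    list(
        grad = logcosh_grad(r),
        hess = logcosh_hess(r))}
```

This can be passed to xgboost() via its obj argument in place of a built-in objective string.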
Simplifying further, consider this trivial example:
x = c(0L, 1L)
y = x
for (loss in c("reg:squarederror", "reg:pseudohubererror"))
   {m = xgboost::xgboost(
        verbose = 0,
        params = list(
            lambda = 0, eta = 1,
            objective = loss),
        data = matrix(x),
        label = y,
        nrounds = 1)
    p = predict(m, newdata = matrix(x))
    message("Predictions for ", loss, ": ")
    print(p)}
This prints:
Predictions for reg:squarederror:
[1] 0 1
Predictions for reg:pseudohubererror:
[1] 0.5 0.5
So under reg:squarederror, XGBoost can reproduce the input with one tree, as expected, but, mysteriously, not under reg:pseudohubererror.
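For reference, with slope δ the pseudo-Huber loss is L(r) = δ²(√(1 + (r/δ)²) − 1). Taking δ = 1 (which I understand to be XGBoost's default, via the huber_slope parameter), its gradient and Hessian with respect to the prediction are easy to write down, and it can be sketched as a custom objective like so (my own sketch; I haven't verified that it matches the built-in implementation exactly):

```r
# Pseudo-Huber loss with slope 1: L(r) = sqrt(1 + r^2) - 1,
# where r = prediction - label. It behaves like r^2/2 near zero
# and like |r| - 1 for large |r|.
phuber_grad = function(r) r / sqrt(1 + r^2)
phuber_hess = function(r) (1 + r^2)^(-3/2)

# Custom objective in the form XGBoost's R interface expects.
# Assumes getinfo() from the xgboost package is available at call time.
phuber_obj = function(preds, dtrain)
   {r = preds - getinfo(dtrain, "label")
    list(
        grad = phuber_grad(r),
        hess = phuber_hess(r))}
```

One thing that stands out is how fast the Hessian decays: phuber_hess(10) is about 1e-3, versus a constant 2 for squared error. Whether that's relevant to the behavior above, I don't know.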
What’s going on? Are there underlying statistical reasons that XGBoost can’t handle absolute loss well? Or is there a bug?