Squared log loss objective results in massive prediction error

I’m using the Python API XGBRegressor() (version 1.0.2) with the objective set to ‘reg:squaredlogerror’ and most other parameters at their default values. When I run it on the Ames Housing dataset I get a massive RMSLE of 7.572. However, when I change the objective to ‘reg:squarederror’ I get a much more reasonable RMSLE of 0.138. At least in theory, squared log loss should optimise the RMSLE metric better than squared loss. Am I missing something obvious, or is it a bug? I checked that neither the targets nor the predictions are negative, so that can’t account for the problem.

xgb_model_params = {
    'objective': 'reg:squaredlogerror',
    'booster': 'gbtree',
    'random_state': 69,
    'n_estimators': 100,
    'learning_rate': 0.05,
    'tree_method': 'hist',
    'verbosity': 1,
}
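For reference, this is how I compute the RMSLE I’m quoting (a minimal sketch with made-up numbers in place of the actual Ames predictions):

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# made-up example values, not the actual Ames predictions
y_true = np.array([200000.0, 150000.0, 320000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0])

# RMSLE = sqrt(mean((log1p(pred) - log1p(true))^2))
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))

# same thing by hand, to show what the metric actually measures
rmsle_manual = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))
print(rmsle, rmsle_manual)
```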

It could be the case that the optimal hyperparameters for reg:squaredlogerror differ from those for reg:squarederror. You should use GridSearchCV or Optuna to run a hyperparameter search.

Thanks for your reply, Philip. Following your suggestion, I ran some experiments with GridSearchCV, and it seems that while a small number of trees (n_estimators) was fine for ‘reg:squarederror’, that is not the case for ‘reg:squaredlogerror’. With a very large number of trees (10,000) the RMSLE drops from ~7 to ~3, so better, but still far from ideal (0.12 with reg:squarederror, for comparison). It looks like the model is learning very slowly: no matter how many thousands of trees I use, the optimal learning_rate is always 1. Could this be the gradient getting stuck with the squared log error loss?
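To check the stuck-gradient idea I simulated a toy 1-D version of Newton boosting with this loss. The gradient/hessian formulas below are my own derivation for 0.5*(log1p(pred) - log1p(label))**2 (which I believe matches what reg:squaredlogerror does, but treat that as an assumption), and the label value is just a typical house-price magnitude:

```python
import numpy as np

def grad_hess(pred, label):
    # first and second derivative of 0.5*(log1p(pred) - log1p(label))**2
    # with respect to pred (assumed to match reg:squaredlogerror)
    d = np.log1p(pred) - np.log1p(label)
    grad = d / (pred + 1.0)
    hess = (1.0 - d) / (pred + 1.0) ** 2
    return grad, hess

def rounds_to_converge(label, lr, tol=0.01, max_rounds=100_000):
    # toy 1-D Newton boosting: each "tree" contributes -lr * grad / hess
    pred = 0.5  # XGBoost's default base_score
    for i in range(max_rounds):
        if abs(np.log1p(pred) - np.log1p(label)) < tol:
            return i
        g, h = grad_hess(pred, label)
        pred -= lr * g / h
    return max_rounds

fast = rounds_to_converge(180_000.0, lr=1.0)
slow = rounds_to_converge(180_000.0, lr=0.05)
print(fast, slow)
```

In this toy setting the step size scales roughly with (pred + 1), so starting from 0.5 and climbing to a six-figure target takes dramatically longer at a small learning rate, which would fit the pattern I’m seeing where learning_rate = 1 is always best.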

In fact, I have never used the squared log error loss before, so while it makes sense for optimising RMSLE, I’m not sure how well it works with gradient descent. It doesn’t appear to be a very commonly used loss function; for instance, scikit-learn’s GradientBoostingRegressor() doesn’t offer it. Do you know the most common use scenarios, and the pros and cons, of squared log loss for regression?
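One property I did notice while reading about it, which might be relevant to the pros and cons: unlike squared error, it is asymmetric, penalising underprediction more than overprediction of the same absolute size (quick sketch with made-up numbers):

```python
import numpy as np

def sle(pred, label):
    # squared log error: 0.5 * (log1p(pred) - log1p(label))**2
    return 0.5 * (np.log1p(pred) - np.log1p(label)) ** 2

label = 100.0
# same absolute error of 50, opposite directions
under = sle(50.0, label)   # predicting too low
over = sle(150.0, label)   # predicting too high
print(under, over)         # underprediction incurs the larger loss
```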