Tweedie regression objective function in R

I’m trying to implement the Tweedie regression objective function in R (so that I can test some changes later). The C++ implementation is shown here. Specifically:

 bst_float grad = -y * expf((1 - rho) * p) + expf((2 - rho) * p);
 bst_float hess = -y * (1 - rho) * std::exp((1 - rho) * p) +
                  (2 - rho) * expf((2 - rho) * p);
 _out_gpair[_idx] = GradientPair(grad * w, hess * w);
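For reference, assuming `p` is the raw (log-link) score and `rho` is the variance power, the `grad` and `hess` expressions are the first and second derivatives in $p$ of the Tweedie negative log-likelihood, keeping only the terms that depend on $p$:

$$
\ell(p) = -y\,\frac{e^{(1-\rho)p}}{1-\rho} + \frac{e^{(2-\rho)p}}{2-\rho}
$$

$$
\frac{\partial \ell}{\partial p} = -y\,e^{(1-\rho)p} + e^{(2-\rho)p}, \qquad
\frac{\partial^2 \ell}{\partial p^2} = -y\,(1-\rho)\,e^{(1-\rho)p} + (2-\rho)\,e^{(2-\rho)p}
$$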

My attempt:

tw_obj <- function(preds, dtrain, rho = 1.5) {
  labels <- getinfo(dtrain, "label")
  w <- getinfo(dtrain, "weight")
  # labels <- log(labels + 1e-11)
  # preds <- log(preds + 1e-11)
  grad <- -labels * exp((1 - rho) * preds) + exp((2 - rho) * preds)
  hess <- -labels * (1 - rho) * exp((1 - rho) * preds) + (2 - rho) * exp((2 - rho) * preds)
  return(list(grad = grad * w, hess = hess * w))
}

Without the log transformations, this function produces Inf values and training fails. With the log transformations, the likelihood never improves (measured with eval_metric = "tweedie-nloglik@1.5").

Is there a different transformation used in the C++ code that I’m not seeing? Or does the expf function handle the large values on its own? How can I fix this in R?

Update: after consulting the GitHub code and checking the paper describing the method, I’m fairly sure my function is very close, except that the predictions do need a log transform.

tw_obj <- function(preds, dtrain, rho = 1.5) {
  labels <- getinfo(dtrain, "label")
  w <- getinfo(dtrain, "weight")
  preds <- log(preds)
  grad <- -labels * exp((1 - rho) * preds) + exp((2 - rho) * preds)
  hess <- -labels * (1 - rho) * exp((1 - rho) * preds) + (2 - rho) * exp((2 - rho) * preds)
  return(list(grad = grad * w, hess = hess * w))
}
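As a sanity check on the algebra (not on the R/XGBoost wiring), the gradient and Hessian can be compared against central finite differences of the Tweedie negative log-likelihood, keeping only the terms that depend on the score p. A minimal Python sketch, using the same expressions as the function above:

```python
import math

def nll(y, p, rho=1.5):
    # Tweedie negative log-likelihood under a log link,
    # up to terms constant in p
    return (-y * math.exp((1 - rho) * p) / (1 - rho)
            + math.exp((2 - rho) * p) / (2 - rho))

def grad(y, p, rho=1.5):
    # first derivative of nll with respect to p
    return -y * math.exp((1 - rho) * p) + math.exp((2 - rho) * p)

def hess(y, p, rho=1.5):
    # second derivative of nll with respect to p
    return (-y * (1 - rho) * math.exp((1 - rho) * p)
            + (2 - rho) * math.exp((2 - rho) * p))

# central finite differences at an arbitrary test point
eps = 1e-6
y, p = 3.0, 0.7
fd_grad = (nll(y, p + eps) - nll(y, p - eps)) / (2 * eps)
fd_hess = (grad(y, p + eps) - grad(y, p - eps)) / (2 * eps)
assert abs(fd_grad - grad(y, p)) < 1e-5
assert abs(fd_hess - hess(y, p)) < 1e-5
```

This only confirms that grad and hess are consistent derivatives of the stated likelihood; it says nothing about what scale of `preds` XGBoost hands to a custom objective.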

This function does appear to work, somewhat. However, the loss decreases extremely slowly compared to the built-in Tweedie loss function.

For example, with an eta of 0.05, the built-in function drops very quickly at first:

[1] train-tweedie-nloglik@1.5:86.101196 test-tweedie-nloglik@1.5:87.459602
[51] train-tweedie-nloglik@1.5:79.192932 test-tweedie-nloglik@1.5:80.400581
[101] train-tweedie-nloglik@1.5:78.815887 test-tweedie-nloglik@1.5:80.078201

Compare that to my custom R function with eta 0.5:

[1] train-tweedie-nloglik@1.5:87.601913 test-tweedie-nloglik@1.5:88.990860
[51] train-tweedie-nloglik@1.5:86.997467 test-tweedie-nloglik@1.5:88.419487

It is updating much more slowly. Any idea why?