Custom tweedie regression objective

c-pletinckx · May 23, 2023, 2:31pm

Hello,

For a while now I have been investigating a problem that occurs when I implement the reg:tweedie objective myself as a custom objective function in Python. I couldn't find a solution, either in other topics or in the source code of the library.

The issue is that the custom implementation gives terrible results on my dataset, while the built-in implementation gives good results on the same dataset.

My custom implementation is the following (forcing rho = tweedie_variance_power at 1.5):

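import numpy as np

def custom_tweedie_grad(
    y_true: np.ndarray,
    y_pred: np.ndarray,
) -> np.ndarray:
    # First derivative of the Tweedie negative log-likelihood (rho = 1.5)
    # with respect to the raw prediction y_pred.
    a = -y_true * np.exp((1.0 - 1.5) * y_pred)
    b = np.exp((2.0 - 1.5) * y_pred)
    grad = a + b
    return grad

def custom_tweedie_hessian(
    y_true: np.ndarray,
    y_pred: np.ndarray,
) -> np.ndarray:
    # Second derivative of the same loss with respect to y_pred.
    a = -y_true * (1.0 - 1.5) * np.exp((1.0 - 1.5) * y_pred)
    b = (2.0 - 1.5) * np.exp((2.0 - 1.5) * y_pred)
    hess = a + b
    return hess

def custom_tweedie_objective(
    y_true: np.ndarray,
    y_pred: np.ndarray,
) -> tuple[np.ndarray, np.ndarray]:
    # XGBoost expects a (gradient, hessian) pair from a custom objective.
    return (
        custom_tweedie_grad(y_true, y_pred),
        custom_tweedie_hessian(y_true, y_pred),
    )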

I then create a regressor with this custom objective function and call the fit method like this:

self.xgb_model = xgb.XGBRegressor(
    booster=self.booster,
    validate_parameters=self.validate_parameters,
    learning_rate=self.learning_rate,
    n_estimators=self.n_estimators,
    seed=self.seed,
    subsample=self.subsample,
    colsample_bytree=self.colsample_bytree,
    objective=custom_tweedie_objective,
    eval_metric="tweedie-nloglik@1.5",
    min_child_weight=self.min_child_weight,
    max_depth=self.max_depth,
    reg_lambda=self.reg_lambda,
    n_jobs=self.n_jobs
)
x_train = self.data["train"]["X"]
y_train = self.data["train"]["y"]
self.xgb_model.fit(x_train, y_train)

The results I obtain on my dataset are the following (using a custom evaluation method):

{'test_mse': 7578884.044259707,
 'test_r_squared': -0.12819942553491792,
 'train_mse': 6836209.780847064,
 'train_r_squared': -0.13481724642997617}

When I train another regressor with exactly the same parameters, dataset and evaluation method, but with the built-in tweedie objective, the results are the following:

{'test_mse': 4243417.3363827625,
 'test_r_squared': 0.3683211178249963,
 'train_mse': 2994838.0419956953,
 'train_r_squared': 0.5028540712950178}

Does anyone know what could be wrong with my implementation? It is based on the implementation in the source code, so it should behave the same.

Thank you very much for your help

eckstefa · September 15, 2023, 9:29am

I can reproduce the issue. I checked the formulas and cross-checked them against scikit-learn's implementation of the Tweedie loss, and they are the same.
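For anyone who wants to repeat this kind of check, here is a minimal sketch that compares the analytic gradient and Hessian with central finite differences. It assumes the loss is the Tweedie negative log-likelihood written in terms of the raw (log-scale) prediction, with the terms that do not depend on the prediction dropped. The function names and the random test data are made up for illustration.

```python
import numpy as np

RHO = 1.5  # tweedie_variance_power, fixed at 1.5 as in the original post

def tweedie_loss(y_true, raw_pred, rho=RHO):
    # Tweedie negative log-likelihood as a function of the raw prediction,
    # without the terms that are constant in raw_pred (assumed form).
    return (-y_true * np.exp((1.0 - rho) * raw_pred) / (1.0 - rho)
            + np.exp((2.0 - rho) * raw_pred) / (2.0 - rho))

def tweedie_grad(y_true, raw_pred, rho=RHO):
    # Same expression as custom_tweedie_grad in the original post.
    return -y_true * np.exp((1.0 - rho) * raw_pred) + np.exp((2.0 - rho) * raw_pred)

def tweedie_hess(y_true, raw_pred, rho=RHO):
    # Same expression as custom_tweedie_hessian in the original post.
    return (-y_true * (1.0 - rho) * np.exp((1.0 - rho) * raw_pred)
            + (2.0 - rho) * np.exp((2.0 - rho) * raw_pred))

# Illustrative data: positive targets and arbitrary raw predictions.
rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=3.0, size=5)
p = rng.normal(size=5)
eps = 1e-5

# Central differences of the loss and of the gradient.
num_grad = (tweedie_loss(y, p + eps) - tweedie_loss(y, p - eps)) / (2 * eps)
num_hess = (tweedie_grad(y, p + eps) - tweedie_grad(y, p - eps)) / (2 * eps)

grad_ok = np.allclose(num_grad, tweedie_grad(y, p), rtol=1e-5, atol=1e-6)
hess_ok = np.allclose(num_hess, tweedie_hess(y, p), rtol=1e-5, atol=1e-6)
print("gradient matches:", grad_ok, "| hessian matches:", hess_ok)
```

If both checks pass, the derivatives are consistent with the loss, which suggests the problem lies outside the gradient and Hessian formulas.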

houlk8503 · October 20, 2023, 9:55am

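You should apply np.exp() to the predictions of your trained model and then re-evaluate. The custom objective works on the log scale, so the model returns raw margins rather than values on the scale of the target.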
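A minimal sketch of that step, assuming the model was trained with the custom objective so that predict() returns untransformed margins. The helper name to_response_scale and the sample array are hypothetical. The array stands in for the output of the trained model.

```python
import numpy as np

def to_response_scale(raw_margin):
    # Invert the log link: map raw margins back to the scale of the target.
    return np.exp(np.asarray(raw_margin, dtype=float))

# Hypothetical raw margins, standing in for the output of
# self.xgb_model.predict(...) when a custom objective is used.
raw_margin = np.array([0.0, 1.0, 2.0])

y_pred = to_response_scale(raw_margin)
print(y_pred)
```

With the names from the first post, this would amount to wrapping the output of self.xgb_model.predict(...) in np.exp() before computing the MSE and R². The built-in reg:tweedie objective applies this transformation itself at prediction time, which is why its results need no extra step.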