Possible bug? Custom objective functions not invariant to scale

The script below defines two losses: the squared loss L_a = (y - F(x))^2 and the same loss scaled by a 0.5 factor, L_b = 0.5*(y - F(x))^2. Using L_a gives me trees with one split (even when max_depth is set > 1), but using L_b results in trees with no splits at all. There should be no difference between the two, since they differ only by a constant factor.

I made sure to set gamma=0 (the minimum loss reduction required for a split) and lambda=0 (the L2 penalty), but I am still seeing this behavior.
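For reference, the split gain from the XGBoost docs (with G, H the sums of gradients and Hessians in the left and right children) is:

    Gain = 1/2 * [ G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - (G_L + G_R)^2/(H_L + H_R + lambda) ] - gamma

With lambda = gamma = 0, scaling every gradient and Hessian by a constant c scales the gain by c but never changes its sign or which split maximizes it, so I would expect identical trees.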

The eval-rmse and eval-error values reported on lines 202 and 203 don't match either, even after taking the square root of eval-error.

I tried this in both R and Python; both return the same result. Is there a parameter I'm omitting? I'm not sure how to approach this issue. Any help would be appreciated!

    import numpy as np
    import xgboost as xgb

    x1 = np.random.uniform(0, 1, 10000)
    x2 = np.random.uniform(0, 1, 10000)
    x3 = np.random.uniform(0, 1, 10000)
    y = 10*x1*x2 + np.random.normal(size=10000)*x2

    x1_train = x1[:8000]
    x2_train = x2[:8000]
    y_train = y[:8000]
    x1_test = x1[8000:]
    x2_test = x2[8000:]
    y_test = y[8000:]

    # stack the features column-wise; the test matrix must use the test arrays
    data_train = np.column_stack((x1_train, x2_train))
    data_test = np.column_stack((x1_test, x2_test))

    def logregobj(preds, data_train):
        # gradient and Hessian of L_a = mean((y - F(x))^2)
        labels = data_train.get_label()
        grad = -2*(labels - preds)/len(labels)
        hess = grad*0 + 2/len(labels)   # constant Hessian of 2/n per row
        return grad, hess

    def evalerror(preds, data_train):
        labels = data_train.get_label()
        # note: this is (sum of residuals)^2 / n, not the mean of the
        # squared residuals, so it will not equal rmse^2
        return 'error', float((sum(labels - preds)**2) / len(labels))

    def logregobj2(preds, data_train):
        # same objective as logregobj, scaled by 0.5
        labels = data_train.get_label()
        grad = (-2*(labels - preds)/len(labels))/2
        hess = (grad*0 + 2/len(labels))/2
        return grad, hess

    def evalerror2(preds, data_train):
        # same metric as evalerror, scaled by 0.5
        labels = data_train.get_label()
        return 'error', (float(sum(labels - preds)**2) / len(labels))/2

    dtrain = xgb.DMatrix(data_train, label=y_train)
    dtest = xgb.DMatrix(data_test, label=y_test)

    param = {'max_depth': 3, 'eta': 1, 'nthread': 3, 'verbosity': 2, 'lambda': 0, 'gamma': 0}
    watchlist = [(dtest, 'eval'), (dtrain, 'train')]
    num_round = 5

    bst = xgb.train(param, dtrain, num_round, watchlist, logregobj, evalerror)    # L_a: one split
    bst = xgb.train(param, dtrain, num_round, watchlist, logregobj2, evalerror2)  # L_b: no splits

You should also set min_child_weight=0, since you are scaling the Hessian values. Your per-row Hessian is 2/n, so the Hessian sum over the whole training set is 2 for the first objective but only 1 for the halved one; with the default min_child_weight=1, no split of the halved objective can leave both children at or above that threshold, so no splits are made.
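Concretely, this is just one more key in the parameter dict from your script (a sketch, keeping your other values unchanged):

```python
param = {'max_depth': 3, 'eta': 1, 'nthread': 3, 'verbosity': 2,
         'lambda': 0, 'gamma': 0, 'min_child_weight': 0}
```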

In general, we do not promise scale invariance for custom objective functions.
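A numpy-only sketch of the mechanism (assuming n = 8000 training rows as in the script above; no xgboost needed):

```python
import numpy as np

n = 8000  # training rows, as in the script above

# Per-row Hessians returned by the two custom objectives
hess_a = np.full(n, 2.0 / n)   # logregobj:  2/n per row
hess_b = hess_a / 2.0          # logregobj2: half of that

# XGBoost uses the Hessian sum in a node as its "child weight"
print(hess_a.sum())  # ~2.0: a split can leave two children of weight ~1
print(hess_b.sum())  # ~1.0: any split leaves at least one child below 1

# With the default min_child_weight=1, a split is kept only if BOTH
# children retain a Hessian sum >= 1
min_child_weight = 1.0
left, right = hess_b[: n // 2].sum(), hess_b[n // 2:].sum()
print(left >= min_child_weight and right >= min_child_weight)  # False: split rejected
```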