Is my True Positives metric implementation correct?



Hi! I’m training a model on an imbalanced dataset and want to compute the true positive rate (TPR) with a custom evaluation function. Here is what I do:

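import numpy as np

def true_positives_rate(preds, dtrain):
    # Apply the logistic (sigmoid) transformation to the predictions
    preds = 1./(1. + np.exp(-preds))
    y_true = dtrain.get_label()
    # Threshold at 0.5 to get predicted classes
    y_pred = preds >= 0.5
    # Share of actual positives predicted as positive (labels assumed to be 0/1)
    tpr = y_pred[y_true == 1].sum()/y_true.sum()
    return ('tpr', tpr)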

However, when I start training an ensemble, the tpr score on the validation dataset is always equal to 1:

[0] valid-error:0.249035     valid-tpr:1
[1] valid-error:0.246374     valid-tpr:1
[2] valid-error:0.257997     valid-tpr:1
[3] valid-error:0.251214     valid-tpr:1
[4] valid-error:0.221436     valid-tpr:1
[5] valid-error:0.217834     valid-tpr:1
[6] valid-error:0.216275     valid-tpr:1
[7] valid-error:0.204473     valid-tpr:1
[8] valid-error:0.205663     valid-tpr:1
[9] valid-error:0.206395     valid-tpr:1

I started debugging the function and realized that, after applying the logistic transformation, my predictions are always above 0.5, so every sample is predicted as the positive class.
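A small NumPy sketch of one way this can happen, using made-up values rather than the actual model outputs. It assumes (this is not confirmed here) that the values passed to the metric already lie in [0, 1]. In that case, applying the sigmoid again maps them into roughly [0.5, 0.73], so a 0.5 threshold never rejects anything.

```python
import numpy as np

def sigmoid(x):
    # Standard logistic function, same formula as in the metric above
    return 1.0 / (1.0 + np.exp(-x))

# Made-up values standing in for predictions that are already probabilities
probs = np.array([0.0, 0.1, 0.3, 0.5, 0.9, 1.0])
squashed = sigmoid(probs)

# Range of the transformed values and whether all of them pass the threshold
print(squashed.min(), squashed.max())
print((squashed >= 0.5).all())
```

Printing `preds.min()` and `preds.max()` inside the metric before the transformation would show whether the inputs are raw scores (any real value) or already in [0, 1].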

So my question is: should I really apply this line to the predictions before doing the further calculations, or not?

preds = 1./(1. + np.exp(-preds))  # all values are above 0.5 now

All the references I have read say that I should, but then the results are very strange. When I remove this line, the metric looks more reasonable:

[0] valid-error:0.249035     valid-tpr: 0.292759
[1] valid-error:0.246374     valid-tpr: 0.291792
[2] valid-error:0.257997     valid-tpr: 0.294506
...
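For comparison, a minimal sketch of the variant without the transformation. It is only appropriate under the assumption that `preds` already holds probabilities in [0, 1]. The helper name `tpr_from_probabilities`, its `threshold` parameter, and the wrapper name are hypothetical and introduced here for illustration.

```python
import numpy as np

def tpr_from_probabilities(probs, y_true, threshold=0.5):
    # Hypothetical helper: assumes probs are probabilities and labels are 0/1
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    positives = y_true == 1
    n_pos = positives.sum()
    if n_pos == 0:
        # TPR is undefined when there are no positive labels
        return float("nan")
    return float((probs[positives] >= threshold).sum() / n_pos)

def true_positives_rate_no_sigmoid(preds, dtrain):
    # Same interface as the metric above, minus the logistic transformation
    return ('tpr', tpr_from_probabilities(preds, dtrain.get_label()))
```

Keeping the NumPy part separate from the `dtrain` wrapper makes it possible to check the arithmetic on a handful of hand-labeled values before plugging it into training.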

Could you please help me figure out which implementation is correct?