Hi! I’m training a model on an imbalanced dataset and want to compute the true positive rate using a custom evaluation function. I do as follows:
import numpy as np

def true_positives_rate(preds, dtrain):
    # apply the logistic transformation to the raw margin scores
    preds = 1. / (1. + np.exp(-preds))
    y_true = dtrain.get_label()
    y_pred = preds >= 0.5
    # fraction of actual positives that are predicted positive
    tpr = y_pred[y_true == 1].sum() / y_true.sum()
    return ('tpr', tpr)
However, when I start training an ensemble, my tpr score on the validation dataset is always equal to 1:
[0] valid-error:0.249035 valid-tpr:1
[1] valid-error:0.246374 valid-tpr:1
[2] valid-error:0.257997 valid-tpr:1
[3] valid-error:0.251214 valid-tpr:1
[4] valid-error:0.221436 valid-tpr:1
[5] valid-error:0.217834 valid-tpr:1
[6] valid-error:0.216275 valid-tpr:1
[7] valid-error:0.204473 valid-tpr:1
[8] valid-error:0.205663 valid-tpr:1
[9] valid-error:0.206395 valid-tpr:1
I started to debug the function and realized that after applying the logistic transformation, my predictions are always above 0.5, and therefore every sample is predicted as the positive class.
So my question is: should I really apply this line to the predictions before doing further calculations, or not?
preds = 1./(1. + np.exp(-preds)) # all values are above 0.5 now
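Here is a quick sanity check I ran while debugging. It assumes the predictions handed to the metric are already probabilities in [0, 1] (that is the hypothesis, not something I have confirmed); under that assumption, applying the sigmoid a second time can never produce a value below 0.5:

    import numpy as np

    # hypothetical case: preds are already probabilities in [0, 1]
    probs = np.linspace(0., 1., 5)        # [0.0, 0.25, 0.5, 0.75, 1.0]

    # applying the sigmoid again squashes them into [0.5, ~0.731],
    # so every sample clears the 0.5 threshold
    double = 1. / (1. + np.exp(-probs))
    print(double.min())                    # 0.5 = sigmoid(0), the smallest possible value

This would explain exactly the behavior I see: if the predictions are probabilities rather than raw margin scores, the extra transformation forces every prediction into the positive class.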
All the references I have read say that I should, but then the results are very strange. When I discard this line, the metric starts to look more reasonable:
[0] valid-error:0.249035 valid-tpr: 0.292759
[1] valid-error:0.246374 valid-tpr: 0.291792
[2] valid-error:0.257997 valid-tpr: 0.294506
...
Could you please help me figure out which implementation is correct?