Hi! I’m training a model on an imbalanced dataset and want to compute the true positive rate using a custom evaluation function. I do it as follows:
def true_positives_rate(preds, dtrain):
    preds = 1. / (1. + np.exp(-preds))   # logistic transformation
    y_true = dtrain.get_label()
    y_pred = preds >= 0.5
    tpr = y_pred[y_true == 1].sum() / y_true.sum()
    return ('tpr', tpr)
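To make the behaviour concrete, here is a self-contained check of the function on raw margin scores. The `FakeDMatrix` class is a hypothetical stand-in I made up just for illustration; in real training XGBoost passes a `DMatrix`:

```python
import numpy as np

class FakeDMatrix:
    # Hypothetical stand-in for xgb.DMatrix, only for this illustration.
    def __init__(self, labels):
        self._labels = np.asarray(labels, dtype=float)

    def get_label(self):
        return self._labels

def true_positives_rate(preds, dtrain):
    preds = 1. / (1. + np.exp(-preds))   # logistic transformation
    y_true = dtrain.get_label()
    y_pred = preds >= 0.5
    tpr = y_pred[y_true == 1].sum() / y_true.sum()
    return ('tpr', tpr)

# Raw margins: a positive margin maps to a probability > 0.5.
margins = np.array([2.0, -1.0, 0.3, -0.5])
labels = [1, 1, 1, 0]

# Two of the three positive samples are predicted positive -> tpr = 2/3.
print(true_positives_rate(margins, FakeDMatrix(labels)))  # ('tpr', 0.666...)
```

On raw (pre-sigmoid) margins the function behaves as expected, which is why the question is really about what kind of `preds` the eval function receives.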
However, when I start training an ensemble, my tpr score on the validation dataset is always equal to 1:
valid-error:0.249035  valid-tpr:1
valid-error:0.246374  valid-tpr:1
valid-error:0.257997  valid-tpr:1
valid-error:0.251214  valid-tpr:1
valid-error:0.221436  valid-tpr:1
valid-error:0.217834  valid-tpr:1
valid-error:0.216275  valid-tpr:1
valid-error:0.204473  valid-tpr:1
valid-error:0.205663  valid-tpr:1
valid-error:0.206395  valid-tpr:1
I started debugging the function and realized that after applying the logistic transformation, my predictions are always above 0.5, so every sample is predicted as the positive class.
So my question is: should I really apply this line to the predictions before doing further calculations, or not?
preds = 1./(1. + np.exp(-preds)) # all values are above 0.5 now
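My suspicion is that the sigmoid ends up being applied twice. A quick numpy check (assuming, on my side, that the predictions handed to the eval function are already probabilities in (0, 1), as with a `binary:logistic` objective) shows why every doubly-transformed value lands above 0.5:

```python
import numpy as np

# Assumption: preds are already probabilities in (0, 1).
probs = np.array([0.01, 0.30, 0.50, 0.90])

# Applying the sigmoid again maps (0, 1) into roughly (0.502, 0.711),
# so thresholding at 0.5 marks every sample as positive.
twice = 1. / (1. + np.exp(-probs))
print(twice)
print((twice >= 0.5).all())  # True
```

Since the sigmoid of any non-negative input is at least 0.5, any probabilities transformed a second time can never fall below the 0.5 threshold, which would explain the constant tpr of 1.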
All the references I have read say that I should, but then the results are very strange. When I drop this line, the metric starts to look more reasonable:
valid-error:0.249035  valid-tpr:0.292759
valid-error:0.246374  valid-tpr:0.291792
valid-error:0.257997  valid-tpr:0.294506
...
Could you please help me figure out which implementation is correct?