XGBoost: AUC VS Accuracy?

kharrmat · May 14, 2019, 2:12pm

Which is the more reliable metric, AUC or accuracy, for binary classification on imbalanced data, when setScalePosWeight is set to sum(negative instances) / sum(positive instances) on the training set?
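The weight described above is simply a ratio of label counts on the training set. Here is a minimal sketch in plain Python, assuming 0/1 labels with 1 as the minority (positive) class. The helper name `scale_pos_weight` is hypothetical, although XGBoost's Python API does use a parameter with that name.

```python
from collections import Counter


def scale_pos_weight(labels):
    """Return count(negative) / count(positive) for 0/1 training labels.

    Assumes 1 marks the positive (minority) class.
    """
    counts = Counter(labels)
    n_pos = counts.get(1, 0)
    n_neg = counts.get(0, 0)
    if n_pos == 0:
        raise ValueError("training labels contain no positive examples")
    return n_neg / n_pos


# Hypothetical training labels: 90 negatives, 10 positives.
y_train = [0] * 90 + [1] * 10
print(scale_pos_weight(y_train))  # 9.0

# The value would then go into the booster parameters, e.g. in Python:
# params = {"objective": "binary:logistic", "scale_pos_weight": 9.0}
```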

On the test set, the results are:

AUC = 80%
Accuracy = 73%

I think accuracy is naive in the sense that it uses 0.5 as the threshold.
Thanks
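The point about the threshold can be illustrated with a small sketch in plain Python using hypothetical data. ROC AUC depends only on how the scores rank positives against negatives, so it does not depend on any threshold. Accuracy is computed after thresholding, so it can change a lot with the cut-off. Upweighting positives through the scale-pos-weight setting also tends to push predicted probabilities upward, which means 0.5 is not necessarily a sensible cut-off. The helpers `roc_auc` and `accuracy_at` are illustrative names, not XGBoost functions.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive is scored above
    a random negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold=0.5):
    """Fraction of correct predictions when score >= threshold means class 1."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


# Hypothetical test-set labels and predicted probabilities.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.55, 0.6, 0.65, 0.7, 0.9]

print(roc_auc(labels, scores))            # 1.0
print(accuracy_at(labels, scores, 0.5))   # 0.625
print(accuracy_at(labels, scores, 0.68))  # 1.0
```

Here the ranking is perfect (AUC = 1.0), yet accuracy at 0.5 is only 0.625 and reaches 1.0 at a threshold of 0.68. If accuracy matters, one option is to choose the threshold on a validation set instead of the test set.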

hcho3 · May 14, 2019, 6:22pm

I suggest that you also look at AUCPR (area under the precision-recall curve), since we want to consider precision and recall with respect to the minority class: http://www.davidsbatista.net/blog/2018/08/19/NLP_Metrics/
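To show why AUCPR can be more informative on imbalanced data, here is a minimal sketch of step-wise average precision, one common way to summarize the precision-recall curve. It is written in plain Python with hypothetical data. XGBoost provides a built-in `aucpr` evaluation metric, but its exact interpolation may differ from this sketch, so the numbers need not match.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Positive class is 1. Samples with equal scores share one threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Walk the samples from highest to lowest score.
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every sample tied at this score as a single threshold.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        # Add the precision at this threshold, weighted by the recall gained.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical imbalanced test set: 2 positives, 18 negatives.
# One negative outranks both positives and another sits between them.
labels = [0, 1, 0, 1] + [0] * 16
scores = [0.95, 0.9, 0.85, 0.8] + [0.1] * 16
print(average_precision(labels, scores))  # 0.5
```

On the same data, ROC AUC would be 33/36 ≈ 0.92 because the 16 easy negatives dominate the pairwise comparisons. Average precision, by contrast, drops to 0.5 because two negatives are ranked among the positives. A random ranking would give an average precision of roughly the positive rate, which is 0.1 here.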

kharrmat · May 15, 2019, 10:42am

Thanks @hcho3 for your answer.

In this example, the author used a random split to divide the (imbalanced) data into train and test sets. This can lead to cases where all the negative examples end up in the train/test set, … which does not preserve the distribution of the target.
Isn’t this “split leakage”, making these metrics unreliable?
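One way to check whether a given split has distorted the target distribution is to compare the positive rate in each part. The sketch below uses plain Python with hypothetical helpers `random_split` and `positive_rate`. It performs a non-stratified random split of a hypothetical label vector with 5% positives and prints the rates for a few seeds. Strictly speaking, this is sampling variance rather than leakage, since no test information reaches the model. With so few positives, however, the test set can end up with very different class proportions, which makes the test metrics noisy.

```python
import random


def random_split(labels, test_fraction=0.2, seed=0):
    """Plain (non-stratified) random split; returns (train_idx, test_idx)."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    return idx[n_test:], idx[:n_test]


def positive_rate(labels, idx):
    """Share of class-1 labels among the given indices."""
    return sum(labels[i] for i in idx) / len(idx)


y = [0] * 95 + [1] * 5  # hypothetical labels, 5% positives
for seed in range(5):
    train_idx, test_idx = random_split(y, seed=seed)
    # The test rate can swing far from 0.05 depending on the seed.
    print(seed, positive_rate(y, train_idx), positive_rate(y, test_idx))
```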

hcho3 · May 15, 2019, 11:50am

You can try using stratified sampling.
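A stratified split shuffles and splits each class separately, so the train and test sets keep roughly the same class proportions. Below is a minimal plain-Python sketch, where `stratified_split` is a hypothetical helper. In practice, scikit-learn's `train_test_split` accepts `stratify=y` for the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps about the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        # Take the same fraction of every class for the test set.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


y = [0] * 95 + [1] * 5  # hypothetical labels, 5% positives
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx))  # 80 20
print(sum(y[i] for i in test_idx))    # 1
```

With 95 negatives, 5 positives and a 20% test fraction, the test set gets 19 negatives and 1 positive, which matches the 5% positive rate. For a very small class, `round` can assign zero examples to the test set. Stratified cross-validation, for example scikit-learn's `StratifiedKFold`, may be a better fit in that case.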

kharrmat · May 15, 2019, 12:55pm

Cool, I’ll do it. Thanks.