Imbalanced Dataset: Difference between the two ways to improve the model


I refer to

The page says that “there are two ways to improve [the model]”, depending on what you are trying to improve:

If you care only about the overall performance metric (AUC) of your prediction
If you care about predicting the right probability

What is the difference between these two cases? When would you prefer one over the other?


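If you correct the data imbalance by assigning weights to the examples, you introduce a bias into the predicted probabilities: the predicted probability of the minority class will be over-estimated. AUC depends only on how the examples are ranked, not on the probability values themselves, so this bias does not matter if ranking is all you care about. If you need the predicted probabilities to be right (for example, to compute an expected cost), you should not re-balance the data this way.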
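To make the size of the bias concrete, here is a minimal sketch in plain Python. It assumes the simplest possible model, one that predicts a single constant probability and is fit by weighted log-loss; the function names and the 50/950 class counts are made up for illustration. Under that assumption, up-weighting positives by a factor w multiplies the learned odds by w, which is a monotone change, so the ranking (and therefore AUC) is unaffected while the probability itself is inflated.

```python
def weighted_rate(n_pos, n_neg, pos_weight):
    """Probability learned by a constant (intercept-only) model when each
    positive example counts pos_weight times in the log-loss.
    This is the maximizer of  w*n_pos*log(p) + n_neg*log(1-p)."""
    return pos_weight * n_pos / (pos_weight * n_pos + n_neg)


def undo_weighting(q, pos_weight):
    """Map a probability q from the weighted fit back to the original scale
    by dividing the odds by pos_weight (hypothetical helper)."""
    return q / (q + pos_weight * (1.0 - q))


# Illustrative data: 50 positives, 950 negatives (5% minority class).
n_pos, n_neg = 50, 950
pos_weight = n_neg / n_pos  # weight that fully balances the classes

true_rate = weighted_rate(n_pos, n_neg, 1.0)      # unweighted fit
biased = weighted_rate(n_pos, n_neg, pos_weight)  # weighted fit
recovered = undo_weighting(biased, pos_weight)

print(f"unweighted: {true_rate:.3f}")
print(f"weighted:   {biased:.3f}")
print(f"corrected:  {recovered:.3f}")

# The odds rescaling is monotone, so the ordering of scores is preserved.
scores = [0.9, 0.2, 0.6, 0.4]
rescaled = [undo_weighting(s, pos_weight) for s in scores]
order_before = sorted(range(len(scores)), key=lambda i: scores[i])
order_after = sorted(range(len(rescaled)), key=lambda i: rescaled[i])
```

For a real, more flexible model the odds correction is only approximate, so treat it as a rough fix rather than a substitute for training on the unweighted data or for a proper calibration step.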