Imbalanced Dataset: Difference between the two ways to improve


#1

I am referring to the XGBoost notes on parameter tuning: https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html

The page says that “there are two ways to improve [the model]”, and which one applies depends on what you care about:

1. If you care only about the overall performance metric (AUC) of your prediction
2. If you care about predicting the right probability
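For context, the remedies that page attaches to each case can be sketched as plain parameter dictionaries. This is only a sketch: the label counts are made up, `logloss` as the metric in the second case is my own assumption, and the dictionaries are meant to be passed to `xgboost.train` rather than used on their own.

```python
# Hypothetical label vector: 1 = positive (minority), 0 = negative (majority).
labels = [0] * 95 + [1] * 5

n_pos = sum(labels)
n_neg = len(labels) - n_pos

# Case 1: only the ranking quality (AUC) matters.
# Re-balance the classes and evaluate with AUC.
params_auc = {
    "objective": "binary:logistic",
    "scale_pos_weight": n_neg / n_pos,  # ratio of negatives to positives
    "eval_metric": "auc",
}

# Case 2: the predicted probability itself must be right.
# No re-balancing; a finite max_delta_step is suggested to help convergence.
params_probability = {
    "objective": "binary:logistic",
    "max_delta_step": 1,
    "eval_metric": "logloss",  # assumption: a probability-sensitive metric
}

print(params_auc)
print(params_probability)
```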

What is the difference between these two cases? When would you prefer one over the other?


#2

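If you correct class imbalance by assigning instance weights, you introduce a bias into the predicted probabilities: the model is trained as if the minority class were more common than it really is, so the predicted probability for the minority class is over-estimated. The ordering of the predictions is largely unaffected, which is why reweighting is acceptable when you only care about AUC, but not when you need the probabilities themselves to be right.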
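A small numeric sketch of that bias, under an idealised assumption: a model that fits the weighted data perfectly. Up-weighting the positive class by a factor `w` then multiplies the odds p / (1 - p) by `w`. The function names and the weight of 9 are hypothetical and chosen only for illustration; a real model will not follow this formula exactly.

```python
def reweighted_probability(p, w):
    """Probability reported by an ideal model after positives are up-weighted by w.

    Up-weighting multiplies the odds p / (1 - p) by w.
    """
    return w * p / (w * p + (1.0 - p))


def corrected_probability(q, w):
    """Undo the shift: recover the original probability from the weighted one."""
    return q / (q + w * (1.0 - q))


true_probs = [0.01, 0.05, 0.20, 0.50]
weight = 9.0  # hypothetical, e.g. 90% negatives vs 10% positives

shifted = [reweighted_probability(p, weight) for p in true_probs]
recovered = [corrected_probability(q, weight) for q in shifted]

for p, q in zip(true_probs, shifted):
    print(f"true p = {p:.2f} -> weighted-model p = {q:.3f}")
```

Every shifted value is larger than the true one, but the order of the examples is unchanged, so a ranking metric such as AUC does not notice the bias, while anything that uses the probability as a number (expected cost, risk scores) does.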