AUC evaluation when using XGBoost-spark


#1

I have a basic question. I am using the BinaryClassificationEvaluator from Spark 2.3 to calculate the AUC on a validation set for an XGBoost-spark 0.80 model.
I need to call setRawPredictionCol when using BinaryClassificationEvaluator.
When I use the “prediction” column, I get a much lower AUC than when I use the “probability” column. (Both columns appear in the DataFrame returned by the XGBoost-spark model's transform call.)
Which column should I use? Thanks.


#2

Take a look at both columns. Does the “prediction” column contain class predictions (0 or 1)?


#3

Hi hcho3, yes. In xgboost-spark 0.80, the “prediction” column holds class predictions (0 or 1), determined by whether the probability score for label 1 is greater than 0.5.


#4

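@roy1985715 Yes, in that case you should use the “probability” column for the AUC calculation. AUC measures how well the scores rank positive examples above negative ones, and thresholded 0/1 predictions discard most of that ranking information.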
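
To illustrate why the two columns give different AUC values, here is a minimal sketch in plain Python (no Spark required). The labels and probabilities are made-up numbers, not output from any real model, and the `auc` helper is a hypothetical hand-rolled function rather than the Spark evaluator. It computes AUC as the fraction of positive/negative pairs that are ranked correctly, counting ties as half, which matches the trapezoidal area under the ROC curve.

```python
def auc(labels, scores):
    """AUC via pairwise ranking (Mann-Whitney U formulation).

    A positive/negative pair counts 1 if the positive scores higher,
    0.5 if the two scores tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation data: true labels and the model's
# probability of label 1 for each example.
labels = [1, 1, 1, 0, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.45, 0.3, 0.1]

# Hard class predictions, thresholded at 0.5 as described in post #3.
predictions = [1.0 if p > 0.5 else 0.0 for p in probabilities]

auc_from_probability = auc(labels, probabilities)  # 8/9, about 0.889
auc_from_prediction = auc(labels, predictions)     # 7.5/9, about 0.833

print(auc_from_probability, auc_from_prediction)
```

With hard 0/1 predictions, every example on the same side of the threshold ties, so the ROC curve collapses to a single operating point and the AUC usually drops. Scoring with the probabilities keeps the full ranking.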