Model fit eval_metric for test data


Since my data is unbalanced, I want to use “auc” to measure the model performance. With XGBClassifier, I have the following code:

eval_set = [(X_train, y_train), (X_test, y_test)]
model.fit(X_train, y_train, eval_metric=["auc"], eval_set=eval_set)

With one set of data, I got an AUC score of 0.93 for (X_test, y_test). Then I wanted to compare it to scikit-learn’s roc_auc_score() function. So I did the following:

auc=roc_auc_score(y_test, predictions)

For the same dataset, I got an AUC score of 0.86. I ran a few more datasets and found that the scores from roc_auc_score() are always lower than those from XGBoost’s eval_metric.

Shouldn’t they be the same? Thanks.


Can you try this with a toy dataset in scikit-learn and get the same issue?


@hcho3, the same issue exists for the Pima Indians Diabetes dataset. With XGBoost’s calculation, I got an AUC score of 0.78. With scikit-learn’s calculation, I got 0.71.

Note that, from what I have observed, this issue applies only to the AUC calculation. For accuracy/error, both libraries yield the same values.


@vett93 Can you post the script here?


@hcho3 Please see script below:

#XGBoost model for Pima Indians dataset

from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

#load data

dataset = loadtxt('', delimiter=",")

#split data into X and y

X = dataset[:,0:8]
Y = dataset[:,8]

#split data into train and test sets

seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

#fit model on training data

model = XGBClassifier(silent=False, objective='binary:logistic', n_estimators=400)
eval_set = [(X_train, y_train), (X_test, y_test)]
model.fit(X_train, y_train, eval_metric=["auc", "error", "logloss"], eval_set=eval_set, verbose=True)

#make predictions for test data

predictions = model.predict(X_test)

#evaluate predictions

accuracy = accuracy_score(y_test, predictions)
precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)
roc = roc_auc_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
print("Precision: %.2f%% " % (precision * 100))
print("Recall: %.2f%% " % (recall * 100))
print("AUC: %.2f%% " % (roc * 100))


Where do I obtain


You can find it on Kaggle:


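XGBoost computes AUC from the predicted probabilities, whereas predict() returns hard 0/1 class labels, which throw away the ranking information that AUC depends on. So you should pass the output of predict_proba() to roc_auc_score() instead of the output of predict():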

# get probabilities for positive class
predictions = model.predict_proba(X_test)[:,1]
roc = roc_auc_score(y_test, predictions)
print("AUC: %.4f%% " % (roc * 100))   # prints AUC: 78.3213% 
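
To see why hard labels usually give a lower number, here is a minimal pure-Python sketch. The `auc` helper and the toy data are made up for illustration and are not taken from XGBoost or scikit-learn. The helper computes AUC as the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half:

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    Illustrative helper only; O(n_pos * n_neg), fine for toy data.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities for the positive class
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
proba = [0.1, 0.2, 0.45, 0.4, 0.6, 0.55, 0.8, 0.9]

# What predict() effectively returns: probabilities thresholded at 0.5
labels = [1 if p > 0.5 else 0 for p in proba]

auc_proba = auc(y_true, proba)    # uses the full ranking
auc_labels = auc(y_true, labels)  # only two distinct scores, many ties

print("AUC from probabilities: %.4f" % auc_proba)   # 0.8125
print("AUC from hard labels:   %.4f" % auc_labels)  # 0.7500
```

After thresholding, every example in the same predicted class is tied, so pairs that the probabilities ranked correctly only earn half credit. That is the kind of gap seen above between the eval_metric value and roc_auc_score() called on predict() output.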


Why is xgb not working with my code? Here’s my code. Any help please?


I’m trying to use XGBRegressor.