XGBClassifier performs worse than xgboost.train() in Python. What's wrong?


#1

Model1:
import xgboost as xgb

dtrain = xgb.DMatrix(x_train, label=y_train)

params = {'booster': 'gbtree', 'nthread': 10, 'eta': 0.01, 'gamma': 2, 'max_depth': 5, 'lambda': 1, 'alpha': 1, 'subsample': 0.75, 'objective': 'binary:logistic', 'eval_metric': 'logloss', 'seed': 2019}
plst = list(params.items())
num_rounds = 3000
# ddev is the validation DMatrix (its construction is not shown here)
model1 = xgb.train(plst, dtrain, num_rounds, evals=[(ddev, 'val'), (dtrain, 'train')])


Model2:
from xgboost import XGBClassifier

model2 = XGBClassifier(booster='gbtree', gamma=2, max_depth=5, learning_rate=0.01, reg_lambda=1, reg_alpha=1, n_estimators=3000, objective='binary:logistic', eval_metric='logloss', subsample=0.75, seed=2019, nthread=10)
model2.fit(x_train, y_train)


Both model1 and model2 use the same training set and parameters. model1 predicts the test set with KS = 0.32, but model2 predicts the same test set with KS = 0.22. (Python version is 3.6.5.)

I checked the documentation but could not find the answer. Does anyone know why this happens?


#2

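I found the answer. I was using model2.predict() instead of model2.predict_proba() to calculate the KS. For XGBClassifier, predict() returns hard class labels, while xgb.train() returns a Booster whose predict() returns probabilities under binary:logistic.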

So keep in mind: model1.predict() = model2.predict_proba()[:,1]
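
To illustrate why this matters for KS, here is a minimal sketch in plain Python with no xgboost dependency. The data is made up for illustration: `proba` stands in for what model2.predict_proba()[:,1] would return, and `labels` stands in for model2.predict(), assuming it thresholds the probability at 0.5. The helper `ks_statistic` is a hypothetical name, not part of any library.

```python
def ks_statistic(y_true, scores):
    """Max gap between the score CDFs of the positive and negative class."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    best = 0.0
    # Evaluate both empirical CDFs at every distinct score value
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Made-up example data (not from the original post)
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
proba = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]

# Hard labels, assuming a 0.5 cut-off on the probability
labels = [1 if p > 0.5 else 0 for p in proba]

ks_proba = ks_statistic(y_true, proba)    # KS on continuous scores
ks_labels = ks_statistic(y_true, labels)  # KS on 0/1 labels

print("KS on probabilities:", ks_proba)
print("KS on hard labels:  ", ks_labels)
```

With hard labels there are only two distinct score values, so the KS is measured at a single cut-off instead of the best one. On this toy data the label-based KS comes out lower than the probability-based one, which is the same kind of gap as in the question.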