Prediction scores from XGBoost4J differ from XGBoost Python

jgan · December 10, 2021, 11:06pm

I used the XGBoost 1.4.2 Python package (XGBClassifier) to train a model and saved it as a bst file. I then used xgboost4j (version 1.5) to load the bst model and make predictions. The missing value is set to -1.0 during prediction in both Java and Python.
All feature values are floats.
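XGBoost keeps feature values and split thresholds as 32-bit floats. A value that lies within float32 rounding distance of a threshold can therefore fall on either side of a split depending on exactly how it was rounded before it reached the model. The stdlib-only sketch below shows that effect; the threshold 0.1 and the feature value are made-up numbers, and whether this plays any role here depends on how each side builds its feature values.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical split "feature < threshold"; the threshold is kept as float32.
threshold = to_float32(0.1)  # 0.100000001490116...

# Hypothetical feature value, still held as a double.
value = 0.1000000005

# Compared in double precision, the value is below the threshold...
goes_left_as_double = value < threshold
# ...but rounded to float32 it becomes equal to the threshold,
# so the condition "feature < threshold" no longer holds.
goes_left_as_float32 = to_float32(value) < threshold

print(goes_left_as_double, goes_left_as_float32)  # True False
```

Differences of this kind only matter for rows whose values sit right at a split threshold, so they tend to explain small score gaps rather than large ones.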

For one of my models, the prediction scores from Java differ from those produced by the Python code. I have not seen this issue with my other 7 XGBoost models.
What could cause this issue for this particular model?

I did not specify the “missing” parameter in XGBClassifier during training.
Would adding missing=-1 to XGBClassifier during training fix this difference?
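A rough illustration of why the `missing` setting can matter: at each split, XGBoost sends a value it treats as missing along the default branch learned during training, and compares every other value against the split threshold. The sklearn wrapper's default `missing` is NaN, so if -1.0 is an ordinary value during training but is declared missing at prediction time, rows containing -1.0 can take a different path. The stump below (split, leaf values, default direction) is hypothetical and not taken from the model in question.

```python
import math


def predict_stump(x, threshold, default_left, left_leaf, right_leaf, missing):
    """Route one feature value through a single split "x < threshold".

    Values equal to `missing` (or NaN) follow the default direction
    instead of being compared with the threshold.
    """
    is_missing = math.isnan(x) or x == missing  # x == NaN is always False
    if is_missing:
        return left_leaf if default_left else right_leaf
    return left_leaf if x < threshold else right_leaf


# Hypothetical split "f0 < -0.5" whose default (missing) branch is the right one.
stump = dict(threshold=-0.5, default_left=False, left_leaf=-1.2, right_leaf=0.8)

# -1.0 as an ordinary value: -1.0 < -0.5, so it goes left.
as_value = predict_stump(-1.0, missing=float("nan"), **stump)
# -1.0 declared missing: it follows the default branch to the right.
as_missing = predict_stump(-1.0, missing=-1.0, **stump)

print(as_value, as_missing)  # -1.2 0.8
```

The same row gets a different leaf value purely because of how -1.0 is interpreted, which is why training and prediction should agree on the `missing` value.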

```python
import xgboost as xgb

model = xgb.XGBClassifier(
    objective="binary:logistic",
    max_depth=5,
    n_estimators=550,
    learning_rate=0.1,
    colsample_bytree=0.8,
    subsample=0.8,
    missing=-1.0,
    verbosity=2,
)
```
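With `objective='binary:logistic'`, the returned score is the logistic sigmoid of the raw margin: the base margin plus the sum of the leaf values the row reaches in each of the `n_estimators` trees. A single tree taking a different branch therefore shifts the margin, and the sigmoid can turn a moderate shift into a large change in probability. A minimal stdlib sketch, assuming the default `base_score` of 0.5 (a base margin of 0) and made-up leaf values:

```python
import math


def logistic_score(leaf_values, base_score=0.5):
    """binary:logistic probability: sigmoid(base margin + sum of leaf values)."""
    base_margin = math.log(base_score / (1.0 - base_score))  # 0.0 for 0.5
    margin = base_margin + sum(leaf_values)
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical leaf values reached by one row in three trees.
same_path = [0.4, -0.2, 0.3]
# The same row if the second tree routes it to a different leaf.
one_tree_differs = [0.4, 0.9, 0.3]

print(round(logistic_score(same_path), 4))         # 0.6225
print(round(logistic_score(one_tree_differs), 4))  # 0.832
```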

Below are some examples: expected_score is the Python prediction and score is the xgboost4j prediction. The input features are exactly the same in each case.

Positive cases:
expected_score=0.8378626704 score=0.3003503978252411
expected_score=0.9956128597 score=0.9922531247138977
expected_score=0.9923802614 score=0.9889824986457825
expected_score=0.099611342 score=0.9582134485244751
expected_score=0.094096154 score=0.6722503900527954

Negative cases:
expected_score=0.0173457861 score=0.6290968060493469
expected_score=0.0160140228 score=0.7672354578971863
expected_score=0.0093713803 score=0.8014734387397766
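To quantify the discrepancy in the examples above, a small stdlib script can compare each pair against a tolerance and count how many rows would change class. The tolerance of 1e-5 and the 0.5 cut-off are assumptions chosen for illustration; the score pairs are copied from the lists above.

```python
# (expected_score from Python, score from xgboost4j), copied from the post.
positive = [
    (0.8378626704, 0.3003503978252411),
    (0.9956128597, 0.9922531247138977),
    (0.9923802614, 0.9889824986457825),
    (0.099611342, 0.9582134485244751),
    (0.094096154, 0.6722503900527954),
]
negative = [
    (0.0173457861, 0.6290968060493469),
    (0.0160140228, 0.7672354578971863),
    (0.0093713803, 0.8014734387397766),
]

TOLERANCE = 1e-5  # assumed: roughly single-precision noise
THRESHOLD = 0.5   # assumed classification cut-off


def summarize(pairs):
    """Return (max absolute difference, #pairs above tolerance, #class flips)."""
    diffs = [abs(expected - score) for expected, score in pairs]
    mismatches = sum(d > TOLERANCE for d in diffs)
    flips = sum((e >= THRESHOLD) != (s >= THRESHOLD) for e, s in pairs)
    return max(diffs), mismatches, flips


for name, pairs in (("positive", positive), ("negative", negative)):
    max_diff, mismatches, flips = summarize(pairs)
    print(f"{name}: max |diff| = {max_diff:.4f}, "
          f"{mismatches}/{len(pairs)} above tolerance, {flips} class flips")
```

On the posted numbers this reports every pair outside the tolerance, with 3 of the 5 positive rows and all 3 negative rows changing class at 0.5, which points to rows taking different paths through the trees rather than rounding noise.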