Different results in predictions with numpy array and list

markli123 · November 8, 2020, 2:24pm

Hi,
I trained a model using xgb.train() with xgboost 0.90.
I then found that the predictions differ depending on whether I pass a numpy.array or a list to DMatrix.

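(1)
import numpy as np
import xgboost

a1 = np.array([[0] * 178])  # one row of 178 zeros as a numpy array
b1 = xgboost.DMatrix(a1)
print(model.predict(b1))  # model is the Booster returned by xgb.train()

(2)
a1 = [[0] * 178]  # the same row as a plain Python list
b1 = xgboost.DMatrix(a1)
print(model.predict(b1))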

The result of (1) is [-0.0454635],
while the result of (2) is [-0.02057922].

I have now converted the model to a “.so” file and use it in C++ for prediction. The only result I can get there is (2). Since there is no numpy or DMatrix in C++ (I guess), is there any way to get result (1) in C++? Thanks!

hcho3 · November 8, 2020, 3:50pm

Have you tried using XGBoost 1.2.0?
Please help us troubleshoot by posting the model. Otherwise, we lack sufficient information to find the root cause.

markli123 · November 9, 2020, 5:29am

In XGBoost 1.2.0, it raises an error: “TypeError: Input data can not be a list.”, so I cannot compare the results.
Do you need the training source code or the model file? May I have your email address so I can send the model to you?
Thanks.

hcho3 · November 9, 2020, 5:50am

@markli123 Hi, I actually found the cause of the weird behavior. See https://github.com/dmlc/xgboost/pull/3970. The list gets silently converted into a scipy.sparse.csr_matrix, which leads to unexpected behavior and wrong predictions. So in your example, (1) is the correct answer. This is why we later disallowed passing a list to DMatrix directly.
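
To make the mechanism concrete, here is a rough pure-Python sketch of why a sparse conversion can change a prediction. It assumes that entries absent from a sparse matrix are treated as missing values rather than as zeros, and that a tree sends missing values down a default branch. The helpers dense_to_csr, csr_lookup and toy_stump, the threshold, and the leaf values are all made up for illustration; this is not scipy's or XGBoost's actual code.

```python
# Simplified illustration only: NOT scipy's or XGBoost's actual implementation.

def dense_to_csr(rows):
    """Convert a list of rows to CSR-style arrays, storing only nonzero entries."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def csr_lookup(csr, row, col):
    """Return the stored value, or None when the entry is absent (i.e. missing)."""
    data, indices, indptr = csr
    for pos in range(indptr[row], indptr[row + 1]):
        if indices[pos] == col:
            return data[pos]
    return None


def toy_stump(value, threshold=0.5, left=-1.0, right=1.0, missing_goes_left=False):
    """Hypothetical one-split tree; missing values follow a default branch."""
    if value is None:
        return left if missing_goes_left else right
    return left if value < threshold else right


rows = [[0] * 178]                     # same input as in the question
csr = dense_to_csr(rows)               # every zero is dropped, nothing is stored

dense_value = rows[0][0]               # an explicit 0
sparse_value = csr_lookup(csr, 0, 0)   # None, because the entry is absent

print(toy_stump(dense_value))          # 0 < 0.5, so the left leaf is used
print(toy_stump(sparse_value))         # missing, so the default (right) leaf is used
```

With the dense row the zero is compared against the threshold, while with the sparse row the same feature looks missing and takes the default branch, so the two inputs can land in different leaves.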

markli123 · November 9, 2020, 6:11am

Thank you very much!

One more small question:
I used a Python package called “treelite” to convert my model to a ‘.so’ file, and the prediction in C++ gives result (2), which is wrong. So I guess it is a Treelite problem?

hcho3 · November 9, 2020, 7:23am

Yes, please file a new issue in the Treelite repository.