DMatrix settings

Hi, I trained a model on a GPU with this code:
X_train = np.array(features.fillna(0))  # features contains some np.nan values
dtrain = xgb.DMatrix(X_train)

At prediction time, the predictions are very different across three settings:
1. X_test = np.array(features.fillna(0))
dtest = xgb.DMatrix(X_test)

2. X_test = np.array(features.fillna(0))
dtest = xgb.DMatrix(X_test, missing=0)

3. X_test = np.array(features)
dtest = xgb.DMatrix(X_test)

My question is: how does DMatrix treat 0 values in Python?
If it treats 0 as 0, the result of (2) should be the same as (1).
If it treats 0 as np.nan, the result of (3) should be the same as (1).
But the three results are very different from each other.

By default, XGBoost treats only np.nan as a missing value; zero is not treated as missing. This behavior can be changed by explicitly setting the missing parameter.

The expectation that (2) should match (1) is not true: in (2) you set missing=0, which tells XGBoost to treat all zeros in the dtest matrix as missing. You should use (1), not (2), to be consistent with how you built the dtrain matrix.
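
To make the difference between the three settings concrete, here is a rough sketch in plain Python. It does not call XGBoost at all. The helper treated_as_missing is hypothetical and only models the rule described above (NaN is the default marker, and missing=0 makes zeros the marker), and the four sample values are made up for illustration.

```python
import math

def treated_as_missing(value, missing=math.nan):
    """Hypothetical helper modelling the rule described above; not XGBoost's own code."""
    if math.isnan(missing):
        # Default marker: only NaN counts as missing, 0 is an ordinary value.
        return math.isnan(value)
    # Custom marker (e.g. missing=0): entries equal to the marker count as missing.
    # NaN entries combined with a custom marker are not modelled in this sketch.
    return value == missing

# Made-up feature column with one NaN and one genuine zero.
raw = [1.5, math.nan, 0.0, 2.0]
# What fillna(0) would produce for that column.
filled = [0.0 if math.isnan(v) else v for v in raw]

setting_1 = [treated_as_missing(v) for v in filled]             # like DMatrix(filled)
setting_2 = [treated_as_missing(v, missing=0) for v in filled]  # like DMatrix(filled, missing=0)
setting_3 = [treated_as_missing(v) for v in raw]                # like DMatrix(raw)

print(setting_1)  # [False, False, False, False]
print(setting_2)  # [False, True, True, False]
print(setting_3)  # [False, True, False, False]
```

The three settings mark three different sets of entries as missing, so the model sees three different inputs. Only setting (1) matches what the model saw during training, where the filled-in zeros were ordinary values.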

Thanks for the reply.
I understand now. I thought “missing=0” meant replacing missing values with 0, like fillna(0), but that is wrong.
Thanks again!