Does the gbtree prediction match between R xgboost and Python xgboost?


#1

I rewrote an xgboost script from R in Python.
I expected the results to match those predicted by the R script.
However, the predictions from the Python version diverge considerably from those of the model trained with the R script.
Both scripts simply build a DMatrix from the training data, call train(), and then predict() on the prediction data.
The training data and the prediction data are the same in both cases, and the parameters are as follows.

param = {
    'booster': 'gbtree'
    , 'objective': 'multi:softmax'
    , 'eval_metric': 'merror'
    , 'gamma': 0
    , 'eta': 0.3
    , 'max_depth': 6
    , 'min_child_weight': 1
    , 'colsample_bytree': 0.9
    , 'subsample': 0.8
    , 'alpha': 1
    , 'num_class': 8
    , 'nthread': multiprocessing.cpu_count() - 1
}

The version of xgboost package is 0.90.

Is it impossible to get the same result from the R and Python implementations, even when the code is nearly identical and randomness has been eliminated?
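One way to test whether sampling is what makes the two models diverge is to switch off row and column sampling in both scripts and compare again. The following is a minimal sketch of such a parameter set for the Python side, not a confirmed fix. The name `param_no_sampling` is hypothetical, and the sketch assumes that `subsample` and `colsample_bytree` are the only steps that draw random numbers with these settings.

```python
# Same parameters as in the question, but with sampling switched off.
# Assumption: row and column sampling are the only random steps here,
# so with both set to 1.0 the seed should no longer matter.
param_no_sampling = {
    'booster': 'gbtree',
    'objective': 'multi:softmax',
    'eval_metric': 'merror',
    'gamma': 0,
    'eta': 0.3,
    'max_depth': 6,
    'min_child_weight': 1,
    'colsample_bytree': 1.0,  # was 0.9: use every column for every tree
    'subsample': 1.0,         # was 0.8: use every row for every tree
    'alpha': 1,
    'num_class': 8,
}
```

The same two values would need to be changed in the R `param` list. If the models still differ after that, the cause is likely somewhere other than random sampling, for example in how the input matrices are built.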

Postscript

Python Code

import numpy as np
import pandas as pd
import xgboost as xgb

x_train = pd.read_csv("./test/x_train.csv")
y_train = pd.read_csv("./test/y_train.csv")

np.random.seed(0) # Fix the seed

xd_train = xgb.DMatrix(
    data = x_train
    ,label = y_train
)
# model parameter
# https://xgboost.readthedocs.io/en/latest/parameter.html
param = {
    'booster'           : 'gbtree'
    ,'objective'        : 'multi:softmax'
    ,'eval_metric'      : 'merror'
    ,'gamma'            : 0
    ,'eta'              : 0.3
    ,'max_depth'        : 6
    ,'min_child_weight' : 1
    ,'colsample_bytree' : 0.9
    ,'subsample'        : 0.8
    ,'alpha'            : 1
    ,'num_class'        : 8
}

# Xgboost Learning
bst_fit_down = xgb.train(param,
                        xd_train,
                        num_boost_round = 500                        
)
print(xd_train.get_label())
print(xd_train.get_weight())
print(xd_train.get_base_margin())
print(xd_train.num_row())
bst_fit_down.dump_model('./test/model_P.txt', with_stats = True)

R Code

library(data.table)
library(Matrix)
library(xgboost)

x_train <- fread("./test/x_train.csv", stringsAsFactors = F)
y_train <- fread("./test/y_train.csv", stringsAsFactors = F, header = T)

# Fix the seed
set.seed(0)
# data table matrix
dt_train <- data.table(x_train, keep.rownames=F)
## Convert the features to sparse matrix format
smat_train <- sparse.model.matrix( ~ ., data = x_train)

xd_train <- xgb.DMatrix(
  data = smat_train
  ,label = data.table(y_train, keep.rownames=F)$Y # label data only
)

# Parallel processing (confirmed not to affect the results)
require(doParallel)
registerDoParallel(detectCores()-1)
# model parameter
param <- list(
  booster           = "gbtree"
  ,objective        = "multi:softmax"
  ,eval_metric      = "merror"
  ,gamma            = 0
  ,eta              = 0.3
  ,max_depth        = 6
  ,min_child_weight = 1
  ,colsample_bytree = 0.9
  ,subsample        = 0.8
  ,alpha            = 1
  ,num_class        = 8
)

# Xgboost Learning: xgb.train and xgboost give the same training result
bst_fit_down <- xgboost(data = xd_train, 
                        nrounds = 500, # model.cv$best_iteration, # best number of rounds
                        params = param
)
#bst_fit_down <- xgb.train(data = xd_train,
#                          nrounds = 500,
#                          params = param)

print(getinfo(xd_train, "label"))
print(getinfo(xd_train, "weight"))
print(getinfo(xd_train, "base_margin"))
print(getinfo(xd_train, "nrow"))
xgb.dump(bst_fit_down, fname = "./test/model_R.txt"
, dump_format = "text"
, with_stats = T)

#2

As a further check, I trained both models without setting "weight" and dumped them. The two dumped models differ by more than a small amount.

If you have any information about this difference, please let me know.
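To narrow down where the two dumps start to diverge, they can be compared tree by tree. The sketch below is only an illustration. It assumes both files (./test/model_P.txt and ./test/model_R.txt in the code above) use the text dump layout in which each tree starts with a `booster[N]` line. The helper names `split_trees` and `first_difference` are hypothetical. Note also that `sparse.model.matrix(~ ., ...)` adds an `(Intercept)` column, so the feature columns on the R side may not line up with the Python ones.

```python
import re


def split_trees(dump_text):
    """Split a text dump into one list of node lines per tree."""
    trees = []
    for line in dump_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if re.match(r"booster\[\d+\]", line):
            trees.append([])        # a new tree starts here
        elif trees:
            trees[-1].append(line)  # node line of the current tree
    return trees


def first_difference(dump_a, dump_b):
    """Return the index of the first tree that differs, or None if all match."""
    trees_a, trees_b = split_trees(dump_a), split_trees(dump_b)
    for i, (a, b) in enumerate(zip(trees_a, trees_b)):
        if a != b:
            return i
    if len(trees_a) != len(trees_b):
        return min(len(trees_a), len(trees_b))
    return None


# Tiny inline example; in practice the two strings would be read from the dump files.
dump_p = "booster[0]:\n0:leaf=0.1\nbooster[1]:\n0:leaf=0.2\n"
dump_r = "booster[0]:\n0:leaf=0.1\nbooster[1]:\n0:leaf=0.3\n"
print(first_difference(dump_p, dump_r))
```

The comparison is an exact string match, so differences in number formatting or feature naming between the two dumps would also be reported as a difference. If the very first tree already differs, the inputs or the sampling differ from the start rather than drifting over the boosting rounds.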

I have added the code I used for this check.

Thank you


#3

As I continued the investigation, I found that the random number generator used inside the package is OS-dependent, and that the R implementation uses a 32-bit variable for it.
The difference in the models may be caused by this difference in random numbers.
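To illustrate why a different generator matters even when the seed is the same, here is a small sketch in plain Python. It does not use xgboost and does not reproduce the generators of either package. `lcg32` is a hypothetical 32-bit linear congruential generator, `mt_uniforms` wraps Python's built-in Mersenne Twister, and `subsample_rows` is a hypothetical stand-in for row subsampling with rate 0.8.

```python
import random


def lcg32(seed):
    """Hypothetical 32-bit linear congruential generator (not XGBoost's)."""
    state = seed & 0xFFFFFFFF
    while True:
        state = (1664525 * state + 1013904223) & 0xFFFFFFFF
        yield state / 2**32  # uniform value in [0, 1)


def mt_uniforms(seed):
    """Uniform values from Python's built-in Mersenne Twister."""
    rng = random.Random(seed)
    while True:
        yield rng.random()


def subsample_rows(uniforms, n_rows, rate):
    """Keep each row whose uniform draw falls below the sampling rate."""
    return [i for i in range(n_rows) if next(uniforms) < rate]


# Same seed and same sampling rate, but two different generators.
rows_mt = subsample_rows(mt_uniforms(0), 20, 0.8)
rows_lcg = subsample_rows(lcg32(0), 20, 0.8)
print(rows_mt)
print(rows_lcg)
print(rows_mt == rows_lcg)
```

Each generator is reproducible on its own, but the two select different rows. If the R and Python packages draw from different generators, fixing the seed on both sides would not be enough to make `subsample` and `colsample_bytree` pick the same rows and columns.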

Is there anything else that can be concluded about this issue?