We trained two models via the Python API: one trained normally (single machine) and one trained in distributed mode. For the distributed run, the first 25 lines and the second 25 lines of the dataset were given to the two workers as the two partitions; for the normal run, the first 50 lines were used as the dataset.
In my view, we should get the same model if the histogram bins are the same and every parameter related to randomization is disabled. Since the dataset is small, the histogram bins should be identical. But we got two different models. Can anyone explain how this could happen?
Parameters:
params_xgb = {
    "booster": "gbtree",
    "objective": "binary:logistic",
    "eval_metric": "rmse",
    "max_depth": 5,
    "lambda": 0,
    "subsample": 1.0,
    "colsample_bytree": 1.0,
    "seed": 123,
    "tree_method": "hist",
    "grow_policy": "depthwise",
    "gamma": 0,
    "min_child_weight": 0,
}
num_boost_round = 1
Datasets: FE_pima-indians-diabetes
xgboost version: 1.2.1