In my use case, I need model training to be exactly reproducible. However, I cannot get an identical model across multiple training runs that use exactly the same data and random seed. Here are the details:
- I am using the pre-compiled Scala Spark XGBoost build, version 0.81-criteo-20180821, on CDH 2.3.1.
- I set the same random seed.
- The training data is identical in row order, column order, and partitioning.
Despite this, model.summary always shows differences in trainObjectiveHistory between runs, starting from the 6th digit after the decimal point.
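
To make the size of the discrepancy concrete, here is a minimal sketch of how two objective histories could be compared. It assumes each history has already been converted to a Seq[Double]; the object name HistoryDiff and its methods are hypothetical helpers, not part of XGBoost, and the sample values are made-up placeholders rather than my actual output.

```scala
// Hypothetical helper, not part of XGBoost: compares two objective
// histories that were collected from two separate training runs.
object HistoryDiff {

  /** Index of the first iteration whose values differ by more than `tol`, if any. */
  def firstDivergence(a: Seq[Double], b: Seq[Double], tol: Double = 0.0): Option[Int] = {
    require(a.length == b.length, "histories must have the same length")
    a.indices.find(i => math.abs(a(i) - b(i)) > tol)
  }

  /** Largest absolute difference across all iterations (0.0 for empty input). */
  def maxAbsDiff(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.length == b.length, "histories must have the same length")
    a.zip(b).foldLeft(0.0) { case (m, (x, y)) => math.max(m, math.abs(x - y)) }
  }

  def main(args: Array[String]): Unit = {
    // Made-up placeholder values, NOT real training output.
    val run1 = Seq(0.652341, 0.613872, 0.581209)
    val run2 = Seq(0.652341, 0.613874, 0.581203)

    println(s"first diverging iteration: ${firstDivergence(run1, run2)}")
    println(s"max absolute difference:   ${maxAbsDiff(run1, run2)}")
  }
}
```

With tol left at 0.0 the check is for bit-exact equality, which is what I need; a small positive tol would instead show whether the runs agree to within a chosen precision.
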
Am I missing something? I would expect to be able to replicate the numbers exactly.