Dask gives different prediction results on a single GPU vs. multiple GPUs

This question is related to https://github.com/dmlc/xgboost/issues/7103.

Hi, I’m running the multi-GPU demo script from https://github.com/dmlc/xgboost/blob/master/demo/dask/sklearn_gpu_training.py.

I replaced X and y in the script with my own dataset and ran it once with a single GPU and once with two GPUs. However, print(prediction.compute()) gives different results in the two runs.

Single GPU result:
Prediction: [-9.0414286e-04 -1.7836094e-03 -3.1796098e-03 ... 9.1552734e-05 -9.7769499e-04 -4.5228004e-04]

Evaluation history: {'validation_0': {'rmse': [0.351922, 0.247238, 0.174333, 0.123821, 0.08917, 0.065837, 0.050615, 0.041147, 0.035596, 0.032528, 0.030911, 0.030084, 0.029668, 0.029458, 0.029353, 0.0293, 0.029273, 0.029256, 0.029246, 0.029239, 0.029236, 0.029231, 0.029228, 0.029225, 0.029222, 0.029219, 0.029214, 0.029212, 0.029208, 0.029206, 0.029204, 0.029202, 0.0292, 0.029197, 0.029194, 0.02919, 0.029189, 0.029187, 0.029186, 0.029184, 0.029181, 0.029179, 0.029177, 0.029174, 0.029172, 0.029168, 0.029164, 0.029161, 0.029158, 0.029156, 0.029154, 0.029152, 0.029151, 0.029149, 0.029146, 0.029144, 0.02914, 0.029139, 0.029136, 0.029134, 0.029131, 0.029129, 0.029127, 0.029125, 0.029124, 0.029122, 0.029121, 0.029119, 0.029117, 0.029115, 0.029114, 0.029112, 0.029109, 0.029108, 0.029106, 0.029104, 0.029102, 0.029101, 0.0291, 0.029097, 0.029096, 0.029094, 0.029093, 0.029092, 0.02909, 0.029089, 0.029087, 0.029085, 0.029083, 0.029082, 0.02908, 0.029079, 0.029078, 0.029077, 0.029075, 0.029073, 0.029072, 0.029071, 0.029069, 0.029068]}}

Multi-GPU result:
Prediction: [-0.00647187 0.00055468 -0.00303644 ... 0.00143129 -0.0087744 -0.00789708]

Evaluation history: {'validation_0': {'rmse': [0.351922, 0.247238, 0.174333, 0.12382, 0.089169, 0.065837, 0.050617, 0.041149, 0.0356, 0.032532, 0.030914, 0.030088, 0.029671, 0.029462, 0.029358, 0.029304, 0.029275, 0.029259, 0.029251, 0.029244, 0.029239, 0.029235, 0.029229, 0.029224, 0.02922, 0.029216, 0.029214, 0.029211, 0.029205, 0.029203, 0.029201, 0.029199, 0.029197, 0.029194, 0.029192, 0.029189, 0.029188, 0.029185, 0.029182, 0.029178, 0.029176, 0.029175, 0.029172, 0.02917, 0.029168, 0.029166, 0.029164, 0.029159, 0.029158, 0.029156, 0.029154, 0.029152, 0.02915, 0.029147, 0.029145, 0.029143, 0.029141, 0.029139, 0.029138, 0.029136, 0.029134, 0.029132, 0.02913, 0.029129, 0.029126, 0.029125, 0.029123, 0.029121, 0.029119, 0.029117, 0.029115, 0.029113, 0.029112, 0.029111, 0.029109, 0.029107, 0.029106, 0.029104, 0.029103, 0.029102, 0.029099, 0.029097, 0.029095, 0.029094, 0.029092, 0.029091, 0.029089, 0.029086, 0.029085, 0.029084, 0.029083, 0.029082, 0.02908, 0.029078, 0.029075, 0.029074, 0.029073, 0.029072, 0.02907, 0.029068]}}

The only difference between the two runs is whether one or two GPUs are visible, which I control with CUDA_VISIBLE_DEVICES. The evaluation histories match closely, but the predictions are completely different. Could anyone help me understand why the predictions differ so much? Thanks!
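
To put rough numbers on the mismatch, here is a minimal sketch that compares the two runs element by element. It uses only the first few values printed above (the full arrays are truncated in the output), and the helper name max_abs_diff is just something made up for this illustration, not part of XGBoost or Dask.

```python
def max_abs_diff(a, b):
    """Return the largest element-wise absolute difference between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# First five RMSE values from each run's evaluation history (copied from the output above).
single_rmse = [0.351922, 0.247238, 0.174333, 0.123821, 0.08917]
multi_rmse = [0.351922, 0.247238, 0.174333, 0.12382, 0.089169]

# First three predictions from each run (the rest are elided by "..." in the printout).
single_pred = [-9.0414286e-04, -1.7836094e-03, -3.1796098e-03]
multi_pred = [-0.00647187, 0.00055468, -0.00303644]

rmse_gap = max_abs_diff(single_rmse, multi_rmse)
pred_gap = max_abs_diff(single_pred, multi_pred)

print(f"max RMSE difference:       {rmse_gap:.2e}")
print(f"max prediction difference: {pred_gap:.2e}")
```

On these samples the RMSE histories differ only around the sixth decimal place, while the predictions differ by an amount comparable to the prediction values themselves. Note that this compares predictions position by position, so it assumes the rows come back in the same order in both runs.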