Performance difference between C++ (C API) and Python

I’m currently porting my Python XGBoost code to C++ and comparing the performance and results.

On the C++ booster I set all the hyper-parameters to the same values as the Python XGBRegressor model, including the objective “reg:squarederror”. After training for 5000 boosting rounds without early stopping, the RMSE of the two models was almost the same, but the predictions were not identical: with a target range of roughly 0–50, I see up to ±0.2 difference between the two languages, which seems too large to attribute to floating-point precision alone.
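For reference, here is a minimal sketch of how I set up training through the C API (the hyper-parameter values, the output file name, and the `SAFE`/`train` helpers are placeholders for illustration, not my exact configuration):

```cpp
#include <xgboost/c_api.h>
#include <cstdio>
#include <cstdlib>

// Abort on any non-zero return code from the C API.
#define SAFE(call)                                       \
  do {                                                   \
    if ((call) != 0) {                                   \
      std::fprintf(stderr, "%s\n", XGBGetLastError());   \
      std::exit(1);                                      \
    }                                                    \
  } while (0)

void train(DMatrixHandle dtrain) {
  BoosterHandle booster;
  SAFE(XGBoosterCreate(&dtrain, 1, &booster));

  // Mirror the settings used by XGBRegressor on the Python side.
  // The values below are placeholders for the real configuration.
  SAFE(XGBoosterSetParam(booster, "objective", "reg:squarederror"));
  SAFE(XGBoosterSetParam(booster, "eta", "0.3"));
  SAFE(XGBoosterSetParam(booster, "max_depth", "6"));
  SAFE(XGBoosterSetParam(booster, "seed", "0"));
  SAFE(XGBoosterSetParam(booster, "nthread", "1"));  // single thread for reproducibility

  // One call per boosting round; 5000 rounds, no early stopping.
  for (int iter = 0; iter < 5000; ++iter) {
    SAFE(XGBoosterUpdateOneIter(booster, iter, dtrain));
  }

  SAFE(XGBoosterSaveModel(booster, "model_cpp.json"));
  SAFE(XGBoosterFree(booster));
}
```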

I matched the library version (XGBoost v2.0.3) on both sides and set all data types to float in C++ and np.float32 in Python, but I still get slightly different predictions for both the training data and unseen test data.
Is this due to compiler and language differences only, or am I missing something in one of the two workflows?
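This is roughly how I build the DMatrix on the C++ side; XGDMatrixCreateFromMat takes a dense row-major 32-bit float buffer, which is why I also use np.float32 in Python (the `make_dmatrix` helper is just for illustration):

```cpp
#include <xgboost/c_api.h>
#include <cmath>
#include <vector>

// Build a dense DMatrix from row-major float32 data, matching the
// np.float32 arrays fed to XGBRegressor on the Python side.
DMatrixHandle make_dmatrix(const std::vector<float>& data,    // nrow * ncol values
                           const std::vector<float>& labels,  // nrow values
                           bst_ulong nrow, bst_ulong ncol) {
  DMatrixHandle dmat;
  // NAN marks missing values, as on the Python side.
  XGDMatrixCreateFromMat(data.data(), nrow, ncol, NAN, &dmat);
  XGDMatrixSetFloatInfo(dmat, "label", labels.data(), nrow);
  return dmat;
}
```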

*I’ve checked whether this is a data-type or floating-point problem by importing the Python-trained model (JSON) in C++ and making predictions with it. In that case I get nearly identical predictions (maximum error around ±0.00005, which I can ignore).
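This is the cross-check I used: load the JSON model saved from Python through the C API and predict on the same data (a minimal sketch; “model_python.json” and the function name are placeholders):

```cpp
#include <xgboost/c_api.h>
#include <cstdio>

// Load the model trained and saved in Python, then predict with it in C++.
void predict_with_python_model(DMatrixHandle dtest) {
  BoosterHandle booster;
  XGBoosterCreate(nullptr, 0, &booster);
  XGBoosterLoadModel(booster, "model_python.json");  // saved via save_model() in Python

  bst_ulong const* out_shape;
  bst_ulong out_dim;
  const float* out_result;
  // Default prediction config: normal predictions over all trees.
  const char* config =
      "{\"type\": 0, \"training\": false, \"iteration_begin\": 0, "
      "\"iteration_end\": 0, \"strict_shape\": false}";
  XGBoosterPredictFromDMatrix(booster, dtest, config,
                              &out_shape, &out_dim, &out_result);

  for (bst_ulong i = 0; i < out_shape[0]; ++i) {
    std::printf("%f\n", out_result[i]);
  }
  XGBoosterFree(booster);
}
```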

My working environment:
Windows 10 64-bit (19042.2965)
Python v3.9.13
C++17 (MinGW-w64 / g++)
XGBoost v2.0.3