Performance differs between R and Python!

tmtc · October 9, 2020, 1:56pm

I’m getting different results in R and Python when I build a really basic model on the iris data set: 94% vs. 97% holdout accuracy. For a more complex project, the difference is almost 15 percentage points!
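On a data set as small as iris (150 rows), a single test row moves holdout accuracy by several percentage points. The holdout sizes below are hypothetical, since the post does not say how the data was split:

```python
# Hypothetical holdout sizes for the 150-row iris data set;
# the post does not state which split was used.
def points_per_row(holdout_rows: int) -> float:
    """Change in accuracy, in percentage points, caused by one test row."""
    return 100.0 / holdout_rows

# 30 = 20% split, 45 = 30% split, 50 = one-third split
for rows in (30, 45, 50):
    print(f"{rows} holdout rows: one row = {points_per_row(rows):.1f} points")
```

At these sizes one row is worth 2 to 3.3 points, so a gap of about three points can come from one or two rows being classified differently.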

For the iris test, the default parameters are the same and I’m using the same seeds.
I know the RNG differs between the languages, but is that really the reason, or is something else causing these differences?
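A seed only picks a starting point in one particular generator’s stream, so the same integer given to two different generators yields unrelated draws and therefore different holdout rows. This is an illustrative sketch, not how R or Python libraries actually split data: it shuffles 150 row indices with Python’s Mersenne Twister and with a hand-written minimal-standard LCG (the recurrence of C++ `std::minstd_rand`), both seeded with a hypothetical seed of 42, and keeps the first 30 as the holdout set. The helper names are made up for the example.

```python
import random

def minstd_rand(seed: int):
    """Minimal-standard LCG, same recurrence as C++ std::minstd_rand:
    x_{k+1} = 48271 * x_k mod (2**31 - 1)."""
    m = 2**31 - 1
    x = seed % m or 1  # the state must be nonzero
    while True:
        x = (48271 * x) % m
        yield x

def holdout_indices(draw, n_rows=150, n_test=30):
    """Fisher-Yates shuffle of row indices; draw(k) returns an int in [0, k)."""
    idx = list(range(n_rows))
    for i in range(n_rows - 1, 0, -1):
        j = draw(i + 1)
        idx[i], idx[j] = idx[j], idx[i]
    return sorted(idx[:n_test])

def mt_split(seed: int):
    rng = random.Random(seed)  # Python's Mersenne Twister
    return holdout_indices(rng.randrange)

def lcg_split(seed: int):
    gen = minstd_rand(seed)
    # Plain modulo has a tiny bias; good enough for an illustration.
    return holdout_indices(lambda k: next(gen) % k)

print(mt_split(42) == mt_split(42))   # same generator, same seed
print(mt_split(42) == lcg_split(42))  # different generator, same seed
```

Within one generator the split is reproducible; across generators the "same seed" selects a different holdout set, and on iris that alone can move accuracy by a few points.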

hcho3 · October 9, 2020, 4:59pm

Yes, R uses its own RNG.
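"Its own RNG" need not even mean a different algorithm. R’s default `RNGkind` is Mersenne-Twister, the same algorithm behind Python’s `random` module, but each implementation expands an integer seed into the 624-word state in its own way. The sketch below is a plain MT19937 seeded like C++ `std::mt19937(seed)` (the reference `init_genrand` routine); Python’s `random.Random` seeds through the reference `init_by_array` routine instead, so the same integer gives a different first draw. The class name is hypothetical.

```python
import random

class MT19937:
    """Plain MT19937, seeded with init_genrand as std::mt19937(seed) is."""

    def __init__(self, seed: int = 5489):
        self.mt = [0] * 624
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, 624):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = 624  # forces a twist on the first draw

    def _twist(self):
        for i in range(624):
            y = (self.mt[i] & 0x80000000) | (self.mt[(i + 1) % 624] & 0x7FFFFFFF)
            v = self.mt[(i + 397) % 624] ^ (y >> 1)
            if y & 1:
                v ^= 0x9908B0DF
            self.mt[i] = v
        self.index = 0

    def next_u32(self) -> int:
        if self.index >= 624:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        # Tempering step
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF

print(MT19937(5489).next_u32())              # 3499211612, the reference first output
print(random.Random(5489).getrandbits(32))   # a different value: different seeding
```

The check values are the published ones: 3499211612 is the first output for seed 5489, and the C++ standard requires the 10000th output of a default-constructed `std::mt19937` to be 4123659995.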

tmtc · October 22, 2020, 9:40am

But 15 percentage points is really a lot!
What is the RNG used for internally?

hcho3 · October 22, 2020, 6:42pm

We use the RNG from the C++ standard library. Keep in mind that reproducibility is only guaranteed if you use the same language, OS, and compiler.
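Part of the compiler caveat comes from the layer between the engine and the caller. The C++ standard fixes the raw output of engines such as `std::mt19937`, but leaves the algorithms inside `std::uniform_int_distribution` and `std::shuffle` to each standard-library implementation, so the same raw stream can become different integers with a different toolchain. The sketch below mimics that in Python with two valid ways of mapping one 32-bit word into `[0, n)`; neither is claimed to be what any particular standard library uses, and whether the library discussed here draws through such a layer is an assumption, since the thread does not say.

```python
import random

def modulo_index(word: int, n: int) -> int:
    """Map a 32-bit word to [0, n) by remainder (slightly biased)."""
    return word % n

def multiply_shift_index(word: int, n: int) -> int:
    """Map a 32-bit word to [0, n) by scaling: floor(word * n / 2**32)."""
    return (word * n) >> 32

# One raw stream, reused for both mappings (seed 42 is hypothetical).
rng = random.Random(42)
words = [rng.getrandbits(32) for _ in range(10)]

by_modulo = [modulo_index(w, 150) for w in words]
by_scaling = [multiply_shift_index(w, 150) for w in words]
print(by_modulo)
print(by_scaling)
```

Identical engine output combined with a different mapping produces different samples, which is why matching seeds alone does not reproduce results across toolchains.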