When the input data has only minor changes (or no changes at all), how can I make the model's predictions stay stable, i.e. not change much?
thanks!
For example, if I train different models on the same data, I hope the different models produce the same predictions.
thanks
Try setting reg_lambda to a high value. This should decrease the size of the leaf-node outputs, making changes less dramatic.
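To see why a larger reg_lambda damps the outputs, here is a small sketch of XGBoost's optimal leaf weight, w* = -G / (H + lambda), where G and H are the sums of gradients and Hessians over the rows in the leaf (the gradient/Hessian values below are made-up numbers for illustration):

```python
def leaf_weight(grad_sum: float, hess_sum: float, reg_lambda: float) -> float:
    """Optimal leaf output w* = -G / (H + lambda) for a second-order objective."""
    return -grad_sum / (hess_sum + reg_lambda)

G, H = -10.0, 5.0                   # toy gradient/Hessian sums
print(leaf_weight(G, H, 0.0))       # 2.0 -> no regularization
print(leaf_weight(G, H, 5.0))       # 1.0 -> larger lambda shrinks the output
```

Because lambda sits in the denominator, a small shift in G or H (from slightly changed data) moves w* less when lambda is large.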
Also, you can generate synthetic data where each row is a small perturbation of a row in the original dataset (the label stays the same).
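A minimal sketch of that augmentation idea with NumPy (the noise scale and number of copies are arbitrary choices you would tune for your data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy features
y = rng.integers(0, 2, size=100)       # toy labels

def perturb_augment(X, y, n_copies=3, noise_scale=0.01, rng=None):
    """Append n_copies jittered copies of every row, keeping labels unchanged."""
    rng = rng or np.random.default_rng()
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        X_parts.append(X + rng.normal(scale=noise_scale, size=X.shape))
        y_parts.append(y)              # same label for each perturbed row
    return np.vstack(X_parts), np.concatenate(y_parts)

X_aug, y_aug = perturb_augment(X, y, rng=rng)
print(X_aug.shape, y_aug.shape)        # (400, 5) (400,)
```

Training on the augmented set teaches the model that nearby inputs should map to the same output, which is exactly the stability you are asking for.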
I am checking this now.
Someone said that for point 1, the seed may have an effect.
But I will also try your suggestion to increase reg_lambda. What do you mean by a large reg_lambda? I currently set it to 0.8. If I remember correctly, in previous experiments a small reg_lambda gave better prediction results and a larger reg_lambda reduced performance.
Yes, to get stability I may have to give up some performance.
I will check.
Any comments are welcome.
thx
I know this is an old post, but I too hit issues around model stability (I do randomize seeds) and have lots of features. It's problematic to build reproducible numbers. I tried adding a large number of trees (thousands) and it still didn't help. Any luck with your experiments?
The model should be bit-by-bit reproducible given the same environment (GPU model, number of CPU threads). Some scenarios might have exceptions, for instance, if you are using distributed training, then the data partitioning from the framework (like dask, spark) might not be deterministic.
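The key point is that the randomness is seeded: with the same seed and the same environment, every "random" choice (such as row subsampling) is replayed identically. A minimal illustration of that mechanism, using NumPy's seeded generator as a stand-in for the library's internal RNG:

```python
import numpy as np

def sample_rows(n_rows: int, frac: float, seed: int) -> np.ndarray:
    """Deterministic row subsample: the same seed picks the same rows every run."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_rows, size=int(n_rows * frac), replace=False)

a = sample_rows(1000, 0.8, seed=42)
b = sample_rows(1000, 0.8, seed=42)
print((a == b).all())   # True: identical subsample given the same seed
```

So for reproducible models, fix the seed and keep the environment (hardware, thread count, library version) constant; non-determinism then only enters through layers outside the library, such as a distributed framework's data partitioning.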
Feel free to open an issue if you have a sample that generates non-deterministic models.
Maybe the right question to ask is when to use subsampling and when not to; answering that question could help you avoid subsampling in cases where it should not be used for the data at hand.