I am running an XGBRegressor model to predict TV viewership based on the past 3-4 years of viewing behavior.
I retrain the model every day: each day I add the new data to the training set as additional examples and then refit.
I have noticed that the predictions change quite significantly from one day to the next when I predict on the same dataset (sometimes by as much as +/- 20-30%). This seems odd, since I am only adding one day of data (the equivalent of changing ~0.1% of the entire training set).
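
As a rough check on the ~0.1% figure, here is a small sketch of the arithmetic. It assumes every day contributes about the same number of training rows and that a year has 365 days; `one_day_share` is a hypothetical helper name used only for illustration.

```python
DAYS_PER_YEAR = 365  # assumption: ignore leap years


def one_day_share(years):
    """Fraction of the training set contributed by a single day,
    assuming each day adds roughly the same number of rows."""
    return 1 / (years * DAYS_PER_YEAR)


# The history spans 3-4 years, so evaluate both ends of that range.
for years in (3, 4):
    print(f"{years} years of history: one day is {one_day_share(years):.3%} of the data")
```

Under these assumptions, one day is a little under 0.1% of the training set at either end of the range, which matches the estimate above.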
I understand that trees are local models and inherently unstable, but is there any way to make the XGBoost regression model more stable/robust to small changes in the training data?