Better estimation of the uncertainty of model predictions


#1

I am using a boosted regression tree model and am looking for a method to better estimate the uncertainty of its predictions.

Currently I compute the error on the eval set and use that single number as the uncertainty for every prediction.
Is there a way to get data at the node level and use it instead? That is, for each tree, can I look up the leaf node a prediction ends up in and grab the variance within that leaf? The idea is that some predictions are more certain than others, and I would prefer not to build a second model just for uncertainty.
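
A minimal sketch of the node-level idea, assuming scikit-learn's GradientBoostingRegressor (the post does not name a library, so this choice is an assumption). Its apply() method returns, for every sample, the index of the leaf it lands in for each tree. The sketch groups the squared errors of held-out (eval set) samples by leaf and averages those leaf-level values across trees for a new prediction. The synthetic data, the helper name predict_with_std, and the averaging rule are all illustrative; this is a heuristic, not a calibrated interval.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (illustrative only): noise grows with |x|,
# so some regions are genuinely less certain than others.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.05 + 0.3 * np.abs(X[:, 0]))
X_fit, y_fit = X[:400], y[:400]
X_eval, y_eval = X[400:], y[400:]

model = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0)
model.fit(X_fit, y_fit)

# apply() gives the leaf index per sample and tree: shape (n_samples, n_estimators).
eval_leaves = model.apply(X_eval).astype(int)
eval_sq_err = (y_eval - model.predict(X_eval)) ** 2
global_var = eval_sq_err.mean()  # fallback for leaves no eval sample reached

# For each tree: leaf id -> mean squared error of the eval samples in that leaf.
leaf_var = []
for t in range(eval_leaves.shape[1]):
    ids = eval_leaves[:, t]
    leaf_var.append({leaf: eval_sq_err[ids == leaf].mean() for leaf in np.unique(ids)})

def predict_with_std(X_new):
    """Hypothetical helper: prediction plus a leaf-based spread estimate."""
    leaves = model.apply(X_new).astype(int)
    var = np.empty(leaves.shape)
    for t in range(leaves.shape[1]):
        var[:, t] = [leaf_var[t].get(leaf, global_var) for leaf in leaves[:, t]]
    # Average the per-tree leaf variances, then take the square root.
    return model.predict(X_new), np.sqrt(var.mean(axis=1))

pred, std = predict_with_std(np.array([[0.0], [2.8]]))
print(pred, std)
```

Using the eval set rather than the training set for the per-leaf errors avoids the optimistic bias of in-sample residuals. Leaves that few eval samples reach will give noisy estimates, so a minimum sample count per leaf may be worth adding.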

Any thoughts on how to do this, or on better or different ways to estimate the uncertainty of a prediction, are welcome.

Thank you,
kahn


#2

Take a look at https://towardsdatascience.com/regression-prediction-intervals-with-xgboost-428e0a018b
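
One common way to get prediction intervals from a boosted model is to fit it with a quantile loss, once for a lower and once for an upper quantile. Below is a minimal sketch of that general technique, assuming scikit-learn's GradientBoostingRegressor with loss="quantile" rather than XGBoost; it is not taken from the linked article. The data and the helper name fit_quantile are illustrative. Note that this does mean fitting extra models, which the original post hoped to avoid.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (illustrative only): noise grows with |x|.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.05 + 0.3 * np.abs(X[:, 0]))

def fit_quantile(alpha):
    """Hypothetical helper: fit one boosted model for the given quantile."""
    model = GradientBoostingRegressor(
        loss="quantile", alpha=alpha,
        n_estimators=100, max_depth=3, random_state=0,
    )
    return model.fit(X, y)

# The 5% and 95% quantile models bound a nominal 90% interval.
lower, upper = fit_quantile(0.05), fit_quantile(0.95)

X_new = np.array([[0.0], [2.8]])
width = upper.predict(X_new) - lower.predict(X_new)

# Fraction of training targets inside the interval (in-sample, so optimistic).
coverage = np.mean((y >= lower.predict(X)) & (y <= upper.predict(X)))
print(width, coverage)
```

The interval width varies with the input, so it reflects that some predictions are more certain than others. Coverage should be checked on the eval set before the intervals are trusted.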