Random Forest for Quantile Regression

quaere-verum · March 9, 2024, 6:49pm

Greetings. I have recently started using the XGBoost library for Python to work on a problem for which I want to use quantile regression. XGBoost offers this functionality, but limits the parameter “num_parallel_tree” to be 1. In other words, random forests are not supported for quantile regression.

I am wondering why this is the case. From what I’ve read about random forests, one obtains a random forest from a given tree algorithm by bagging: each tree is trained on a bootstrap sample of the rows, using a random subset of the features. And indeed, it is very easy to define a model which does the bagging and trains an XGBoost tree on each “bag”. The downside is that one cannot use parallelism this way, because a QuantileDMatrix cannot be pickled, which means Python’s multiprocessing module is of no use.
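
A minimal sketch of the manual bagging described above, under stated assumptions. The names `make_xgb_quantile_model`, `fit_bagged` and `predict_bagged` are hypothetical helpers, not part of XGBoost. The sketch assumes xgboost >= 2.0 (for the `reg:quantileerror` objective), passes plain NumPy arrays to worker threads so that no DMatrix has to be pickled, and simply averages the per-model quantile predictions, which is one possible aggregation rather than what a native implementation would necessarily do. Whether the threads give a real speed-up has not been measured here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def make_xgb_quantile_model(alpha, seed):
    """Hypothetical factory: one single-tree XGBoost quantile model per bag."""
    # Imported lazily so the bagging helpers below only depend on NumPy.
    import xgboost as xgb  # assumption: xgboost >= 2.0

    return xgb.XGBRegressor(
        objective="reg:quantileerror",
        quantile_alpha=alpha,
        n_estimators=1,        # a single tree per bag
        learning_rate=1.0,     # no shrinkage, since nothing is boosted
        max_depth=6,
        colsample_bynode=0.5,  # random feature subset at each split
        n_jobs=1,              # parallelism comes from the thread pool
        random_state=seed,
    )


def fit_bagged(make_model, X, y, n_bags=50, seed=0, max_workers=4):
    """Fit one model per bootstrap sample; make_model(seed) returns a fresh model."""
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    # Draw all bootstrap samples up front so the result does not depend
    # on the order in which threads happen to run.
    bags = [rng.integers(0, n_rows, size=n_rows) for _ in range(n_bags)]

    def fit_one(job):
        index, rows = job
        model = make_model(seed + index)
        model.fit(X[rows], y[rows])
        return model

    # Threads share memory, so nothing needs to be pickled.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fit_one, enumerate(bags)))


def predict_bagged(models, X):
    """Average the quantile predictions of all bagged models (an assumption)."""
    return np.mean([model.predict(X) for model in models], axis=0)
```

With NumPy arrays `X` and `y`, a 0.9-quantile ensemble would be fitted with `models = fit_bagged(lambda s: make_xgb_quantile_model(0.9, s), X, y)` and queried with `predict_bagged(models, X_new)`.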

However, it seems this should be easily remedied in the C++ implementation of the algorithm. In fact, I don’t understand why random forests are currently not supported for quantile regression, given that the procedure described above is, presumably, the same for every objective function. There’s nothing special about quantile regression in this regard. So am I missing something? Why is it not possible to train a random forest for quantile regression?

As a final point: could someone direct me to the code in the repo which currently throws the exception when trying to train a random forest with quantile regression? I have searched for a while, but was unable to find it.

Unco3892 · May 11, 2024, 1:30pm

Same issue here: quantile regression for random forests works with neither the Booster API nor the scikit-learn API. As you mentioned, it should be quite easy, since xgboost.XGBRFRegressor inherits from xgboost.XGBRegressor.