Currently I am using sklearn's random forest to test my idea.
But as we know, sklearn only runs on the CPU.
So I am trying to replace it with XGBoost's random forest to use the GPU.
I read the API reference and the random forest documentation of XGBoost,
but there are still some things I do not understand, so I want to ask.
To train a random forest on mini-batches in sklearn, I have to initialize it with warm_start=True
and increase n_estimators each time I train it.
For example,
RFT = RandomForestClassifier(n_estimators=100, random_state=1, n_jobs=-1, warm_start=True)
RFT.fit(mini_batch_data, mini_batch_label)
RFT.n_estimators += 1
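To make the pattern concrete, here is a minimal runnable sketch of that warm_start loop, using synthetic data (the make_classification call, batch sizes, and tree increments are my own illustrative choices, not part of the original snippet):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data split into three mini-batches (sizes are illustrative).
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
batches = [(X[i:i + 100], y[i:i + 100]) for i in range(0, 300, 100)]

# warm_start=True keeps the already-fitted trees; raising
# n_estimators before the next fit grows new trees on the new batch.
rft = RandomForestClassifier(n_estimators=10, random_state=1,
                             n_jobs=-1, warm_start=True)
for batch_X, batch_y in batches:
    rft.fit(batch_X, batch_y)
    rft.n_estimators += 10  # the next fit will add 10 more trees

print(len(rft.estimators_))  # 30 trees after three batches
```

Note that each new group of trees only sees the batch passed to that fit call, so this is more "growing the ensemble incrementally" than true mini-batch gradient training.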
But XGBoost's random forest has no warm_start parameter in the API reference.
And in the regressor example (it might be different from the classifier)
in the XGBoost random forest documentation,
XGBRFRegressor seems to use mini-batch training, right…?
This is the example:
from sklearn.model_selection import KFold
import xgboost as xgb  # needed for xgb.XGBRFRegressor below
# Your code ...
kf = KFold(n_splits=2)
for train_index, test_index in kf.split(X, y):
    xgb_model = xgb.XGBRFRegressor(random_state=42).fit(
        X[train_index], y[train_index])
So I am curious: does XGBoost's random forest do mini-batch training automatically, or is there no mini-batch training at all?
Thank you for reading!