Is there a way to add my own custom sampler function to XGBoost? I see there’s a subsample parameter which controls the fraction of training rows selected. However, I’d like to experiment with different sampling methods to ensure the selected data is IID. Is there a way to do it?
No, currently we only provide uniform sampling. You can, however, create bootstrap samples yourself with your custom sampling method and then feed them into XGBoost.
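As a rough illustration of that workaround, here is a minimal sketch. The sampler (`stratified_sample`) and the helper `train_on_sample` are hypothetical names made up for this example, not part of XGBoost, and stratified sampling is just one possible custom method. The sketch assumes NumPy arrays `X` and `y` and a regression objective. Note that this draws one sample per model, whereas the built-in `subsample` parameter resamples at every boosting iteration, so the two are not equivalent.

```python
import numpy as np


def stratified_sample(strata, frac, rng):
    """Hypothetical custom sampler: draw `frac` of the rows of each stratum
    without replacement and return the selected row indices, sorted."""
    strata = np.asarray(strata)
    picked = []
    for value in np.unique(strata):
        rows = np.flatnonzero(strata == value)
        k = max(1, int(round(frac * len(rows))))  # keep at least one row per stratum
        picked.append(rng.choice(rows, size=k, replace=False))
    return np.sort(np.concatenate(picked))


def train_on_sample(X, y, idx, params=None, num_boost_round=50):
    """Train a booster on the rows selected by the custom sampler."""
    import xgboost as xgb  # imported lazily so the sampler can be used on its own

    # subsample=1.0 so XGBoost does not resample the rows again (assumed defaults otherwise)
    params = {"objective": "reg:squarederror", "subsample": 1.0, **(params or {})}
    dtrain = xgb.DMatrix(X[idx], label=y[idx])
    return xgb.train(params, dtrain, num_boost_round=num_boost_round)
```

Usage would look something like `idx = stratified_sample(groups, 0.5, np.random.default_rng(0))` followed by `booster = train_on_sample(X, y, idx)`, where `groups` is whatever column you want to balance on.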
To clarify, if I create my own samples and train on each of them, I would need to combine the models later, right? For example, say I train on samples of 100 rows:
Model 0 = XGBoost[0…100]
Model 1 = XGBoost[101…200]
The final model would be the average of all the predictions from [model0, model1…]. Correct?
It depends on what you want to do. For example, you can use the bootstrapping method to generate a confidence interval for the prediction of an XGBoost model. See https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
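To make the bootstrap idea concrete, here is a minimal sketch under a few assumptions: `X`, `y` and `X_new` are NumPy arrays, the task is regression, and the function names (`xgb_fit_predict`, `bootstrap_predictions`, `summarize`) are hypothetical helpers for this example. Each bootstrap model is trained on rows drawn with replacement. The mean over models gives the averaged prediction asked about above, and the percentiles give a simple percentile interval. That interval only reflects variability from resampling the training data, not the noise in individual observations.

```python
import numpy as np


def xgb_fit_predict(X_train, y_train, X_new):
    """Fit one XGBoost regressor and predict on X_new (settings are placeholders)."""
    import xgboost as xgb  # imported lazily; any other fit/predict function can be swapped in

    model = xgb.XGBRegressor(n_estimators=50)
    model.fit(X_train, y_train)
    return model.predict(X_new)


def bootstrap_predictions(fit_predict, X, y, X_new, n_boot=200, seed=0):
    """Train n_boot models on bootstrap resamples; return an (n_boot, n_new) array."""
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = np.empty((n_boot, len(X_new)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
        preds[b] = fit_predict(X[idx], y[idx], X_new)
    return preds


def summarize(preds, level=0.95):
    """Return the mean prediction and a percentile interval across bootstrap models."""
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(preds, [alpha, 1.0 - alpha], axis=0)
    return preds.mean(axis=0), lower, upper
```

With real data this would be called as `preds = bootstrap_predictions(xgb_fit_predict, X, y, X_new)` and then `mean, lower, upper = summarize(preds)`. To use a custom sampler instead of plain resampling with replacement, replace the line that draws `idx`.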