Hello,
I am designing a model that incrementally learns to classify new traffic patterns from a stream of observability data collected on my toy cluster. When I test XGBoost training continuation with code similar to the snippet below, it fails because the new batch introduces new features.
My question is: why can't XGBoost support incremental training when new features and labels are introduced? I also saw previous discussions (e.g. issue #3055) saying that XGBoost doesn't guarantee performance with training continuation.
Is it because gradient boosting fits each new tree to the residuals of the existing ensemble, so training continuation never updates the existing trees?
Thank you.
import xgboost as xgb
from sklearn.datasets import load_wine
a = load_wine()
# data and data2 are built from disjoint feature subsets (6 vs. 7 columns).
data = xgb.DMatrix(a['data'][:, 0:6], a['target'])
data2 = xgb.DMatrix(a['data'][:, 6:], a['target'])
param = {"max_depth": 2, "eta": 1, "objective": "multi:softmax", "num_class": 3}
num_round = 2
bst = xgb.train(param, data, num_boost_round=num_round)
bst2 = xgb.train(param, data2, num_boost_round=1, xgb_model=bst)  # error raised at this line
../src/learner.cc:1506: Check failed: learner_model_param_.num_feature == p_fmat->Info().num_col_ (6 vs. 7) : Number of columns does not match number of features in booster.