Training continuation in sklearn API not working


#1

I want to be able to train my model stepwise for parameter optimization purposes (avoiding constructing all trees for unpromising parameter settings).

With the Learning API, this is possible through something like

bst = None
for i in range(100):
    # train only one step
    bst = train(params=params, dtrain=dtrain, xgb_model=bst, num_boost_round=1)

With the sklearn API, xgb_model is also a valid fit parameter, but as far as I can tell it has no effect.

xgb = XGBClassifier(silent=False, n_estimators=1)
xgb.fit(x, y)
print('---')
xgb.n_estimators += 1
xgb.fit(x, y, xgb_model=xgb)  # or xgb.get_booster() or 'saved_xgb.file'

Which prints:

[15:11:22] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
---
[15:11:22] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[15:11:22] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3

While

xgb = XGBClassifier(silent=False, n_estimators=2)
xgb.fit(x, y)

prints

[15:14:56] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3
[15:14:56] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 0 pruned nodes, max_depth=3

In other words, both trees get built even though the first tree already exists and xgb_model is provided. This behavior differs from train and is not what one would expect from reading the documentation. Am I missing something?


#2
xgb = XGBClassifier(silent=False, n_estimators=1)
xgb.fit(x, y)
print('---')
xgb.n_estimators += 1
xgb.fit(x, y, xgb_model=xgb)  # or xgb.get_booster() or 'saved_xgb.file'

This script trains three trees in total: 1 tree in the first fit plus 2 trees in the second. Training continuation does not overwrite existing trees; it appends new trees to the existing ensemble. In other words, n_estimators specifies the number of boosting rounds performed by that fit call, not the final ensemble size, so incrementing it before continuing grows the model by the new value.