I am wondering about the proper use of the validation set and watchlist feature for early stopping and performance monitoring. The official example below passes the test set as the evaluation set, so doesn't this involve leakage?
# from guide-python/basic_walkthrough.py
# specify validations set to watch performance
watchlist = [(dtest, 'eval'), (dtrain, 'train')]
num_round = 2
bst = xgb.train(param, dtrain, num_round, watchlist)
# this is prediction
preds = bst.predict(dtest)
Should it be, instead:
# specify validation set to watch performance
watchlist = [(dtrain, 'train'), (dvalid, 'val')]
num_round = 2
bst = xgb.train(param, dtrain, num_round, watchlist)
# this is prediction
preds = bst.predict(dtest) # use test set here only
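To make the proposed setup concrete, here is a minimal, self-contained sketch of the leak-free arrangement: train on dtrain, monitor dvalid during boosting, and touch dtest only once for final scoring. The synthetic data, parameter values, and round counts are illustrative assumptions, not taken from the original examples.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# three-way split: the test rows are never seen during training or monitoring
dtrain = xgb.DMatrix(X[:200], label=y[:200])
dvalid = xgb.DMatrix(X[200:250], label=y[200:250])
dtest = xgb.DMatrix(X[250:], label=y[250:])

param = {'objective': 'binary:logistic', 'eval_metric': 'logloss'}
# with early_stopping_rounds set, xgb.train stops based on the LAST entry
# in evals, so the validation set should come last in the watchlist
watchlist = [(dtrain, 'train'), (dvalid, 'val')]
bst = xgb.train(param, dtrain, num_boost_round=50, evals=watchlist,
                early_stopping_rounds=5, verbose_eval=False)

# the test set is used once, for final scoring only
preds = bst.predict(dtest)

Note that when early stopping is enabled, the ordering inside the watchlist is not just cosmetic: xgb.train monitors the last (data, name) pair, which is another reason to put the validation set last rather than the test set.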
Separately, the positions of the two sets are reversed in other examples on GitHub, for instance:
# from /tests/python/test_eval_metrics.py
watchlist = [(dtrain, 'train'), (dvalid, 'val')]
Which is correct? The predictions are not scored further in that example, but it seems to follow the "right" intuition of monitoring performance on a dev/validation set while reserving the test set for final scoring only.