Custom callback with early stopping: number of executions

I have noticed that the number of executions of a custom callback seems to be random when it is used with early stopping turned on.

Here is a minimal example to reproduce the behavior I am talking about:

import xgboost as xgb

# read in data
dtrain = xgb.DMatrix('agaricus.txt.train')
dtest = xgb.DMatrix('agaricus.txt.test')

# specify parameters
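params_xgb = dict(
    # booster parameters go here (values not shown)
)

params_train = dict(
    num_boost_round=1000,     # assumed: large enough for early stopping to trigger
    early_stopping_rounds=3,  # inferred from the counts reported below (40 + 3 = 43)
)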

# specify callback
class ExampleCallback(xgb.callback.TrainingCallback):
    def __init__(self, callback_results):
        self.callback_results = callback_results

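    def after_iteration(self, model, epoch, evals_log):
        # record one entry per execution of this callback
        self.callback_results.append(epoch)
        # returning False tells xgboost to continue training
        return False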

callback_results = []
params_train['callbacks'] = [ExampleCallback(callback_results), ]

bst = xgb.train(
    params_xgb,
    dtrain,
    evals=[(dtest, 'dtest')],
    **params_train,
)

num_callback_results = len(callback_results)
best_score = bst.best_score
best_iteration = bst.best_iteration
num_trees = len(bst.get_dump())
line = f'{num_callback_results};{best_score};{best_iteration};{num_trees}\n'

with open('callback.csv', mode='a') as f:
    f.write(line)

When I run this code 100 times, in most cases the length of the list callback_results is equal to bst.best_iteration + early_stopping_rounds (43 in my case), but in some runs (14%) it is equal to bst.best_iteration + early_stopping_rounds + 1 (44 in my case). The best score, the best iteration, and the number of trees in the model are always the same (0.0004998939329742, 40, and 44, respectively).
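For reference, a minimal sketch of how the runs could be tallied, assuming callback.csv holds the semicolon-separated rows written by the script above. The function name tally_callback_counts and the three sample rows are illustrative only; they are not real run output.

```python
from collections import Counter


def tally_callback_counts(lines):
    """Count how often each callback-execution count occurs.

    Each line is expected to look like
    'num_callback_results;best_score;best_iteration;num_trees'.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        num_callback_results = int(line.split(';')[0])
        counts[num_callback_results] += 1
    return counts


# Illustrative rows in the same format as the script's output (not real data).
sample_lines = [
    '43;0.0004998939329742;40;44\n',
    '44;0.0004998939329742;40;44\n',
    '43;0.0004998939329742;40;44\n',
]
sample_counts = tally_callback_counts(sample_lines)
for value, n in sorted(sample_counts.items()):
    print(f'{value} callback executions: {n} of {len(sample_lines)} runs')
```

On the real file, the same function can be called as tally_callback_counts(open('callback.csv')), since a file object yields one line per iteration.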

Moreover, I have noticed that when I simply run xgboost.train() with early stopping enabled, the number of times the evaluation score is printed also varies between runs: it is either 43 or 44.

Is this the expected behavior of callbacks? Is there a way to make my custom callback always execute the same number of times?

My actual need is more complex. I want to run cross-validation and store the score on the out-of-fold sets at each boosting round. I can use a callback for this, but the length of the list of evaluation scores randomly varies by 1, hence my question.
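As a stopgap, the per-round score lists could be truncated to a common length after training. The sketch below is only a post-processing workaround under that assumption, not a fix for the callback behavior itself; the helper name truncate_to_common_length and the sample scores are hypothetical.

```python
def truncate_to_common_length(score_lists):
    """Cut every list of per-round scores down to the shortest length.

    This discards the occasional extra final entry so that all lists
    line up round by round.
    """
    if not score_lists:
        return []
    common = min(len(scores) for scores in score_lists)
    return [list(scores[:common]) for scores in score_lists]


# Hypothetical per-fold scores whose lengths differ by one.
fold_scores = [
    [0.30, 0.20, 0.10],
    [0.31, 0.21, 0.11, 0.12],
]
aligned = truncate_to_common_length(fold_scores)
print([len(scores) for scores in aligned])
```

Note that this drops the last recorded round from the longer lists, so it is only suitable when that final entry is not needed.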

I am using xgboost version 1.6.1.