XGBoost iteration_range defined differently in sklearn API and docs

In the xgboost sklearn.py source code, the best iteration range is retrieved with the following code when the model was trained with early stopping rounds:

iteration_range = (0, self.best_iteration + 1)

However, the docs say to use this to predict with the best iteration range:

ypred = bst.predict(dtest, iteration_range=(0, bst.best_iteration))

https://xgboost.readthedocs.io/en/latest/python/python_intro.html#prediction
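To make the difference concrete, here is a minimal repro with the native Booster API (toy data; the dataset, split, and parameter values are arbitrary placeholders):

import numpy as np
import xgboost as xgb

# Toy regression data, just to get a Booster trained with early stopping.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=1000)
dtrain = xgb.DMatrix(X[:800], label=y[:800])
dtest = xgb.DMatrix(X[800:], label=y[800:])

bst = xgb.train(
    {"objective": "reg:squarederror"},
    dtrain,
    num_boost_round=500,
    evals=[(dtest, "valid")],
    early_stopping_rounds=10,
    verbose_eval=False,
)

# Variant used in sklearn.py:
pred_a = bst.predict(dtest, iteration_range=(0, bst.best_iteration + 1))
# Variant shown in the docs:
pred_b = bst.predict(dtest, iteration_range=(0, bst.best_iteration))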

Why the discrepancy? Which is the correct way?

Thanks!

The source code is correct. The docs will be updated to match: https://github.com/dmlc/xgboost/pull/7324


Ok, good to know!
Thanks for the reply!

I find the updated version confusing.
If best_iteration = 100, I take that to mean the optimal number of trees is 100. By the new guidance, prediction will use tree 0 through tree 100, which is 101 trees in total, which doesn't make sense to me.
Additionally, if best_iteration = n_trees, using this range when predicting will kill the kernel.

iteration_range = (0, self.best_iteration + 1)

Hi, best_iteration cannot be equal to n_trees, because best_iteration is 0-based: with n_trees boosted rounds, best_iteration is at most n_trees - 1, so iteration_range = (0, best_iteration + 1) selects at most n_trees trees. But indeed, we should add a check to prevent the kernel from being killed.
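As a quick sketch of that bound with toy data (assuming Booster.num_boosted_rounds() for the total number of trained rounds; the data and parameters are arbitrary):

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=500)
dtrain = xgb.DMatrix(X[:400], label=y[:400])
dvalid = xgb.DMatrix(X[400:], label=y[400:])

bst = xgb.train(
    {"objective": "reg:squarederror"},
    dtrain,
    num_boost_round=200,
    evals=[(dvalid, "valid")],
    early_stopping_rounds=5,
    verbose_eval=False,
)

# best_iteration is 0-based, so it is always below the total round count ...
assert bst.best_iteration < bst.num_boosted_rounds()
# ... and (0, best_iteration + 1) therefore never requests a tree that does not exist.
preds = bst.predict(dvalid, iteration_range=(0, bst.best_iteration + 1))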