XGBRegressor with cross-validation from sklearn is faster than using xgb.cv. Why?

I’m using Optuna for hyperparameter optimization. I wrote the two objective functions below, one using sklearn’s cross_val_score and the other using xgb.cv. The first one is significantly faster, even though they do essentially the same work. I’m curious why that’s the case.

Using sklearn:

from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor


def optuna_objective(trial, ML_model, X, y, seed):
    if ML_model == "XGB":
        param = {
            'n_estimators': trial.suggest_int('n_estimators', 100, 500),
            'max_depth': trial.suggest_int('max_depth', 3, 10),
            'learning_rate': trial.suggest_float('learning_rate', 1e-8, 1.0, log=True),
            'subsample': trial.suggest_float('subsample', 0.1, 1.0),
            'colsample_bytree': trial.suggest_float('colsample_bytree', 0.1, 1.0),
            'min_child_weight': trial.suggest_int('min_child_weight', 1, 6),
            'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 1.0, log=True),
            'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 1.0, log=True),
        }
        model = XGBRegressor(**param, random_state=seed)

    # 10-fold CV; n_jobs=-1 lets joblib fit the folds in parallel
    score = cross_val_score(model, X=X, y=y, scoring="r2", n_jobs=-1, cv=10, verbose=0)
    return score.mean()

Using xgb.cv:

import xgboost as xgb


def optuna_objective(trial, ML_model, X, y, seed):
    if ML_model == "XGB":
        param = {
            'max_depth': trial.suggest_int('max_depth', 3, 10),
            'learning_rate': trial.suggest_float('learning_rate', 1e-8, 1.0, log=True),
            'subsample': trial.suggest_float('subsample', 0.1, 1.0),
            'colsample_bytree': trial.suggest_float('colsample_bytree', 0.1, 1.0),
            'min_child_weight': trial.suggest_int('min_child_weight', 1, 6),
            'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 1.0, log=True),
            'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 1.0, log=True),
            'seed': seed,
            'nthread': -1
        }
        num_boost_round = trial.suggest_int('num_boost_round', 100, 500)

        dtrain = xgb.DMatrix(X, label=y)

        cv_results = xgb.cv(param, dtrain, num_boost_round=num_boost_round,
                            nfold=10, stratified=False,
                            seed=seed)

    # xgb.cv returns a DataFrame; take the mean test RMSE of the final boosting round
    return cv_results['test-rmse-mean'].iloc[-1]
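
For context, here is roughly how I wire either objective into Optuna (a minimal sketch; X_train, y_train, and n_trials are placeholders, not my real setup). Note that the sklearn variant returns R², which Optuna should maximize, while the xgb.cv variant returns RMSE, which it should minimize.

from functools import partial

import optuna

objective = partial(optuna_objective, ML_model="XGB", X=X_train, y=y_train, seed=42)

# direction="maximize" for the R² objective; the RMSE objective would need "minimize"
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)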

I know the usual advice is to just use the sklearn variant if you don’t need global early stopping, and that’s fine for my case.
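
By "global early stopping" I mean letting xgb.cv stop boosting based on the cross-validated metric via its early_stopping_rounds parameter, roughly like the sketch below (reusing param, dtrain, and seed from the snippet above; the exact numbers are just illustrative).

cv_results = xgb.cv(param, dtrain, num_boost_round=500,
                    nfold=10, stratified=False,
                    early_stopping_rounds=20,
                    seed=seed)
best_rounds = len(cv_results)  # boosting rounds kept once early stopping triggers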

But does anyone know why this speed difference happens?