I’m using Optuna for hyperparameter optimization. I wrote the following two objective functions, one using sklearn’s cross_val_score and the other using the xgb.cv method. I found that the first one was significantly faster, even though they do essentially the same thing. I’m curious why that’s the case.
Using sklearn:
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

def optuna_objective(trial, ML_model, X, y, seed):
    if ML_model == "XGB":
        param = {
            'n_estimators': trial.suggest_int('n_estimators', 100, 500),
            'max_depth': trial.suggest_int('max_depth', 3, 10),
            'learning_rate': trial.suggest_float('learning_rate', 1e-8, 1.0, log=True),
            'subsample': trial.suggest_float('subsample', 0.1, 1.0),
            'colsample_bytree': trial.suggest_float('colsample_bytree', 0.1, 1.0),
            'min_child_weight': trial.suggest_int('min_child_weight', 1, 6),
            'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 1.0, log=True),
            'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 1.0, log=True),
        }
        model = XGBRegressor(**param, random_state=seed)
        # 10-fold CV; n_jobs=-1 evaluates folds in parallel across all cores
        score = cross_val_score(model, X=X, y=y, scoring="r2", n_jobs=-1, cv=10, verbose=0)
        return score.mean()
Using xgb.cv:
import xgboost as xgb

def optuna_objective(trial, ML_model, X, y, seed):
    if ML_model == "XGB":
        param = {
            'max_depth': trial.suggest_int('max_depth', 3, 10),
            'learning_rate': trial.suggest_float('learning_rate', 1e-8, 1.0, log=True),
            'subsample': trial.suggest_float('subsample', 0.1, 1.0),
            'colsample_bytree': trial.suggest_float('colsample_bytree', 0.1, 1.0),
            'min_child_weight': trial.suggest_int('min_child_weight', 1, 6),
            'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 1.0, log=True),
            'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 1.0, log=True),
            'seed': seed,
            'nthread': -1
        }
        num_boost_round = trial.suggest_int('num_boost_round', 100, 500)
        dtrain = xgb.DMatrix(X, label=y)
        # xgb.cv runs the 10 folds itself and returns a per-round history DataFrame
        cv_results = xgb.cv(param, dtrain, num_boost_round=num_boost_round,
                            nfold=10, stratified=False,
                            seed=seed)
        return cv_results['test-rmse-mean'].values[-1]
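To put numbers on the speed difference instead of eyeballing it, either objective can be wrapped in a small stdlib-only timing decorator before handing it to Optuna. This is a minimal sketch (the decorator name `timed` is hypothetical, not part of Optuna's API):

```python
import time
from functools import wraps

def timed(fn):
    """Wrap an objective so each call prints its wall-clock duration."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__} took {elapsed:.3f}s")
        return result
    return wrapper

# Usage sketch: wrap before passing to study.optimize, e.g.
#   timed_objective = timed(lambda trial: optuna_objective(trial, "XGB", X, y, seed))
#   study.optimize(timed_objective, n_trials=50)
```

Comparing the per-trial times from both versions makes it easier to tell whether the gap is constant overhead (e.g. DMatrix construction) or scales with the number of boosting rounds.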