Inconsistent number of samples in Sklearn with XGBoost

I am trying to train an XGBRegressor using this code:

import xgboost as xgb
from sklearn.metrics import mean_squared_error

def xgboost():
    # Build the regressor, passing the sklearn metric as a callable eval_metric
    model = xgb.XGBRegressor(n_estimators=200,
                             eval_metric=mean_squared_error)
    return model

num_amostras = x_train.shape[0]
val_size = 0.2
num_amostras_train = int(num_amostras * (1-val_size))
x_train_xgb = x_train[:num_amostras_train]
y_train_xgb = y_train[:num_amostras_train]
x_val_xgb = x_train[num_amostras_train:]
y_val_xgb = y_train[num_amostras_train:]
model_xgb = xgboost().fit(x_train_xgb, y_train_xgb, eval_set=[(x_train_xgb, y_train_xgb), (x_val_xgb, y_val_xgb)])
resultados = model_xgb.evals_result()

x_train has shape (1458, 55)
x_train_xgb has shape (1166, 55)
y_train_xgb has shape (1166, 24)
x_val_xgb has shape (292, 55)
y_val_xgb has shape (292, 24)

But I am getting this error:

Traceback (most recent call last):

  File ~\PeDFurnas\lib\site-packages\spyder_kernels\ in compat_exec
    exec(code, globals, locals)

  File c:\users\ldsp_\sipredvs\scripts\, y_train_xgb, eval_set=[(x_train_xgb, y_train_xgb),(x_val_xgb, y_val_xgb)])

  File ~\PeDFurnas\lib\site-packages\xgboost\ in inner_f
    return func(**kwargs)

  File ~\PeDFurnas\lib\site-packages\xgboost\ in fit
    self._Booster = train(

  File ~\PeDFurnas\lib\site-packages\xgboost\ in inner_f
    return func(**kwargs)

  File ~\PeDFurnas\lib\site-packages\xgboost\ in train
    if cb_container.after_iteration(bst, i, dtrain, evals):

  File ~\PeDFurnas\lib\site-packages\xgboost\ in after_iteration
    score: str = model.eval_set(evals, epoch, self.metric, self._output_margin)

  File ~\PeDFurnas\lib\site-packages\xgboost\ in eval_set
    feval_ret = feval(

  File ~\PeDFurnas\lib\site-packages\xgboost\ in inner
    return func.__name__, func(y_true, y_score)

  File ~\PeDFurnas\lib\site-packages\sklearn\metrics\ in mean_squared_error
    y_type, y_true, y_pred, multioutput = _check_reg_targets(

  File ~\PeDFurnas\lib\site-packages\sklearn\metrics\ in _check_reg_targets
    check_consistent_length(y_true, y_pred)

  File ~\PeDFurnas\lib\site-packages\sklearn\utils\ in check_consistent_length
    raise ValueError(

ValueError: Found input variables with inconsistent numbers of samples: [27984, 1166]

Note that 27984 = 1166 * 24, i.e. the product of the two dimensions of y_train_xgb.

1166 is the number of samples in both x_train_xgb and y_train_xgb.
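To illustrate where the two numbers can come from, here is a minimal sketch with NumPy. It assumes, based only on the argument order in the traceback (`func(y_true, y_score)`), that the metric receives the labels flattened to one dimension while the predictions keep their 2-D shape. The arrays are dummy stand-ins, not the real data.

```python
import numpy as np

n_samples, n_targets = 1166, 24

# Dummy stand-ins with the same shape as y_train_xgb and its predictions.
y_true_2d = np.zeros((n_samples, n_targets))
y_pred_2d = np.ones((n_samples, n_targets))

# Assumption inferred from the traceback: the labels arrive flattened.
y_true_flat = y_true_2d.reshape(-1)

# A length check compares the first dimension of each input.
print(len(y_true_flat), len(y_pred_2d))  # 27984 1166

# Flattening both sides the same way makes an element-wise MSE well defined.
mse = float(np.mean((y_true_flat - y_pred_2d.reshape(-1)) ** 2))
print(mse)  # 1.0
```

With one input of length 27984 and the other of length 1166, a consistency check on the number of samples would fail exactly as in the error message.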

If I don't use a sklearn metric (mean_squared_error in this case) and instead use the default metric of XGBRegressor ('rmse'), the code runs just fine.

So what is causing this problem? How can I fix it and use mean_squared_error as eval_metric?

This use case is currently not supported. For now, please use the built-in metrics.
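If the goal is simply to get MSE curves, one possible workaround is to train with the built-in 'rmse' metric and square the recorded values afterwards. The sketch below assumes that the built-in RMSE is averaged over all target entries, so that its square matches a uniformly averaged MSE. The helper name `rmse_history_to_mse` and the numbers in `example` are hypothetical; the dictionary only mimics the nested layout returned by `evals_result()`.

```python
def rmse_history_to_mse(evals_result):
    """Square every recorded RMSE value to obtain the matching MSE curve."""
    return {
        dataset: {"mse": [rmse ** 2 for rmse in metrics["rmse"]]}
        for dataset, metrics in evals_result.items()
    }


# Hypothetical values shaped like the output of model_xgb.evals_result()
example = {
    "validation_0": {"rmse": [2.0, 1.5, 1.0]},
    "validation_1": {"rmse": [3.0, 2.0, 1.5]},
}

mse_curves = rmse_history_to_mse(example)
print(mse_curves["validation_1"]["mse"])  # [9.0, 4.0, 2.25]
```

In the original script this would be called as `rmse_history_to_mse(resultados)` after fitting with `eval_metric='rmse'`.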
