I am trying to train an XGBRegressor using this code:
import xgboost as xgb
from sklearn.metrics import mean_squared_error
def xgboost():
    model = xgb.XGBRegressor(n_estimators=200,
                             max_depth=4,
                             subsample=1,
                             min_child_weight=1,
                             objective='reg:squarederror',
                             tree_method='hist',
                             eval_metric=mean_squared_error,  # mean_squared_error
                             early_stopping_rounds=50)
    return model
num_amostras = x_train.shape[0]
val_size = 0.2
num_amostras_train = int(num_amostras * (1-val_size))
x_train_xgb = x_train[:num_amostras_train]
y_train_xgb = y_train[:num_amostras_train]
x_val_xgb = x_train[num_amostras_train:]
y_val_xgb = y_train[num_amostras_train:]
model_xgb = xgboost()
model_xgb.fit(x_train_xgb, y_train_xgb, eval_set=[(x_train_xgb, y_train_xgb), (x_val_xgb, y_val_xgb)])
resultados = model_xgb.evals_result()
x_train has shape (1458, 55)
x_train_xgb has shape (1166, 55)
y_train_xgb has shape (1166, 24)
x_val_xgb has shape (292, 55)
y_val_xgb has shape (292, 24)
But I am getting this error:
Traceback (most recent call last):
File ~\PeDFurnas\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)
File c:\users\ldsp_\sipredvs\scripts\treinamento_demanda.py:201
model_xgb.fit(x_train_xgb, y_train_xgb, eval_set=[(x_train_xgb, y_train_xgb),(x_val_xgb, y_val_xgb)])
File ~\PeDFurnas\lib\site-packages\xgboost\core.py:729 in inner_f
return func(**kwargs)
File ~\PeDFurnas\lib\site-packages\xgboost\sklearn.py:1086 in fit
self._Booster = train(
File ~\PeDFurnas\lib\site-packages\xgboost\core.py:729 in inner_f
return func(**kwargs)
File ~\PeDFurnas\lib\site-packages\xgboost\training.py:182 in train
if cb_container.after_iteration(bst, i, dtrain, evals):
File ~\PeDFurnas\lib\site-packages\xgboost\callback.py:238 in after_iteration
score: str = model.eval_set(evals, epoch, self.metric, self._output_margin)
File ~\PeDFurnas\lib\site-packages\xgboost\core.py:2138 in eval_set
feval_ret = feval(
File ~\PeDFurnas\lib\site-packages\xgboost\sklearn.py:139 in inner
return func.__name__, func(y_true, y_score)
File ~\PeDFurnas\lib\site-packages\sklearn\metrics\_regression.py:442 in mean_squared_error
y_type, y_true, y_pred, multioutput = _check_reg_targets(
File ~\PeDFurnas\lib\site-packages\sklearn\metrics\_regression.py:100 in _check_reg_targets
check_consistent_length(y_true, y_pred)
File ~\PeDFurnas\lib\site-packages\sklearn\utils\validation.py:397 in check_consistent_length
raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [27984, 1166]
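So 27984 = 1166 * 24 (the product of the two dimensions of y_train_xgb's shape), and 1166 is the number of samples in both x_train_xgb and y_train_xgb. In other words, one of the two arrays passed to the metric appears to have been flattened to 1-D, while the other still has 1166 rows.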
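To show what I mean, here is a minimal sketch that reproduces the same shape mismatch without xgboost and shows a workaround I am considering. It assumes (from the error message only, not from the xgboost documentation) that the metric receives the labels flattened to shape (27984,) in row-major order, while the predictions keep shape (1166, 24). The helper reshape_then_score is hypothetical, and the small mse function is only a NumPy stand-in for sklearn's mean_squared_error.

```python
import functools

import numpy as np


def reshape_then_score(metric):
    """Wrap a (y_true, y_pred) metric so that a flattened y_true is
    reshaped to y_pred's shape before the metric is called.

    Assumption: the flattened labels are in row-major (C) order.
    """
    @functools.wraps(metric)  # keep metric.__name__, which xgboost reports
    def wrapped(y_true, y_pred):
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        # Only reshape when the element counts agree but the shapes differ.
        if y_true.shape != y_pred.shape and y_true.size == y_pred.size:
            y_true = y_true.reshape(y_pred.shape)
        return metric(y_true, y_pred)

    return wrapped


def mse(y_true, y_pred):
    # NumPy stand-in for sklearn.metrics.mean_squared_error (uniform average).
    return float(np.mean((y_true - y_pred) ** 2))


# Same sizes as in my data: 1166 samples, 24 targets.
y_pred = np.zeros((1166, 24))
y_true_flat = np.ones(1166 * 24)  # shape (27984,), as in the error message

safe_mse = reshape_then_score(mse)
score = safe_mse(y_true_flat, y_pred)
```

If the assumption holds, passing reshape_then_score(mean_squared_error) as eval_metric might avoid the ValueError, but I have not confirmed that this is the intended way to do it.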
If I don't pass a sklearn metric (mean_squared_error in this case) and instead use the default metric of XGBRegressor ('rmse'), the code runs just fine.
So, what is the cause of this problem? How can I fix it and use mean_squared_error as eval_metric?