Problem with eval_set


I'm writing a machine learning program using the scikit-learn and xgboost libraries. The problem arises when I need to pass the model__eval_set parameter when fitting the pipeline. I split the data and preprocessed it like this:

Splitting data
X, X_val, y, y_val = train_test_split(X, y, test_size=0.1, train_size=0.9)

numerical_transformer = SimpleImputer(strategy='mean')
categorical_transformer = Pipeline(steps=[('impute', SimpleImputer(strategy='most_frequent')), ('label_encoding', OneHotEncoder(handle_unknown='ignore'))])

numerical_cols = [cols for cols in X.columns if X[cols].dtype in ('int64', 'float64')]
categorical_cols = [cols for cols in X.columns if X[cols].dtype == 'object']

numerical_cols_val = [cols for cols in X_val.columns if X_val[cols].dtype in ('int64', 'float64')]
categorical_cols_val = [cols for cols in X_val.columns if X_val[cols].dtype == 'object']

preprocessor = ColumnTransformer(transformers=[('num', numerical_transformer, numerical_cols), ('cat', categorical_transformer, categorical_cols)])

preprocessor_val = ColumnTransformer(transformers=[('num', numerical_transformer, numerical_cols_val), ('cat', categorical_transformer, categorical_cols_val)])

model = XGBRegressor(n_estimators=1000, learning_rate=0.1)

pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('model', model)])

pipeline_2 = Pipeline(steps=[('preprocessor_val', preprocessor_val)])

X_new = pipeline_2.fit_transform(X_val)
pipeline.fit(X, y, model__early_stopping_rounds=5, model__eval_set=[(X_new, y_val)])

But when I run it, the output is this:

ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', ... expected f5063, f6684, f4704, f10633, f2279, ... in input data]

Could you help me?