Can XGBRegressor predict zero output based on an external variable?

Hello,
I'm building a time series model to forecast sales, based on the history since 2019.
My model uses SKForecast with XGBRegressor.

I want to forecast 75 days ahead.
My target is SALES.

I use external (exogenous) features, encoded as 0 and 1, to help the model.

I would like to understand why my final forecast has non-zero values for the target variable SALES on Sundays, even when the external feature OPEN = 0.

My dataset has this structure:
`DATA | SALES | YEAR | WEEK | WEEKDAY | OPEN`


The variable WEEKDAY = 7 means Sunday.
In the dataset I have some Sundays with SALES <> 0 and OPEN = 1 (the first 3 months of 2019), but on most Sundays SALES = 0 and OPEN = 0.

The external feature OPEN = 0 means the store is closed; OPEN = 1 means the store is open.
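If YEAR, WEEK and WEEKDAY are derived from the DATA column, pandas' ISO calendar produces exactly this encoding (Monday = 1 ... Sunday = 7). A minimal sketch, assuming DATA is already the DatetimeIndex; `add_calendar_features` is a hypothetical helper name, and the OPEN rule in the usage (closed every Sunday) is only an illustration, not the real store calendar. Note that the ISO year can differ from the calendar year for a few days around 1 January.

```python
import pandas as pd

def add_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add YEAR, WEEK and WEEKDAY columns computed from a DatetimeIndex (ISO 8601)."""
    iso = df.index.isocalendar()  # columns: year, week, day (Mon=1 ... Sun=7)
    out = df.copy()
    out["YEAR"] = iso["year"].astype(int)
    out["WEEK"] = iso["week"].astype(int)
    out["WEEKDAY"] = iso["day"].astype(int)
    return out

# Saturday 2019-01-05, Sunday 2019-01-06, Monday 2019-01-07 (made-up sales)
idx = pd.date_range("2019-01-05", periods=3, freq="D")
demo = pd.DataFrame({"SALES": [10.0, 0.0, 12.0]}, index=idx)
# Hypothetical rule for illustration only: closed on Sundays (pandas dayofweek 6)
demo["OPEN"] = (demo.index.dayofweek != 6).astype(int)
print(add_calendar_features(demo))
```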

This is my final dataset (vendas_df2) before running the model:

`exog_variables` contains all the external features, i.e. every column except the target variable (SALES).
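Assuming DATA is the index rather than a regular column (otherwise the date itself would end up among the exogenous features), that selection can be sketched as follows; `select_exog_columns` is a hypothetical helper and `demo` is made-up data with the same columns.

```python
import pandas as pd

def select_exog_columns(df: pd.DataFrame, target: str = "SALES") -> list:
    """Every column except the target, keeping the original column order."""
    return [col for col in df.columns if col != target]

demo = pd.DataFrame(
    {"SALES": [10.0], "YEAR": [2019], "WEEK": [1], "WEEKDAY": [6], "OPEN": [1]},
    index=pd.to_datetime(["2019-01-05"]),
)
exog_variables = select_exog_columns(demo)
print(exog_variables)  # ['YEAR', 'WEEK', 'WEEKDAY', 'OPEN']
```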

These are the train, validation and test splits:

These are the model parameters:

```python
from xgboost import XGBRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import grid_search_forecaster, backtesting_forecaster

# Create forecaster
# ======================================
forecaster = ForecasterAutoreg(
    regressor = XGBRegressor(random_state=123),
    lags      = 7  # 24
)

# Grid search of hyperparameters and lags
# ========================================

# Regressor hyperparameters
param_grid = {
    'n_estimators': [100, 500],
    'max_depth': [3, 5, 10],
    'learning_rate': [0.01, 0.1]
}

# Lags used as predictors
lags_grid = [7, 30, 48, 72, [1, 2, 3, 7, 23, 24, 25, 71, 72, 73]]

results_grid = grid_search_forecaster(
    forecaster         = forecaster,
    y                  = vendas_df2.loc[:end_validation, 'SALES'],
    exog               = vendas_df2.loc[:end_validation, exog_variables],
    param_grid         = param_grid,
    lags_grid          = lags_grid,
    steps              = 75,
    refit              = False,
    metric             = 'mean_squared_error',
    initial_train_size = int(len(data_train)),
    fixed_train_size   = False,
    return_best        = True,
    verbose            = False
)

# Backtesting test data
# =========================================
metric, predictions = backtesting_forecaster(
    forecaster         = forecaster,
    y                  = vendas_df2['SALES'],
    exog               = vendas_df2[exog_variables],
    initial_train_size = len(vendas_df2.loc[:end_validation]),
    fixed_train_size   = False,
    steps              = 75,
    refit              = False,
    metric             = 'mean_squared_error',
    verbose            = False
)

print(f"Backtest error: {metric}")
```
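One way to see why OPEN = 0 alone may not pull the prediction to exactly zero: ForecasterAutoreg trains the regressor on a table where each row holds the previous `lags` values of SALES plus that day's exogenous features, and when forecasting it feeds its own predictions back in as lags. The function below is a simplified, pandas-only illustration of such a design matrix, not skforecast's actual implementation; `make_training_matrix` is a hypothetical name and the sales figures are made up. For a Sunday row, `lag_1` is Saturday's (usually non-zero) sales, so OPEN is just one feature among eight. In addition, an XGBoost regression prediction is a base score plus a sum of leaf weights, so it will rarely land on exactly 0 even if every closed day it learned from was 0.

```python
import pandas as pd

def make_training_matrix(y: pd.Series, lags: int, exog: pd.DataFrame) -> pd.DataFrame:
    """Illustrative lag matrix: lag_1..lag_n hold y shifted by 1..n days,
    followed by the exogenous columns for the same day."""
    lagged = pd.DataFrame({f"lag_{k}": y.shift(k) for k in range(1, lags + 1)}, index=y.index)
    X = pd.concat([lagged, exog], axis=1)
    # The first `lags` rows lack a complete history, so they are dropped.
    return X.dropna()

# Two weeks of made-up daily sales, Monday 2019-01-07 to Sunday 2019-01-20.
idx = pd.date_range("2019-01-07", periods=14, freq="D")
y = pd.Series([10, 12, 11, 13, 9, 15, 0, 11, 12, 10, 14, 8, 16, 0], index=idx, dtype=float)
exog = pd.DataFrame({"OPEN": (idx.dayofweek != 6).astype(int)}, index=idx)

X = make_training_matrix(y, lags=7, exog=exog)
# Sunday row: lag_1 holds Saturday's sales, lag_7 the previous Sunday's.
print(X.loc[pd.Timestamp("2019-01-20")])
```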

This is the final result, with the forecast for May:

We can see that the Sundays have SALES <> 0.

My dataset has OPEN = 0 on Sundays, so why can't I forecast zero values for Sundays (prev = 0)?
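A common workaround, since the opening calendar is known in advance for the whole horizon, is to post-process the forecast and set it to zero wherever OPEN = 0. A minimal sketch, assuming the predictions and the OPEN column share the same DatetimeIndex; `zero_when_closed` is a hypothetical helper, not part of skforecast, and the numbers are made up.

```python
import pandas as pd

def zero_when_closed(pred: pd.Series, open_flag: pd.Series) -> pd.Series:
    """Return a copy of `pred` with 0.0 on every date where OPEN == 0."""
    aligned = open_flag.reindex(pred.index)
    if aligned.isna().any():
        raise ValueError("OPEN is missing for some forecast dates")
    # Keep the prediction where the store is open, replace it with 0.0 otherwise.
    return pred.where(aligned != 0, 0.0)

idx = pd.date_range("2019-05-04", periods=3, freq="D")  # Sat, Sun, Mon
pred = pd.Series([52.0, 7.3, 48.1], index=idx)
open_flag = pd.Series([1, 0, 1], index=idx)
print(zero_when_closed(pred, open_flag).tolist())  # [52.0, 0.0, 48.1]
```

Applied to the backtest above, this would look like `predictions['pred'] = zero_when_closed(predictions['pred'], vendas_df2['OPEN'])`, assuming the predictions column is named `pred` as in older skforecast releases; check the name in your version.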

Can you help, please?
Thank you!

Jorge Gomes

Please ignore this statement: "In the dataset I have some Sundays with SALES <> 0 and OPEN = 1 (the first 3 months of 2019), but on most Sundays SALES = 0 and OPEN = 0."
In my dataset, SALES is always zero on Sundays with OPEN = 0.
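A quick way to double-check that statement on the data itself is to list every row marked closed whose SALES is not zero; an empty result confirms it. A sketch, where `nonzero_sales_when_closed` is a hypothetical helper and `demo` is made-up data (on the real data it would be called as `nonzero_sales_when_closed(vendas_df2)`).

```python
import pandas as pd

def nonzero_sales_when_closed(df: pd.DataFrame) -> pd.Series:
    """SALES values on days marked closed (OPEN == 0) that are not zero."""
    closed_sales = df.loc[df["OPEN"] == 0, "SALES"]
    return closed_sales[closed_sales != 0]

idx = pd.date_range("2019-01-05", periods=3, freq="D")
demo = pd.DataFrame({"SALES": [10.0, 0.0, 12.0], "OPEN": [1, 0, 1]}, index=idx)
print(nonzero_sales_when_closed(demo).empty)  # True
```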