Hello,

I’m building a time series model to forecast sales, based on the history since 2019.

My model uses SKForecast with XGBRegressor.

I want to forecast 75 days.

My target is SALES.

I use external (exogenous) features to help the model, encoded as 0 and 1.

I would like to understand why my final forecast has non-zero values for the target variable SALES on Sundays, even though I use an external feature OPEN=0 on those days.

My dataset has this structure:

DATA | SALES | YEAR | WEEK | WEEKDAY | OPEN

ex:

In the variable “WEEKDAY”, 7 means Sunday.

In the dataset I have some Sundays with SALES<>0 and OPEN=1 (the first 3 months of 2019). But mostly SALES=0 on Sundays, with OPEN=0.

The external feature “OPEN”=0 means that the store is closed; OPEN=1 means the store is open.
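As a minimal sketch of the structure described above, the following builds a small synthetic frame with the same columns. The dates and sales figures are made up, `example_df` is a hypothetical name, and one early Sunday is left open to mirror the open Sundays at the start of 2019.

```python
import numpy as np
import pandas as pd

# Synthetic data only: the dates and sales figures below are made up.
dates = pd.date_range("2019-01-01", periods=28, freq="D", name="DATA")
example_df = pd.DataFrame(index=dates)

example_df["YEAR"] = dates.year
example_df["WEEK"] = dates.isocalendar()["week"].astype(int)
example_df["WEEKDAY"] = dates.dayofweek + 1  # 1 = Monday ... 7 = Sunday

# Closed on Sundays, except one open Sunday early in 2019.
example_df["OPEN"] = (example_df["WEEKDAY"] != 7).astype(int)
example_df.loc[pd.Timestamp("2019-01-06"), "OPEN"] = 1

# Sales are zero whenever the store is closed.
rng = np.random.default_rng(123)
example_df["SALES"] = rng.integers(100, 500, size=len(example_df)) * example_df["OPEN"]

# Same column order as the dataset described in the post.
example_df = example_df[["SALES", "YEAR", "WEEK", "WEEKDAY", "OPEN"]]
print(example_df.head(7))
```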

This is my final dataset (vendas_df2) before running the model:

The exog_variables list contains all the external features, i.e. every column except the target variable (SALES).
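One way such a list could be built is sketched below. This is an assumption about how `exog_variables` was derived, not the original code, and `demo_df` is a tiny made-up stand-in for the real DataFrame.

```python
import pandas as pd

# Tiny stand-in for the real DataFrame (values are made up).
demo_df = pd.DataFrame(
    {
        "SALES": [250.0, 0.0],
        "YEAR": [2019, 2019],
        "WEEK": [14, 14],
        "WEEKDAY": [6, 7],
        "OPEN": [1, 0],
    },
    index=pd.date_range("2019-04-06", periods=2, freq="D", name="DATA"),
)

# Every column except the target is treated as an exogenous feature.
target = "SALES"
exog_variables = [col for col in demo_df.columns if col != target]
print(exog_variables)
```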

This is the train, validation and test split:
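A date-based split of this kind might look like the sketch below. The cut-off dates are hypothetical and chosen only so that the test block covers 75 days. The names `end_train`, `data_val` and `data_test` are illustrative; only `end_validation` and `data_train` appear in the code further down.

```python
import numpy as np
import pandas as pd

# Made-up daily series standing in for the real data.
dates = pd.date_range("2019-01-01", "2019-12-31", freq="D", name="DATA")
demo_df = pd.DataFrame({"SALES": np.arange(len(dates), dtype=float)}, index=dates)

# Hypothetical cut-off dates (not the ones used in the real project).
end_train = pd.Timestamp("2019-08-31")
end_validation = pd.Timestamp("2019-10-17")
one_day = pd.Timedelta(days=1)

# .loc slicing on dates includes both end points, so each block
# starts one day after the previous one ends to avoid overlap.
data_train = demo_df.loc[:end_train]
data_val = demo_df.loc[end_train + one_day : end_validation]
data_test = demo_df.loc[end_validation + one_day :]

print(len(data_train), len(data_val), len(data_test))
```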

These are the parameters for the model:

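    # Create forecaster
    # ======================================
    # Import paths for skforecast releases that provide ForecasterAutoreg
    from skforecast.ForecasterAutoreg import ForecasterAutoreg
    from skforecast.model_selection import grid_search_forecaster, backtesting_forecaster
    from xgboost import XGBRegressor

    forecaster = ForecasterAutoreg(
        regressor = XGBRegressor(random_state=123),
        lags      = 7  # 24
    )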

    # Grid search of hyperparameters and lags
    # ========================================
    # Regressor hyperparameters
    param_grid = {
        'n_estimators': [100, 500],
        'max_depth': [3, 5, 10],
        'learning_rate': [0.01, 0.1]
    }

    # Lags used as predictors
    lags_grid = [7, 30, 48, 72, [1, 2, 3, 7, 23, 24, 25, 71, 72, 73]]

    results_grid = grid_search_forecaster(
        forecaster         = forecaster,
        y                  = vendas_df2.loc[:end_validation, 'SALES'],
        exog               = vendas_df2.loc[:end_validation, exog_variables],
        param_grid         = param_grid,
        lags_grid          = lags_grid,
        steps              = 75,
        refit              = False,
        metric             = 'mean_squared_error',
        initial_train_size = len(data_train),
        fixed_train_size   = False,
        return_best        = True,
        verbose            = False
    )

    # Backtesting test data
    # =========================================
    metric, predictions = backtesting_forecaster(
        forecaster         = forecaster,
        y                  = vendas_df2['SALES'],
        exog               = vendas_df2[exog_variables],
        initial_train_size = len(vendas_df2.loc[:end_validation]),
        fixed_train_size   = False,
        steps              = 75,
        refit              = False,
        metric             = 'mean_squared_error',
        verbose            = False
    )

    print(f"Backtest error: {metric}")

This is the final result with forecast for May:

We can see that the forecast for Sundays has SALES <> 0.
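A check like this can be made explicit by joining the predictions with the OPEN flag and listing the closed days that still receive a non-zero forecast. The sketch below uses made-up dates and values. It assumes the predictions are held in a DataFrame with a single `pred` column, and `exog_future` is a hypothetical name for the exogenous data covering the forecast period.

```python
import pandas as pd

# Made-up two-week forecast window starting on a Monday.
dates = pd.date_range("2023-05-01", periods=14, freq="D", name="DATA")

# OPEN = 0 on Sundays (dayofweek == 6), 1 otherwise.
exog_future = pd.DataFrame({"OPEN": (dates.dayofweek != 6).astype(int)}, index=dates)

# Made-up predictions; the two Sundays get small non-zero values.
predictions = pd.DataFrame(
    {"pred": [120.0, 135.5, 128.0, 140.2, 160.9, 180.3, 35.2,
              118.4, 131.0, 126.7, 142.8, 158.1, 176.5, 41.7]},
    index=dates,
)

# Rows where the store is closed but the forecast is not zero.
check = predictions.join(exog_future["OPEN"])
closed_nonzero = check[(check["OPEN"] == 0) & (check["pred"] != 0)]
print(closed_nonzero)
```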

My dataset has OPEN=0 for Sundays, so why can’t I get zero forecasts for Sundays (prev=0)?

Can you help please?

Thank you!

Jorge Gomes