Hello,
I'm building a time series model to forecast sales, based on the history since 2019.
My model uses skforecast with XGBRegressor.
I want to forecast 75 days ahead.
My target is SALES.
I use external (exogenous) features to help the model, encoded as 0 and 1.
I would like to understand why my final forecast has non-zero values for the target variable SALES on Sundays, even though the external feature OPEN is 0 on those days.
My dataset has this structure:
DATA| SALES| YEAR | WEEK | WEEKDAY | OPEN
ex:
WEEKDAY = 7 means Sunday.
The dataset has some Sundays with SALES<>0 and OPEN=1 (the first 3 months of 2019), but on most Sundays SALES=0 and OPEN=0.
The external feature OPEN=0 means the store is closed; OPEN=1 means the store is open.
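To make the relationship between OPEN and SALES on Sundays concrete, here is a minimal sketch of how it could be cross-checked with pandas. The small DataFrame holds made-up values for illustration only (it is not my real data); only the column names come from the structure above.

```python
import pandas as pd

# Made-up illustration data (hypothetical values), using the column names above
toy = pd.DataFrame(
    {
        "SALES":   [120.0, 0.0, 95.0, 0.0, 80.0, 0.0],
        "WEEKDAY": [7, 7, 1, 7, 7, 7],   # 7 = Sunday
        "OPEN":    [1, 0, 1, 0, 1, 0],   # 1 = store open, 0 = store closed
    }
)

# Keep only Sundays and summarise SALES separately for closed and open days
sundays = toy[toy["WEEKDAY"] == 7]
summary = sundays.groupby("OPEN")["SALES"].agg(["count", "mean", "max"])
print(summary)
```

If the data behaves as described, the OPEN=0 row should show a maximum of 0, and the OPEN=1 row should cover only the early Sundays with sales.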
This is my final dataset (vendas_df2) before running the model:
exog_variables contains all the external features, i.e. every column except the target variable (SALES).
This is the train, validation and test split:
These are the parameters for the model:
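In code, that selection could look like the sketch below. The toy frame is only a stand-in for vendas_df2 with hypothetical values, and it assumes DATA is the date index rather than a regular column.

```python
import pandas as pd

# Hypothetical stand-in for vendas_df2; DATA is assumed to be the date index
vendas_toy = pd.DataFrame(
    {
        "SALES":   [250.0, 0.0, 310.0],
        "YEAR":    [2019, 2019, 2019],
        "WEEK":    [23, 23, 24],
        "WEEKDAY": [6, 7, 1],
        "OPEN":    [1, 0, 1],
    },
    index=pd.date_range("2019-06-08", periods=3, freq="D", name="DATA"),
)

target = "SALES"
# Every column except the target is passed to the model as an exogenous feature
exog_variables = [col for col in vendas_toy.columns if col != target]
print(exog_variables)
```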
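# Create forecaster
# ======================================
# Import paths as used by skforecast versions that provide ForecasterAutoreg
from xgboost import XGBRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import grid_search_forecaster, backtesting_forecaster

forecaster = ForecasterAutoreg(
    regressor = XGBRegressor(random_state=123),
    lags = 7 #24
)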
# Grid search of hyperparameters and lags
# ========================================
# Regressor hyperparameters
param_grid = {
    'n_estimators': [100, 500],
    'max_depth': [3, 5, 10],
    'learning_rate': [0.01, 0.1]
}
# Lags used as predictors
lags_grid = [7, 30, 48, 72, [1, 2, 3, 7, 23, 24, 25, 71, 72, 73]]
results_grid = grid_search_forecaster(
    forecaster = forecaster,
    y = vendas_df2.loc[:end_validation, 'SALES'],
    exog = vendas_df2.loc[:end_validation, exog_variables],
    param_grid = param_grid,
    lags_grid = lags_grid,
    steps = 75,
    refit = False,
    metric = 'mean_squared_error',
    initial_train_size = int(len(data_train)),
    fixed_train_size = False,
    return_best = True,
    verbose = False
)
# Backtesting test data
# =========================================
metric, predictions = backtesting_forecaster(
    forecaster = forecaster,
    y = vendas_df2['SALES'],
    exog = vendas_df2[exog_variables],
    initial_train_size = len(vendas_df2.loc[:end_validation]),
    fixed_train_size = False,
    steps = 75,
    refit = False,
    metric = 'mean_squared_error',
    verbose = False
)
print(f"Backtest error: {metric}")
This is the final result, with the forecast for May:
We can see that Sundays have SALES<>0.
My dataset has OPEN=0 for Sundays, so why can't I get a forecast of zero for Sundays (prev=0)?
Can you help, please?
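A possible workaround, which does not explain the behaviour, would be to overwrite the forecast with zero on the days where OPEN=0 after predicting. Below is a minimal sketch with made-up numbers. The dates and values are hypothetical, and I am assuming the predictions come back as a DataFrame with a `pred` column indexed by date.

```python
import pandas as pd

# Hypothetical forecast output; 2023-05-07 is a Sunday
dates = pd.date_range("2023-05-05", periods=4, freq="D")
predictions_toy = pd.DataFrame({"pred": [310.5, 290.2, 41.7, 305.0]}, index=dates)

# Hypothetical OPEN flag for the same dates (0 = store closed)
open_flag = pd.Series([1, 1, 0, 1], index=dates, name="OPEN")

# Keep the model output where the store is open, force 0 where it is closed
predictions_toy["pred_adjusted"] = predictions_toy["pred"].where(open_flag == 1, 0.0)
print(predictions_toy)
```

This only masks the output; the model itself would still produce a non-zero value for closed days.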
Thank you!
Jorge Gomes