I’ve trained an xgboost model in R with data containing 160 variables. However, I’d like to predict on inference data with only 10 variables.
Is this possible? If yes, how does this work? Do I only give the 10 variables and the model knows which variables are given despite a different ordering? Do I have to set the missing variables to NULL or blank or something?
How to predict on data with fewer variables?
You need to add those variables back as missing values; that is, the test data must have the same number of columns as the training data. Please see the missing parameter of xgb.DMatrix.
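As a rough illustration of the suggestion above, here is a minimal sketch in R. It assumes the training matrix has column names and that the absent variables should be filled with NA. The toy data, the feature names f1 to f5, and the object names (train_x, new_x, padded) are made up for the example, with 5 features standing in for the 160.

```r
library(xgboost)

set.seed(1)

# Toy stand-in for the real training data (hypothetical feature names f1..f5).
train_x <- matrix(rnorm(100 * 5), ncol = 5,
                  dimnames = list(NULL, paste0("f", 1:5)))
train_y <- as.numeric(train_x[, "f1"] + train_x[, "f3"] > 0)

dtrain <- xgb.DMatrix(train_x, label = train_y)
bst <- xgb.train(params = list(objective = "binary:logistic"),
                 data = dtrain, nrounds = 5)

# Inference data: only two of the variables, and in a different order.
new_x <- matrix(rnorm(3 * 2), ncol = 2,
                dimnames = list(NULL, c("f3", "f1")))

# Start from an all-NA matrix with the training columns in training order,
# then copy in, by name, the columns that are actually available.
padded <- matrix(NA_real_, nrow = nrow(new_x), ncol = ncol(train_x),
                 dimnames = list(NULL, colnames(train_x)))
padded[, colnames(new_x)] <- new_x

# missing = NA tells xgboost which value marks a missing entry.
dnew <- xgb.DMatrix(padded, missing = NA)
pred <- predict(bst, dnew)
```

In this sketch the missing parameter does not add any columns itself; it only says which value in the matrix counts as "missing". The columns are added by hand and lined up by name, so the ordering of the inference data does not matter. Predictions will likely be less reliable when the model depends heavily on the variables that were filled with NA.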
Thanks for your reply @jiamingy !
As far as the documentation is concerned, I understand the missing parameter as filling in the missing values with a certain value. I don’t understand how this helps to add back those variables so that the number of columns in the test and training data stays the same. Could you please give some more guidance?