Hi XGB community,
I’m working on a text classification problem. I have a high-dimensional encoding of each sentence and a bunch of other categorical features as well. Is there a way to use these features (of different dimensions) together, given that the categorical features are, well, one-dimensional and the encoding vectors are 300-dimensional arrays each?
Things I’ve tried that throw errors:
Creating a DMatrix using each feature as a separate array.
Creating a pandas DataFrame of all the features, with each feature in its own column.
When I try to create a DMatrix from that DataFrame, or use the DataFrame directly, I get this error:
“DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, DMatrix parameter enable_categorical must be set to True. Invalid columns:eng_embeddings”
where eng_embeddings is the column with the 300-dimensional arrays. The dtype of this column in the DataFrame is object, although type(df['eng_embeddings'][0]) prints numpy.ndarray.
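For reference, a minimal sketch of the kind of DataFrame described above. The random data and the column name `category_a` are made up for illustration; only `eng_embeddings` and the 300-dimensional size come from the actual setup.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, dim = 4, 300  # 300 matches the embedding size; 4 rows is arbitrary

df = pd.DataFrame({
    # hypothetical categorical feature, stored with pandas' category dtype
    "category_a": pd.Series(["x", "y", "x", "z"], dtype="category"),
    # one 300-dimensional NumPy array per row
    "eng_embeddings": [rng.normal(size=dim) for _ in range(n_rows)],
})

# The column as a whole has dtype object, because each cell holds an array...
print(df["eng_embeddings"].dtype)
# ...while each individual cell is a numpy.ndarray of shape (300,)
print(type(df["eng_embeddings"][0]), df["eng_embeddings"][0].shape)
```

So the column-level dtype and the type of each element are two different things, and the error message is about the column-level dtype.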
Any help is appreciated, thanks!
Edit: I also tried setting the enable_categorical parameter to True; that didn’t work for me either.
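One possible direction, sketched here as an assumption rather than a confirmed fix: expand the array column into one float column per embedding dimension, so that every column has a dtype the error message lists as accepted. The column names `category_a` and `emb_0` … `emb_299` are hypothetical, and the sketch assumes every embedding has the same length.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, dim = 4, 300

# Same hypothetical layout as in the question: one categorical column,
# one object column holding a 300-dimensional array per row.
df = pd.DataFrame({
    "category_a": pd.Series(["x", "y", "x", "z"], dtype="category"),
    "eng_embeddings": [rng.normal(size=dim) for _ in range(n_rows)],
})

# Stack the per-row arrays into a single (n_rows, dim) float matrix.
emb = np.stack(df["eng_embeddings"].tolist())

# Give each embedding dimension its own numeric column.
emb_df = pd.DataFrame(
    emb,
    columns=[f"emb_{i}" for i in range(dim)],
    index=df.index,
)

# Replace the object column with the expanded float columns.
features = pd.concat([df.drop(columns="eng_embeddings"), emb_df], axis=1)

print(features.shape)
print(features.dtypes.value_counts())
```

The resulting frame holds only float and category columns, so in principle it could be passed to DMatrix with enable_categorical set to True, as the error message suggests for category columns. This has not been checked against a specific XGBoost version here.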