Recently, I had to deal with quite an unfortunate and hard-to-diagnose problem. At my workplace, we are currently trying to train XGBoost models on a Kubernetes cluster, which imposes certain limitations on us (memory limits, to be precise). To get around these limitations, I attempted to implement my own
When attempting to train the model using a `DMatrix` created from a `DataIter`, I kept receiving an exception saying that the feature names are not unique. We pass the `feature_names` variable to `DMatrix.__init__`, yet, as I figured out later, when a `DMatrix` is initialized from an iterator, the `feature_names` parameter is ignored completely. The fix was rather simple: all I had to do was call `set_info` right after the `DMatrix` was initialized. That set `feature_names` to its proper value, and I stopped receiving the exception.
I know the `DataIter` feature is still considered experimental, but I wanted to ask: is this behavior intended, or is it a bug? The fact that `DMatrix.__init__` may ignore some parameters when initializing from an iterator is not documented anywhere, and I could not find anyone else reporting the same issue. So I am curious whether I am doing something wrong and `feature_names` is left uninitialized for a reason that still eludes me, or whether I should file a bug report or submit a pull request.
Thanks a lot, and sorry if I'm just missing something.