I’m experiencing some strange behavior with DMatrix.slice. Can’t share data of course, but I can demonstrate. I have a DMatrix that is switching or mixing weights and labels when I slice it with a numpy array.
When I look at the whole DMatrix, the results are as expected:
dmat_1.get_weight().min(), dmat_1.get_weight().max()
Out[161]: (1.0, 36.0)
dmat_1.get_label().min(), dmat_1.get_label().max()
Out[165]: (0.0, 29166.666)
However, when I slice with a numpy array, weights and labels are clearly getting mixed, and some nonsense thrown in too.
# weights
dmat_1.slice(train_indexes_1).get_weight().min(), dmat_1.slice(train_indexes_1).get_weight().max(),\
dmat_1.slice(test_indexes_1).get_weight().min(), dmat_1.slice(test_indexes_1).get_weight().max()
Out[162]: (0.0, 29166.666, 0.0, 29166.666) # what?!
# labels
dmat_1.slice(train_indexes_1).get_label().min(), dmat_1.slice(train_indexes_1).get_label().max(),\
dmat_1.slice(test_indexes_1).get_label().min(), dmat_1.slice(test_indexes_1).get_label().max()
Out[166]: (-6.656522e-24, 29166.666, -6.656522e-24, 3.6346553e+34) # train looks normal but test label has gone crazy
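Min and max only hint at the problem, so a stricter check might be to compare each sliced array element by element against the full array indexed directly with numpy. The following is a rough sketch on toy stand-in arrays, not my real data; the helper name slice_matches is made up for illustration.

```python
import numpy as np


def slice_matches(full_values, sliced_values, indexes):
    """Return True if sliced_values equals full_values taken at indexes.

    Hypothetical helper: it only compares plain arrays and does not
    call into XGBoost itself.
    """
    full_values = np.asarray(full_values)
    sliced_values = np.asarray(sliced_values)
    expected = full_values[np.asarray(indexes, dtype=np.int64)]
    return bool(np.array_equal(expected, sliced_values))


# Toy stand-ins for the label and weight arrays (assumed values).
labels = np.array([0.0, 10.0, 20.0, 30.0], dtype=np.float32)
weights = np.array([1.0, 2.0, 3.0, 36.0], dtype=np.float32)
idx = np.array([3, 1])

# A correct slice of the labels matches the directly indexed labels.
good = slice_matches(labels, labels[idx], idx)

# If "weights" came back holding label values, the check fails.
mixed = slice_matches(weights, labels[idx], idx)
```

With the real objects, the arguments would be something like dmat_1.get_weight(), dmat_1.slice(train_indexes_1).get_weight() and train_indexes_1, and the same again with get_label().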
I’ve made sure train_indexes_1 and test_indexes_1 have values within expected bounds.
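For reference, the kind of check I mean looks roughly like the sketch below. The helper name index_problems is hypothetical, and the dtype and dimensionality checks are extra sanity checks I am assuming could be relevant, not a confirmed cause.

```python
import numpy as np


def index_problems(indexes, num_rows):
    """Return a list of human-readable problems found in an index array.

    Hypothetical helper for illustration; an empty list means the
    indexes look like plain in-range row numbers.
    """
    problems = []
    arr = np.asarray(indexes)
    if arr.ndim != 1:
        problems.append(f"expected 1-D, got {arr.ndim}-D")
    if not np.issubdtype(arr.dtype, np.integer):
        # Boolean masks and float arrays both end up here.
        problems.append(f"non-integer dtype {arr.dtype}")
    elif arr.size and (arr.min() < 0 or arr.max() >= num_rows):
        problems.append("values outside [0, num_rows)")
    return problems


# Toy examples with an assumed matrix of 3 rows.
ok = index_problems(np.array([0, 2, 1]), 3)
out_of_range = index_problems(np.array([0, 3]), 3)
float_idx = index_problems(np.array([0.0, 1.0]), 3)
```

On the real data this would be called as index_problems(train_indexes_1, dmat_1.num_row()), and likewise for test_indexes_1. Printing train_indexes_1.dtype alongside it may also be worth doing.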
What is going on here?