Feature Interaction Constraint Overlap


#1

Hi,

Does anybody know what the implications are of having overlaps between different sets of interactions? For example, an interaction constraint of the form:

[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]

I want to do this because, for my data, neighbouring features are highly correlated. My thinking is that by doing this I can limit model complexity by eliminating first-order interactions between distant features. But I don’t really know the actual implications for the algorithm of defining that kind of interaction. Will it simply be reduced to [0, 1, 2, 3, 4, 5], and therefore achieve nothing?

Thanks!


#2

I think your approach will actually prevent interactions between distant features.

Let S be the set of features used in all the ancestor nodes of the proposed split, plus the feature of the proposed split itself. The proposed split is admitted whenever at least one set in the interaction constraint is a superset of S.

Now suppose we have a proposed split that would cause an interaction between distant features, i.e. S = {0, 5}. Notice that none of the three lists in

[[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]

is a superset of {0, 5}. Thus, the proposed split will be rejected.
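
To make the rule concrete, here is a small sketch in plain Python. It only encodes the superset check as described in this post; it is not XGBoost's actual implementation, and the helper name `split_allowed` is made up for illustration.

```python
def split_allowed(path_features, constraints):
    """Return True if some constraint set contains every feature in S.

    path_features: features used by the ancestor nodes plus the proposed
    split (the set S from the post above).
    constraints: list of lists of feature indices.
    Hypothetical helper, not part of the XGBoost API.
    """
    s = set(path_features)
    return any(s <= set(group) for group in constraints)


overlapping = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
merged = [[0, 1, 2, 3, 4, 5]]

# Distant features 0 and 5 never share a constraint set, so the split is rejected.
print(split_allowed({0, 5}, overlapping))  # False

# Neighbouring features 1 and 4 both appear in [1, 2, 3, 4].
print(split_allowed({1, 4}, overlapping))  # True

# With the single merged list, the same distant split would be admitted,
# so under this rule the overlapping form is not equivalent to the merged one.
print(split_allowed({0, 5}, merged))  # True
```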


#3

@GCBallesteros I think it would be beneficial to the community if we added this use case to the Feature Interaction Constraints Tutorial.


#4

Thanks @hcho3 for your response. So any interactions between different parts of the feature set would only happen via the combination of different trees in the ensemble. Is this interpretation correct?

I could try to make an example for the docs if you think it would be useful.
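
As a possible starting point for such an example, overlapping constraints like the ones in my first post could be generated with a sliding window over the feature indices. This is just a sketch: the helper name `sliding_window_constraints` and its `width` argument are my own and do not come from XGBoost.

```python
def sliding_window_constraints(n_features, width):
    """Build overlapping constraint sets of consecutive feature indices.

    Hypothetical helper for illustration only. Each set covers `width`
    neighbouring features and is shifted by one relative to the previous set.
    """
    if width < 1 or width > n_features:
        raise ValueError("width must be between 1 and n_features")
    return [list(range(start, start + width))
            for start in range(n_features - width + 1)]


constraints = sliding_window_constraints(n_features=6, width=4)
print(constraints)  # [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
```

The resulting nested list has the same shape as the one I posted, so I assume it could be passed as the `interaction_constraints` parameter when training.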


#5

In the tutorial, “feature interactions” are narrowly defined in terms of a traversal path within a single tree. So occurrences of features in different trees are not “feature interactions” according to this definition.


#6

Yes, that would be super useful.