Feature_weights does not work as expected

Hi. I had been waiting for feature_weights, and now XGBoost has it. However, I haven’t seen a tutorial/demo for it yet. I tried it myself; it does change something, but when I plot the trees I can see it doesn’t work as expected. The documentation says:
feature_weights (array_like) – Weight for each feature, defines the probability of each feature being selected when colsample is being used. All values must be greater than 0, otherwise a ValueError is thrown. Only available for hist, gpu_hist and exact tree methods.
I’ve tried changing tree_method to see a difference, but the changes are not reflected.
I have 3 features.
1)
When I define feature_weights this way:
feature_weights = np.array([0.8, 0, 0.19]).astype(np.float32)
bst = xgb.XGBRegressor(**param, tree_method="exact")
bst.fit(X_train, y_train, feature_weights=feature_weights, eval_set=[(X_valid, y_valid)])

I expect not to see the second feature when I plot the trees, but I still see it.
2)
When I define feature_weights this way:
feature_weights = np.array([0.8, 0, 0.2]).astype(np.float32)
bst = xgb.XGBRegressor(**param, tree_method="exact")
bst.fit(X_train, y_train, feature_weights=feature_weights, eval_set=[(X_valid, y_valid)])

It gives a constant result; from this it seems the sum of the weights has to be less than one.
3)
When I increase my number of features to 5 and define feature_weights this way:
feature_weights = np.array([0, 0, 0, 0, 0.999]).astype(np.float32)
bst = xgb.XGBRegressor(**param, tree_method="exact")
bst.fit(X_train, y_train, feature_weights=feature_weights, eval_set=[(X_valid, y_valid)])
I see that only the first features are shown in the tree plots, even though I set their weights to zero.
4)
When I again use 5 features and define feature_weights this way:
feature_weights = np.array([0, 0, 0, 0, 1]).astype(np.float32)
bst = xgb.XGBRegressor(**param, tree_method="exact")
bst.fit(X_train, y_train, feature_weights=feature_weights, eval_set=[(X_valid, y_valid)])
I see that only the first features are shown in the plots, and the results become a constant zero again.
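For reference, here is a consolidated, runnable version of what I am trying. Since I cannot share my real dataset, the data below is synthetic and the param dict is just a placeholder for my actual hyperparameters:

import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my real data (3 features).
rng = np.random.RandomState(0)
X = rng.randn(1000, 3)
y = 2.0 * X[:, 0] + rng.randn(1000) * 0.1
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

param = {"n_estimators": 50, "max_depth": 3}  # placeholder hyperparameters

# Case 1: second feature weighted zero, yet it still appears in the plots.
feature_weights = np.array([0.8, 0, 0.19]).astype(np.float32)
bst = xgb.XGBRegressor(**param, tree_method="exact")
bst.fit(X_train, y_train, feature_weights=feature_weights, eval_set=[(X_valid, y_valid)])
xgb.plot_tree(bst, num_trees=0)  # requires graphviz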

What should I do? At the very least, I think we need a tutorial.


Here is a demo: https://github.com/dmlc/xgboost/blob/master/demo/guide-python/feature_weights.py
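If it helps, here is a minimal sketch in the spirit of that demo (not copied from it; the data and hyperparameters below are made up). The key point is that one of the colsample_* parameters has to be set below 1, otherwise no feature sampling happens and feature_weights has nothing to influence:

import numpy as np
import xgboost as xgb

rng = np.random.RandomState(1994)
X = rng.randn(2000, 5)
y = X.sum(axis=1) + rng.randn(2000) * 0.1

# Most of the probability mass on feature 0, none on feature 4.
fw = np.array([5.0, 1.0, 1.0, 1.0, 0.0])

dtrain = xgb.DMatrix(X, label=y)
dtrain.set_info(feature_weights=fw)

booster = xgb.train(
    {"tree_method": "hist", "colsample_bynode": 0.5},  # sampling enabled
    dtrain,
    num_boost_round=20,
)

# Count how many splits actually use each feature; f4 should not appear.
print(booster.get_score(importance_type="weight"))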


Hi, and thanks for the demo.

But I’m still not fully understanding the feature_weights option. What are the default feature_weights when nothing is specified? I would need to know this for the use case where one wants to specify feature_weights for only some of the features and leave the rest at the default.

My use case is this: I would like to tell XGBoost that some features should be unpenalized and always included/selected in the model with non-zero weights. For example, you can do this in glmnet with the option penalty_factor, which is a vector the length of your feature set, with each entry a number in [0, 1]. If you set it to 0 for a feature, that feature is unpenalized and always included in the model; 1 is the default setting for each feature.

How would I accomplish the same thing in XGBoost using feature_weights? Or am I misunderstanding the functionality, and it isn’t similar to penalty_factor in glmnet?

By default, all features receive weight 1. The higher the weight, the more likely the feature is to be chosen when feature sampling is enabled. Setting a weight to zero will cause the corresponding feature to be dropped.

Currently you cannot. You must specify weights for all features. That is, when setting feature_weights, you must pass in a list that’s as long as the number of features.
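As a rough sketch of that workaround (the feature count, indices, and hyperparameters below are made up): start from a vector of ones, which matches the default, and override only the entries you care about. Keep in mind this only makes a feature more or less likely to be sampled; it is not a penalty term:

import numpy as np
import xgboost as xgb

# Synthetic stand-in data; replace with your own.
rng = np.random.RandomState(0)
X_train = rng.randn(500, 8)
y_train = X_train[:, 0] + rng.randn(500) * 0.1

n_features = X_train.shape[1]
feature_weights = np.ones(n_features, dtype=np.float32)  # the default: weight 1 everywhere
feature_weights[[0, 3]] = 10.0  # features to favour during sampling
feature_weights[7] = 0.0        # a feature that should never be picked

# One of the colsample_* parameters must be < 1 for the weights to matter.
model = xgb.XGBRegressor(n_estimators=50, tree_method="hist", colsample_bynode=0.5)
model.fit(X_train, y_train, feature_weights=feature_weights)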


OK, so I’m not understanding the option then, because if every feature gets a feature_weight of 1 by default, doesn’t that mean the probability of selecting every feature is 1?

So, long story short, the feature_weights option is not similar to the penalty_factor option in glmnet? Is there a way in XGBoost to specify that certain features should be unpenalized and always included in the model with non-zero coefficients?

No, not at all. Think of feature_weights as a probability distribution. When set to all 1s, feature_weights is equivalent to the uniform distribution. In general, the sum over all elements of feature_weights is used as the normalization constant, so that we always have a proper probability distribution.

The probability distribution is used when feature sampling is enabled with colsample_bytree, colsample_bylevel, or colsample_bynode.

There is really no “coefficient,” since XGBoost fits decision trees, not linear models. At each split node of a tree, the feature is selected at random using the probability distribution provided by feature_weights (after normalization).
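For a tiny numeric illustration of that normalization (plain NumPy arithmetic; the internal implementation may differ in detail):

import numpy as np

weights = np.array([1.0, 1.0, 1.0])   # the default: all ones
print(weights / weights.sum())        # [0.333 0.333 0.333] -> uniform distribution

weights = np.array([0.8, 0.0, 0.19])  # the weights from the first post
print(weights / weights.sum())        # approx. [0.808 0.    0.192]; feature 1 is never sampled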


Thank you for the detailed and thoughtful explanation!