"feature_weights " doesn't work

Hi, when I use the feature_weights parameter, there is no difference in the results…

1)not using feature_weights parameter

model5 = xgb.XGBRegressor(objective="reg:squarederror", colsample_bytree=1, n_estimators=1000, gamma=0, subsample=1, reg_alpha=0.1, tree_method="exact")

model5.fit(train_x,train_y)

predictions=model5.predict(test_x)

2)using feature_wieghts parameter

model5 = xgb.XGBRegressor(objective="reg:squarederror", colsample_bytree=1, n_estimators=1000, gamma=0, subsample=1, reg_alpha=0.1, tree_method="exact")

model5.fit(train_x,train_y,feature_weights=random_weights)

predictions=model5.predict(test_x)

The two cases give identical predictions.
I don't know why feature_weights doesn't work… Please help!

Please set at least one of the colsample_* hyperparameters to a value less than 1.0. The feature weights determine how features are randomly selected during column sampling, and if sampling is disabled, setting feature weights has no effect.
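For illustration, here is a minimal, self-contained sketch (synthetic data, arbitrary illustrative weights) of the idea: once one of the colsample_* parameters is below 1.0, passing feature_weights should generally change the fitted model.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.normal(size=200)
weights = np.linspace(0.1, 1.0, X.shape[1])  # arbitrary illustrative weights

params = dict(objective="reg:squarederror", n_estimators=50, colsample_bytree=0.5)
unweighted = xgb.XGBRegressor(**params).fit(X, y)
weighted = xgb.XGBRegressor(**params).fit(X, y, feature_weights=weights)

print(np.allclose(unweighted.predict(X), weighted.predict(X)))  # typically False once sampling is enabled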


Thanks. I solved the problem!

Hello @hcho3! I am interested in using feature_weights as well and found this post. I did set all the colsample_* hyperparameters to values less than 1.0, but a mismatch between the number of columns being sampled and the length of the corresponding weights throws an error. I'm not sure how to proceed; thank you for any advice!

It appears that you have 110 columns in your data. Please ensure that feature_weights is 110 elements long.
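For example, a quick sanity check along these lines (X and feature_weights here stand in for your own training frame and weight array) will catch the mismatch before training:

import numpy as np

feature_weights = np.asarray(feature_weights)
assert feature_weights.shape[0] == X.shape[1], (
    f"feature_weights has {feature_weights.shape[0]} entries, but X has {X.shape[1]} columns"
)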

I don't think that's what's happening here; I have also encountered this problem. I think the full set of features is 221 long, but after colsample_bytree only 110 features are selected. Then, when the next round of column subsampling (bylevel) runs, the check fails since there are only 110 remaining features.

I have replicated this problem in the below minimal example:

import numpy as np
import pandas as pd
import xgboost as xgb

total_num_columns = 32
num_rows = 100
X = pd.DataFrame({f"x_{i}": np.random.normal(0, 1, num_rows) for i in range(total_num_columns)})
y = np.random.normal(0, 1, num_rows)
feature_weights = np.exp(np.random.normal(0, 0.5, total_num_columns))

dm = xgb.DMatrix(X, label=y, feature_weights=feature_weights)

xgb.train({"colsample_bytree": 0.75, "colsample_bynode": 0.25}, dm)

XGBoostError: [09:47:43] …/src/common/random.h:92: Check failed: array.size() == weights.size() (24 vs. 32)

The consequence seems to be that it's impossible to use feature_weights if more than one of colsample_bytree, colsample_bynode, and colsample_bylevel is set to a non-default value. @hcho3, do you think it would be better if xgboost passed a sub-array of just the subsampled feature_weights down to the next level of subsampling?
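In the meantime, a possible workaround (reusing dm from the example above) is to set only one of the colsample_* parameters to a non-default value, so the weight array is consulted just once:

xgb.train({"colsample_bytree": 0.75}, dm)  # a single level of column sampling avoids the failing size check
# xgb.train({"colsample_bytree": 0.75, "colsample_bynode": 0.25}, dm)  # still raises the error above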

The bug has been fixed in https://github.com/dmlc/xgboost/pull/8100

Thanks for the update!