Are all features equal? If a model has 100 features, does each one contribute equally at prediction time?

I understand that we can print the feature importance list to see which features are useless and remove them. Does printing feature importance serve any other purpose?

Suppose we already have 100 features and add 1 more that we want to have a dominant effect. Is that impossible? Can the new feature only influence 1/101 of the model output?

@guotong1988 Hi guotong, I think you can check the definition of conditional feature importance.

Here are some materials for you.

Strobl, Carolin, and Achim Zeileis. 2008a. “Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance.”

———. 2008b. “Why and How to Use Random Forest Variable Importance Measures (and How You Shouldn’t).” Dortmund: useR! http://www.statistik.uni-dortmund.de/useR-2008/slides/Strobl+Zeileis.pdf.

When I went through them, I jotted down some notes. I think they will help you and save you some time.

Good luck.
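
To make the idea of permutation-style importance concrete, here is a minimal sketch in plain Python. It shows the ordinary (unconditional) version: shuffle one column at a time and measure how much accuracy drops. As I understand it, the conditional variant discussed in the materials above refines this by shuffling within groups of correlated features, which is not shown here. The `toy_model` function and the synthetic data are assumptions made up for illustration, not anything from a real library.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows where predict(row) equals the label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(labels)


def permutation_importance(predict, rows, labels, rng):
    """Accuracy drop when each feature column is shuffled in turn."""
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        shuffled_col = [row[j] for row in rows]
        rng.shuffle(shuffled_col)
        shuffled_rows = [
            row[:j] + [v] + row[j + 1:] for row, v in zip(rows, shuffled_col)
        ]
        drops.append(baseline - accuracy(predict, shuffled_rows, labels))
    return drops


rng = random.Random(0)  # fixed seed so the run is repeatable
labels = [rng.randint(0, 1) for _ in range(1000)]
# Column 0 tracks the label 95% of the time; columns 1 and 2 are pure noise.
rows = [
    [y if rng.random() < 0.95 else 1 - y, rng.randint(0, 1), rng.randint(0, 1)]
    for y in labels
]


def toy_model(row):
    """Hypothetical stand-in for a trained model: it only reads column 0."""
    return row[0]


importances = permutation_importance(toy_model, rows, labels, rng)
print(importances)
```

Shuffling column 0 hurts accuracy a lot, while shuffling the columns the model never reads changes nothing, so the features are clearly not equal at prediction time.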

@guotong1988

Whether the added feature matters depends on how strongly it correlates with the label. If the correlation is very strong, the new feature will overshadow the existing ones: while the decision tree is learning, the new feature has the largest information gain, so the model ends up being learned from essentially that one feature.

In practice this is usually not a good sign, because it often means overfitting: the model is less robust and performs only moderately on other datasets. Usually, the more robust the model is, the more sparse and uniform its features tend to be.
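
The information-gain argument can be checked with a small sketch. It builds 100 weak binary features plus 1 strong one on synthetic data (the 55% and 98% agreement rates are assumptions chosen for illustration) and computes the information gain of a single split on each. It only looks at the root split, so it is not how any particular library computes feature importance.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(column, labels):
    """Information gain of splitting on a binary (0/1) feature column."""
    left = [y for x, y in zip(column, labels) if x == 0]
    right = [y for x, y in zip(column, labels) if x == 1]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


rng = random.Random(0)  # fixed seed so the run is repeatable
n_rows, n_old = 2000, 100

labels = [rng.randint(0, 1) for _ in range(n_rows)]
# 100 "old" features: each agrees with the label only 55% of the time.
old_features = [
    [y if rng.random() < 0.55 else 1 - y for y in labels] for _ in range(n_old)
]
# 1 new feature: agrees with the label 98% of the time.
new_feature = [y if rng.random() < 0.98 else 1 - y for y in labels]

columns = old_features + [new_feature]
gains = [information_gain(col, labels) for col in columns]
best = max(range(len(gains)), key=gains.__getitem__)
share = gains[-1] / sum(gains)

print("feature chosen at the root:", best)
print("new feature's share of total gain: %.2f (1/101 = %.2f)" % (share, 1 / 101))
```

The new feature (index 100) wins the root split, and its share of the summed gain is far above 1/101, so a feature's influence is set by how informative it is, not by how many features there are.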

Another view: it depends on how good your data is. If the data has no noise, I think there is no such thing as ‘over-fitting’.
Yes, if the data has noise, I agree with you.