Feature Selection and Decomposition

2.4. Feature Selection and Decomposition#

  • Select some subsets feature, in which one of them make performance of one of models is the best. Depend on the model or algorithms, there is one of some subset in proportion to model.

1. Why need to use FS ?

  • The simple models are easier to interpret

  • Shorted to traning

  • Reduce overfitting

  • Easier to deploy

  • Reducce risk of variable error of collection and storage

  • Avoid use variables with highly correlation

  • Poor performance in some type of model: tree-base

2. Methods

  • Filter methods

    • Basis

      • Constanst

      • Quasi - constanst

      • Duplicated

    • Statistics measures

      • Fisher score

      • Univariate methods

      • Mutual information

    • Correlation

  • Wrapped methods

    • Step forward selection

    • Step backward selection

    • Exhaustive search

    • Feature suffling

  • Embedded methods

    • Lasso

    • Decision tree derived importance

    • Regression coefficients

  • Hybrid methods = wrapped + embedded methods