1. What is feature selection?

Answer»

Feature selection is the process of extracting only the relevant features from a given Big Data set. Big Data may contain many features that are not needed at a particular stage of processing, so we select only the features we are interested in and carry out further processing on those.

There are several methods of feature selection:

  1. Filter method
  2. Wrapper method
  3. Embedded method

Filter Method:

In this method, the selection of features does not depend on any designated classifier. To order the variables, a variable ranking technique is used.

Variable ranking considers the importance and usefulness of each feature for classification. In the filter method, the ranking is applied before classification to filter out the less relevant features.

Some examples of the filter method are:

  1. Chi-Square Test
  2. Variance Threshold
  3. Information Gain etc.
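
To make this concrete, here is a minimal sketch of the filter approach, assuming scikit-learn is available: it ranks the features of the Iris dataset with a chi-squared score, independently of any classifier, and keeps the top two.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Iris has 4 non-negative features, which the chi-squared test requires.
X, y = load_iris(return_X_y=True)

# Rank every feature by its chi-squared statistic and keep the best 2;
# no classifier is involved in the ranking itself.
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)  # (150, 4) -> (150, 2)
print("chi-squared scores:", selector.scores_)
```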

Wrapper Method:

In the wrapper method, the feature subset selection algorithm exists as a 'wrapper' around an 'induction algorithm'.

The induction algorithm is treated as a 'black box': it is used to produce a classifier, which is then used for classification.

Obtaining the subset of features requires heavy computation, which is considered the main drawback of this technique.

Some examples of the wrapper method are:

  1. Genetic Algorithms
  2. Recursive Feature Elimination
  3. Sequential Feature Selection
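
As a hedged illustration, the sketch below uses scikit-learn's Recursive Feature Elimination (assumed available): the wrapped estimator is fitted repeatedly and the weakest feature is dropped each round, which also shows why wrapper methods are computationally heavy.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# The 'induction algorithm' that the selector wraps as a black box.
estimator = LogisticRegression(max_iter=1000)

# RFE refits the estimator many times, discarding the weakest
# feature each round until only 2 remain -- hence the heavy cost.
selector = RFE(estimator, n_features_to_select=2).fit(X, y)

print("selected features:", selector.support_)  # True = kept
print("ranking:", selector.ranking_)            # 1 = selected
```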

Embedded Method:

This method combines the strengths of the filter method and the wrapper method.

It is generally specific to a given learning machine. The selection of variables is usually done during the training process itself. What this method learns is the set of features that contributes most to the accuracy of the model.

Some examples of the embedded method are:

  1. L1 Regularisation Technique (such as LASSO)
  2. Ridge Regression (also known as L2 Regularisation)
  3. Elastic Net etc.
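
The sketch below, again assuming scikit-learn, shows the L1 (LASSO) flavour of the embedded approach: training drives the coefficients of uninformative features to exactly zero, so the selection happens inside the fitting process itself.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic data in which only 3 of the 10 features are informative.
X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, noise=5.0, random_state=0)

# L1 regularisation zeroes out the coefficients of unhelpful
# features during training -- selection and fitting are one step.
lasso = Lasso(alpha=1.0).fit(X, y)
print("surviving features:", np.flatnonzero(lasso.coef_))

# Keep only the columns whose coefficient is non-zero.
X_reduced = SelectFromModel(lasso, prefit=True).transform(X)
print(X.shape, "->", X_reduced.shape)
```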

The process of feature selection simplifies machine learning models, which makes them easier to interpret. It reduces the burden of dimensionality, and the generalisation ability of the model is improved, so the overfitting problem is reduced.

Thus, we get various benefits by using feature selection methods. Following are some of the most obvious:

  1. A better understanding of the data
  2. Improved prediction performance
  3. Reduced computation time
  4. Reduced space etc.

Tools such as SAS, MATLAB, and Weka also include methods and tools for feature selection.


