|
Answer» Response: There are several ways to transform variables during feature engineering. Three methods are explained below:
- Binning approach – Variables can be classified or categorized using this approach. Binning can be based on the original values, percentiles, or frequencies of the variable. Business understanding, goals, and objectives are needed to choose the categorization technique.
- For example, income can be grouped into three categories: High, Average, and Low. Say anyone with an annual income up to 500,000 falls into the Low category, 500,001 to 2,000,000 into the Average category, and above 2,000,000 into the High category.
- We can also perform covariate binning, which depends on the values of more than one variable.
- Log transformation – Taking the log of a variable is a standard way to change the shape of its distribution on a distribution plot. It is generally used to reduce positive (right) skewness, and it can only be applied to strictly positive values. Plotting a histogram and checking the skewness, kurtosis, mean, and standard deviation helps decide whether a log transformation is needed.
- Square root or cube root – Taking the square root or cube root of a variable also has a noticeable effect on its distribution, though a weaker one than the log transformation. The square root applies only to non-negative values, whereas the cube root can also be applied to negative values and zero.
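
The three methods above can be sketched in plain Python. This is only an illustration under stated assumptions: the `incomes` sample is made up, the helper names `bin_income`, `skewness`, and `cube_root` are hypothetical, and the cut-offs are the ones from the income example (500,000 and 2,000,000). Skewness is computed here as the third standardized moment so the effect of each transformation can be compared.

```python
import math

def bin_income(income):
    """Map an annual income to Low / Average / High using the example cut-offs."""
    if income <= 500_000:
        return "Low"
    if income <= 2_000_000:
        return "Average"
    return "High"

def skewness(values):
    """Population skewness: the third standardized moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def cube_root(x):
    """Real cube root that also works for negative values and zero."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

# Hypothetical right-skewed sample of annual incomes (illustrative only).
incomes = [120_000, 300_000, 450_000, 800_000, 1_500_000, 2_500_000, 9_000_000]

categories = [bin_income(v) for v in incomes]     # binning on original values
log_incomes = [math.log(v) for v in incomes]      # needs strictly positive values
sqrt_incomes = [math.sqrt(v) for v in incomes]    # needs non-negative values
cbrt_incomes = [cube_root(v) for v in incomes]    # works for any real value

print(categories)
for name, data in [("raw", incomes), ("sqrt", sqrt_incomes),
                   ("cbrt", cbrt_incomes), ("log", log_incomes)]:
    print(f"{name:>5}: skewness = {skewness(data):.3f}")
```

On this sample the skewness shrinks step by step from the raw values to the square root, cube root, and log versions, which matches the point that the root transformations are milder than the log transformation. On real data the right choice still depends on the histogram and on whether the variable contains zeros or negative values.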
|