1.

How to deal with outliers?

Answer»

Outliers are observations that lie far away from the rest of the data. They diverge from the overall pattern of the given sample. The presence of outliers in a dataset can change the results drastically and has several unfavourable effects. Some of the impacts can be stated as follows:

  1. They may increase the error variance.
  2. They may reduce the normality of the data.
  3. They may decrease the power of statistical tests.
  4. They may lead to biased estimates.
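As a rough illustration of the effects listed above, the sketch below compares summary statistics with and without a single extreme value. The numbers are made up for the example, and `summarize` is a hypothetical helper, not part of any library.

```python
import statistics

# Hypothetical sample; the value 95 is an artificial outlier added for illustration.
clean = [10, 12, 11, 13, 12, 11, 10, 12]
with_outlier = clean + [95]

def summarize(values):
    """Return the mean, median and sample variance of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),
    }

clean_stats = summarize(clean)
outlier_stats = summarize(with_outlier)

print("without outlier:", clean_stats)
print("with outlier:   ", outlier_stats)
```

In this toy sample the mean and variance shift a lot once the extreme value is included, while the median barely moves.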

Outliers must not be ignored and should be properly treated, as their presence may violate the basic assumptions of statistical modelling and skew the results. Before applying any procedure to deal with outliers, we should always try to find out why they are present.

If we know the reason for the presence of outliers in our dataset, we can choose a suitable method to deal with them. The reasons for having outliers in the dataset can be as follows:

  1. Non-natural (Data Errors)
  2. Natural (True Outliers)

The non-natural reasons for outliers can be:

  1. Data entry errors
  2. Measurement errors
  3. Sampling errors
  4. Experimental errors
  5. Data processing errors, etc.

Natural or true outliers are genuine values that are originally present in the dataset. To deal with outliers, the following approaches can be used:

  1. Deleting observations
  2. Transformation
  3. Binning
  4. Imputing values
  5. Treating as a separate group
  6. Other statistical methods.
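The first two approaches in the list can be sketched as follows. This is only a minimal example: the answer does not say how outliers should be detected, so the 1.5 * IQR fence used here is an assumption (a common rule of thumb). The data and the helper names `iqr_bounds`, `delete_outliers` and `log_transform` are hypothetical.

```python
import math
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences based on the k * IQR rule of thumb (an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def delete_outliers(values, k=1.5):
    """Deleting observations: keep only the values that fall inside the fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]

def log_transform(values):
    """Transformation: natural log, valid only for strictly positive values."""
    return [math.log(v) for v in values]

# Hypothetical data with one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 95]

kept = delete_outliers(data)
logged = log_transform(data)

print(kept)
print([round(v, 2) for v in logged])
```

Deletion discards the extreme observation entirely, whereas the log transform keeps it but compresses its distance from the rest of the data.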

Trimming can also be applied at one or both extremes to remove the outliers. Weights can also be assigned to different observations so that outliers have less influence. The mean, median or mode can be used to impute (replace) outlying values. Before imputing values, we should analyze whether the outlier is natural or artificial. If the outliers are significantly large in number, it is advisable to treat them as a separate group. We can then build a separate model for each group and combine the outputs.
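Trimming and imputation might look like the sketch below. The trimming proportion, the `[0, 20]` range used to decide what counts as an outlier, and the helper names `trim` and `impute_with_median` are all assumptions chosen for illustration.

```python
import statistics

def trim(values, proportion=0.1):
    """Drop the given proportion of observations from each end of the sorted data."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    return ordered[cut:len(ordered) - cut] if cut else ordered

def impute_with_median(values, lower, upper):
    """Replace values outside [lower, upper] with the median of the in-range values."""
    inliers = [v for v in values if lower <= v <= upper]
    center = statistics.median(inliers)
    return [v if lower <= v <= upper else center for v in values]

# Hypothetical data with one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 95, 11]

trimmed = trim(data, proportion=0.1)                   # removes one value from each end
imputed = impute_with_median(data, lower=0, upper=20)  # thresholds are assumed

print(trimmed)
print(imputed)
```

Trimming shortens the dataset, while imputation keeps its length and only overwrites the flagged values.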


