1.

You have been provided with a dataset which has some missing values. What are the possible methods to treat missing values in the dataset? Briefly explain.

Answer

There are multiple ways to deal with missing values in a dataset, depending on the nature of the missing values.

Some of the key methods are as follows: 

  1. Deletion methods come in two forms, listwise and pairwise, and are appropriate when the data is missing completely at random. In listwise deletion, an observation is deleted if any of its variables is missing. In pairwise deletion, each analysis is performed with all cases in which the variables of interest are present.
  2. Mean/median/mode imputation. Imputation is a method to fill in the missing values with estimated values. The goal is to employ known relationships that can be identified in the valid values of the data set to assist in estimating the missing values. Mean/median/mode imputation is one of the most frequently used methods. It consists of replacing the missing data for a given attribute with the mean or median (quantitative attribute) or mode (qualitative attribute) of all known values of that variable.
  3. kNN imputation. The missing values of an attribute are imputed using a given number (k) of observations that are most similar to the observation whose value is missing. The similarity of two observations is determined using a distance function.
  4. A prediction model is one of the more sophisticated approaches for handling missing data. Here, we create a predictive model to estimate values that will substitute for the missing data. In this case, we divide our data set into two sets: one with no missing values for the variable and another with missing values. The first set is used to train the model, and the missing values in the second set are then filled with the model's predictions.
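
The four methods above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a production implementation: the toy `rows` dataset, the column names, and every function name here are hypothetical, missing values are assumed to be marked with `None`, and the prediction model is assumed to be a simple one-feature least-squares line. The kNN and regression helpers also assume that the feature columns themselves have no missing values.

```python
from math import sqrt
from statistics import mean, median, mode

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 25, "income": 50.0, "city": "A"},
    {"age": 30, "income": None, "city": "B"},
    {"age": 35, "income": 70.0, "city": None},
    {"age": 40, "income": 80.0, "city": "B"},
]

def listwise_delete(rows):
    """Method 1a: drop every row that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise_rows(rows, cols):
    """Method 1b: keep rows in which the columns of interest are all present."""
    return [r for r in rows if all(r[c] is not None for c in cols)]

def impute_stat(rows, col, stat):
    """Method 2: fill missing values in `col` with stat(known values)."""
    fill = stat([r[col] for r in rows if r[col] is not None])
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]

def knn_impute(rows, target, features, k=2):
    """Method 3: fill a missing `target` with the mean of the k nearest donors."""
    donors = [r for r in rows if r[target] is not None]

    def dist(a, b):
        # Euclidean distance over the (numeric, fully observed) features.
        return sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    out = []
    for r in rows:
        if r[target] is None:
            nearest = sorted(donors, key=lambda d: dist(r, d))[:k]
            r = {**r, target: mean(d[target] for d in nearest)}
        out.append(r)
    return out

def regression_impute(rows, target, feature):
    """Method 4: fit target ~ feature on complete rows, predict the rest."""
    train = [r for r in rows if r[target] is not None]
    xs = [r[feature] for r in train]
    ys = [r[target] for r in train]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return [
        {**r, target: intercept + slope * r[feature]} if r[target] is None else r
        for r in rows
    ]
```

For example, `impute_stat(rows, "income", median)` fills the missing income with the median of the known incomes, `impute_stat(rows, "city", mode)` fills the missing city with the most frequent one, and `knn_impute(rows, "income", ["age"], k=2)` averages the incomes of the two rows closest in age. None of the helpers modifies `rows` in place, so the methods can be compared side by side on the same data.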

