1.

We have a dataset in which some variables have more than 30% missing values. Say, for example, we have 100 variables and 16 of them have more than 30% of their values missing. How would you deal with this scenario?

Answer»
  1. Assign a unique category to the missing values; the missingness itself might reveal a trend. Perform exploratory analysis to visualize and understand it better.
  2. We can remove those variables outright.
  3. We can check how their missingness is distributed against the target variable. If we find a pattern, we keep those variables and assign their missing values a new category, and we remove the others.
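The three options above can be combined into one pass. What follows is only a minimal sketch in plain Python, not a prescribed method. Everything in it is an assumption made for illustration: the tiny `rows` dataset, the column names, a 0/1 target called `churned`, missing values encoded as `None`, and the `min_gap` cutoff used to decide whether missingness "shows a pattern". With pandas, the per-column missing share would typically come from `df.isna().mean()` instead.

```python
from collections import defaultdict

MISSING = None  # assumption: missing values are encoded as None


def missing_fraction(rows, column):
    """Share of rows in which `column` is missing."""
    return sum(row[column] is MISSING for row in rows) / len(rows)


def target_rate_by_missingness(rows, column, target):
    """Mean of a 0/1 target, split by whether `column` is missing."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of target, row count]
    for row in rows:
        key = "missing" if row[column] is MISSING else "present"
        totals[key][0] += row[target]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def treat_missing(rows, columns, target, threshold=0.30, min_gap=0.20):
    """Keep sparse columns whose missingness tracks the target; drop the rest."""
    keep, drop = [], []
    for column in columns:
        if missing_fraction(rows, column) <= threshold:
            continue  # at or below the threshold: left untouched by this sketch
        rates = target_rate_by_missingness(rows, column, target)
        if len(rates) < 2:
            drop.append(column)  # entirely missing: nothing to compare against
            continue
        gap = abs(rates["missing"] - rates["present"])
        (keep if gap >= min_gap else drop).append(column)

    cleaned = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in drop}
        for column in keep:
            if new_row[column] is MISSING:
                new_row[column] = "Missing"  # missingness becomes its own category
        cleaned.append(new_row)
    return cleaned, keep, drop


# Hypothetical toy data: `plan` and `region` are each 50% missing.
rows = [
    {"plan": None, "region": None, "age": 34, "churned": 1},
    {"plan": None, "region": None, "age": 51, "churned": 1},
    {"plan": "basic", "region": None, "age": 29, "churned": 0},
    {"plan": "pro", "region": None, "age": 42, "churned": 0},
    {"plan": None, "region": "south", "age": 38, "churned": 1},
    {"plan": "basic", "region": "east", "age": 45, "churned": 0},
    {"plan": "pro", "region": "north", "age": 27, "churned": 1},
    {"plan": None, "region": "west", "age": 60, "churned": 0},
]

cleaned, keep, drop = treat_missing(rows, ["plan", "region", "age"], "churned")
print("kept with a 'Missing' category:", keep)
print("dropped:", drop)
```

In this toy data, rows with a missing `plan` churn far more often than rows with a known `plan`, so the column is kept and its gaps become a `"Missing"` category. Missingness in `region` is unrelated to the target, so that column is dropped. A real analysis would replace the fixed `min_gap` with a proper statistical test and with domain knowledge about why the values are missing.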

In a nutshell, when handling missing values we first have to understand the data; based on that understanding, we can choose among the various mechanisms for treating them.

There is no fixed rule for any particular scenario. The choice is data-driven and context-specific.


