1.

Why is missing value treatment required?

Answer:

Missing data in the training data set can reduce the power or fit of a model, or lead to a biased model, because the behavior of the affected variables and their relationships with other variables have not been analyzed correctly. This can lead to incorrect predictions or classifications. Below is a simple example to illustrate this.

Name | Weight | Gender | Play Golf or Not
AA   | 55     | M      | Yes
BB   | 62     | F      | Yes
CC   | 58     | F      | No
DD   | 54     |        | No
EE   | 54     | M      | No
FF   | 66     | F      | Yes
GG   | 56     |        | Yes
HH   | 56     | M      | Yes

Figure 1

Gender        | # Count | # Play Golf | % Play Golf
F             | 3       | 2           | 66.67%
M             | 3       | 2           | 66.67%
Missing/Blank | 2       | 1           | 50%

Figure 2

Note the missing Gender values for DD and GG in Figure 1. Figure 2 summarizes the data without treating those missing values. The inference from it is that females and males are equally likely to play golf (66.67% each).
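The Figure 2 summary can be reproduced with a short sketch. It assumes the Figure 1 rows are held in a plain Python list, with `None` standing for a blank Gender cell, and it groups those blanks under their own "Missing/Blank" label. The names `ROWS` and `play_rate_by_gender` are illustrative only.

```python
from collections import defaultdict

# Figure 1 data: (name, weight, gender, plays_golf); None marks a blank gender.
ROWS = [
    ("AA", 55, "M", "Yes"),
    ("BB", 62, "F", "Yes"),
    ("CC", 58, "F", "No"),
    ("DD", 54, None, "No"),
    ("EE", 54, "M", "No"),
    ("FF", 66, "F", "Yes"),
    ("GG", 56, None, "Yes"),
    ("HH", 56, "M", "Yes"),
]


def play_rate_by_gender(rows):
    """Return {group: (count, n_play, pct)}; blank genders form their own group."""
    counts = defaultdict(lambda: [0, 0])
    for _name, _weight, gender, plays in rows:
        key = gender if gender is not None else "Missing/Blank"
        counts[key][0] += 1                    # rows in this group
        counts[key][1] += int(plays == "Yes")  # rows that play golf
    return {
        group: (n, k, round(100 * k / n, 2))
        for group, (n, k) in sorted(counts.items())
    }


for group, (n, k, pct) in play_rate_by_gender(ROWS).items():
    print(f"{group:<14} {n:>2} {k:>2} {pct:6.2f}%")
```

Because the blank rows are kept as a separate group rather than dropped, the table still accounts for all eight players, but the two missing genders never contribute to the F or M rates.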

On the other hand, Figure 4 shows the same summary after the missing Gender values have been treated (in Figure 3, both DD and GG are filled in as M). Now females have a higher chance of playing golf than males (66.67% vs. 60%).

Name | Weight | Gender | Play Golf or Not
AA   | 55     | M      | Yes
BB   | 62     | F      | Yes
CC   | 58     | F      | No
DD   | 54     | M      | No
EE   | 54     | M      | No
FF   | 66     | F      | Yes
GG   | 56     | M      | Yes
HH   | 56     | M      | Yes

Figure 3

Gender | # Count | # Play Golf | % Play Golf
F      | 3       | 2           | 66.67%
M      | 5       | 3           | 60%

Figure 4
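The answer does not say how the two blank Gender values were filled in. A mode fill cannot decide here, since F and M each appear three times among the known rows. One approach that does reproduce Figure 3 is a nearest-neighbour (k = 1) fill on Weight: DD (54) is closest to EE (54, M) and GG (56) is closest to HH (56, M). The sketch below assumes that approach; `impute_gender_by_weight` and `play_rate_by_gender` are hypothetical helpers, not the author's method.

```python
from collections import defaultdict

# Figure 1 data: (name, weight, gender, plays_golf); None marks a blank gender.
ROWS = [
    ("AA", 55, "M", "Yes"),
    ("BB", 62, "F", "Yes"),
    ("CC", 58, "F", "No"),
    ("DD", 54, None, "No"),
    ("EE", 54, "M", "No"),
    ("FF", 66, "F", "Yes"),
    ("GG", 56, None, "Yes"),
    ("HH", 56, "M", "Yes"),
]


def impute_gender_by_weight(rows):
    """Fill each blank gender with the gender of the known row closest in weight.

    Assumption: nearest-neighbour (k = 1) on Weight only. Ties go to the
    first matching row in table order.
    """
    known = [(w, g) for _n, w, g, _p in rows if g is not None]
    filled = []
    for name, weight, gender, plays in rows:
        if gender is None:
            _, gender = min(known, key=lambda wg: abs(wg[0] - weight))
        filled.append((name, weight, gender, plays))
    return filled


def play_rate_by_gender(rows):
    """Return {gender: (count, n_play, pct)} for fully populated rows."""
    counts = defaultdict(lambda: [0, 0])
    for _name, _weight, gender, plays in rows:
        counts[gender][0] += 1
        counts[gender][1] += int(plays == "Yes")
    return {
        g: (n, k, round(100 * k / n, 2)) for g, (n, k) in sorted(counts.items())
    }


treated = impute_gender_by_weight(ROWS)  # Figure 3
for g, (n, k, pct) in play_rate_by_gender(treated).items():  # Figure 4
    print(f"{g:<6} {n:>2} {k:>2} {pct:6.2f}%")
```

Ties in weight go to whichever known row appears first, which is an arbitrary choice made for this sketch; with a different imputation rule the filled genders, and therefore the conclusion, could differ.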
