Below are the different types of missing values that can occur during data collection.
- Missing completely at random - If the probability that a value is missing is the same for all observations, the data fall into this category. For example, suppose students decide whether to declare their preference for attending a cultural festival by tossing a fair coin: on heads they state whether or not they will go, and on tails they leave the answer blank. Every observation then has the same 50% chance of being missing, regardless of whether the student would actually go.
- Missing at random - This differs from the previous category. Here the probability that a value is missing varies with the values of other, observed variables. For example, suppose we collect information about the people in a locality: their demographics, age, sex, locality type (busy/very busy/moderately busy), etc. If women leave the other fields blank more often than men, the missingness depends on the observed variable sex, so the data are missing at random.
- Missing values that depend on unobserved predictors - Here the values are not missing completely at random: whether a value is missing depends on an input variable that was not observed. For example, suppose a mathematics examination is known to be difficult, so fewer students are expected to sit it. Out of 100 students, 30 do not appear because of the “complexity level” of the examination. These missing scores are not random; they are caused by the “complexity level”, unless that parameter has already been recorded and taken into account as a cause.
- Missing values that depend on the missing value itself - Here the probability that a value is missing is correlated with the value itself. For example, students with lower marks in a graded exam in one subject may be more likely to skip a competitive exam in the same subject, while students with higher marks are more likely to appear, so the competitive-exam scores that go missing are disproportionately the low ones.
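A small simulation can make the four categories concrete. The sketch below is a hypothetical Python example using only the standard library. The variables `sex` and `difficulty`, the missingness probabilities, and the score distribution are illustrative assumptions rather than anything taken from the text. It removes scores from one synthetic dataset in four different ways and compares how much is missing and how the mean of the remaining values behaves.

```python
import random
import statistics


def simulate(n=100_000, seed=0):
    """Simulate exam scores and four copies with values removed by different mechanisms.

    Hypothetical illustration: the variables, probabilities and score
    distribution are assumptions chosen only to make each mechanism visible.
    """
    rng = random.Random(seed)
    # Observed variable: recorded for every student.
    sex = [rng.choice(("female", "male")) for _ in range(n)]
    # Unobserved predictor: how hard the exam is for each student (never recorded).
    difficulty = [rng.random() for _ in range(n)]
    # A harder exam lowers the expected score.
    score = [rng.gauss(70 - 30 * d, 10) for d in difficulty]

    def drop(p_missing):
        # Replace score i with None with probability p_missing(i).
        return [None if rng.random() < p_missing(i) else score[i] for i in range(n)]

    versions = {
        # Completely at random: a fair coin, the same 50% chance for everyone.
        "completely_at_random": drop(lambda i: 0.5),
        # At random: depends only on the observed variable `sex`.
        "at_random": drop(lambda i: 0.4 if sex[i] == "female" else 0.1),
        # Depends on the unobserved predictor `difficulty`.
        "unobserved_predictor": drop(lambda i: difficulty[i]),
        # Depends on the missing value itself: low scorers tend to skip.
        "value_itself": drop(lambda i: 0.8 if score[i] < 50 else 0.1),
    }
    return sex, score, versions


def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def observed_mean(values):
    """Mean of the values that were actually recorded."""
    return statistics.fmean([v for v in values if v is not None])


if __name__ == "__main__":
    sex, score, versions = simulate()
    print(f"true mean score: {statistics.fmean(score):.1f}")
    for name, values in versions.items():
        print(f"{name:>22}: missing {missing_rate(values):.2f}, "
              f"observed mean {observed_mean(values):.1f}")
    # Under "at_random", the missing rate differs by the observed variable.
    for group in ("female", "male"):
        subset = [v for v, s in zip(versions["at_random"], sex) if s == group]
        print(f"at_random, {group}: missing {missing_rate(subset):.2f}")
```

Because `score` does not depend on `sex` in this sketch, the observed mean in the missing-at-random version stays close to the true mean even though women's scores go missing more often. In the two cases where missingness depends on something unrecorded (the exam difficulty, or the score itself), the low scores are the ones most likely to be dropped, so the mean of the recorded values comes out noticeably higher than the true mean.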