What are the ways to handle missing values in Big Data?

Answer»

There are several ways to handle missing values in Big Data. They are as follows.

Use of median/mean: In a numeric column, all the missing values can be filled with the median or mean of the remaining values in that column.
Deletion of rows or columns that have missing values: We can delete the rows or columns of a table that contain missing values, but this option should only be used when the number of missing values is small. As a rule of thumb, we can delete a column if its value is missing in more than half of the table's rows. Similarly, we can delete a row if it has missing values in more than half of the table's columns.
Use of categorical data: If the data in a column can be classified into categories, we can treat the column as a categorical variable and fill the empty values with the most frequently occurring value (the mode). This can be done when around half of the column values are empty.
Predictive values: If we know the nature of a variable and it can be predicted from other variables, we can fit a model on the rows where it is present, for example using regression techniques, and fill the empty values with the predicted ones.
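
The four techniques above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not a production approach for large datasets: missing cells are assumed to be represented as None, a table is assumed to be a list of equal-length rows, the function names (fill_numeric, fill_categorical, drop_sparse, fill_by_regression) are hypothetical, and the regression step is a simple one-predictor least-squares fit.

```python
from statistics import mean, median, mode

MISSING = None  # assumption: missing cells are represented as None


def fill_numeric(column, how="median"):
    """Fill missing entries with the median or mean of the remaining values."""
    present = [v for v in column if v is not MISSING]
    fill = median(present) if how == "median" else mean(present)
    return [fill if v is MISSING else v for v in column]


def fill_categorical(column):
    """Fill missing entries with the most frequent remaining value (the mode)."""
    present = [v for v in column if v is not MISSING]
    fill = mode(present)
    return [fill if v is MISSING else v for v in column]


def drop_sparse(rows, threshold=0.5):
    """Drop columns, then rows, whose share of missing cells exceeds threshold."""
    if not rows:
        return []
    n_rows, n_cols = len(rows), len(rows[0])
    # Keep a column only if at most `threshold` of its cells are missing.
    keep_cols = [
        j for j in range(n_cols)
        if sum(r[j] is MISSING for r in rows) / n_rows <= threshold
    ]
    if not keep_cols:
        return []
    trimmed = [[r[j] for j in keep_cols] for r in rows]
    # Apply the same rule to each row of the trimmed table.
    return [
        r for r in trimmed
        if sum(v is MISSING for v in r) / len(r) <= threshold
    ]


def fill_by_regression(x, y):
    """Fit y = slope * x + intercept on complete pairs, then predict missing y."""
    xs = [a for a, b in zip(x, y) if b is not MISSING]
    ys = [b for b in y if b is not MISSING]
    mx, my = mean(xs), mean(ys)
    # Ordinary least squares for a single predictor (x assumed fully present).
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs
    )
    intercept = my - slope * mx
    return [slope * a + intercept if b is MISSING else b for a, b in zip(x, y)]
```

For example, fill_numeric([1, None, 3, 8]) fills the gap with the median 3, and fill_categorical(["a", None, "b", "a"]) fills it with "a". Dropping columns before rows in drop_sparse is a design choice made here for illustration; the answer above does not prescribe an order.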

Other than the above-mentioned techniques, we can also use the k-NN algorithm, the Random Forest algorithm, the Naive Bayes algorithm, and the Last Observation Carried Forward (LOCF) method to handle missing values in Big Data.
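
Two of these additional methods are simple enough to sketch directly. The following is a rough illustration rather than a reference implementation: missing cells are again assumed to be None, the function names locf and knn_impute are hypothetical, and the k-NN variant assumes numeric data, Euclidean distance, at least one fully complete row, and that only the target column is missing in the rows being filled.

```python
import math

MISSING = None  # assumption: missing cells are represented as None


def locf(series):
    """Last Observation Carried Forward: repeat the last seen value over gaps."""
    filled, last = [], MISSING
    for v in series:
        if v is not MISSING:
            last = v
        # Leading gaps stay missing because there is nothing to carry yet.
        filled.append(last)
    return filled


def knn_impute(rows, target, k=2):
    """Fill a missing target cell with the mean target of the k nearest complete rows."""
    complete = [r for r in rows if all(v is not MISSING for v in r)]
    features = [j for j in range(len(rows[0])) if j != target]
    out = []
    for r in rows:
        if r[target] is not MISSING:
            out.append(list(r))
            continue
        # Rank complete rows by Euclidean distance on the non-target columns.
        nearest = sorted(
            complete,
            key=lambda c: math.dist(
                [r[j] for j in features], [c[j] for j in features]
            ),
        )[:k]
        filled = list(r)
        filled[target] = sum(c[target] for c in nearest) / len(nearest)
        out.append(filled)
    return out
```

For example, locf([None, 5, None, None, 7, None]) returns [None, 5, 5, 5, 7, 7]. LOCF only makes sense when the rows have a meaningful order, such as a time series.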
