Answer» There are a few ways to handle missing values in Big Data. These are as follows:
- Use of median/mean: In a numeric column, all the missing values can be filled with the median or mean of the remaining values in that column.
- Deletion of rows or columns that have missing values: Rows or columns with missing values can be deleted from a table, but this option should only be used when the number of missing values is small. A column can be deleted if it is missing values in more than half of the table's rows; similarly, a row can be deleted if it is missing values in more than half of the table's columns.
- Use of categorical data: If the data in a column can be classified into categories, the column can be treated as a categorical variable and its empty values filled with the most frequently occurring category, even when half of the column values are empty.
- Predictive values: If we understand the nature of the data and can predict the variable type, the missing values can be filled with values predicted by regression techniques.
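The four techniques in the list above can be illustrated on a tiny in-memory table. This is a minimal sketch using only the Python standard library, assuming a column is a plain list and `None` marks a missing value; the function names (`fill_numeric`, `fill_categorical`, `drop_sparse`, `fill_by_regression`) are hypothetical, and the regression step uses simple one-variable least squares as one possible "regression technique". It is not meant to scale to Big Data.

```python
from statistics import mean, median, mode

# Sketch only: a column is a plain Python list and None marks a missing value.

def fill_numeric(values, strategy="median"):
    """Replace None with the median (default) or mean of the remaining values."""
    present = [v for v in values if v is not None]
    fill = median(present) if strategy == "median" else mean(present)
    return [fill if v is None else v for v in values]

def fill_categorical(values):
    """Replace None with the most frequent category (the mode)."""
    fill = mode([v for v in values if v is not None])
    return [fill if v is None else v for v in values]

def drop_sparse(table, threshold=0.5):
    """Drop columns, then rows, in which more than `threshold` of the cells are None.

    `table` maps a column name to a list of values; all lists have the same length.
    """
    n_rows = len(next(iter(table.values())))
    kept = {name: col for name, col in table.items()
            if sum(v is None for v in col) / n_rows <= threshold}
    if not kept:
        return {}
    keep_rows = [i for i in range(n_rows)
                 if sum(col[i] is None for col in kept.values()) / len(kept) <= threshold]
    return {name: [col[i] for i in keep_rows] for name, col in kept.items()}

def fill_by_regression(x, y):
    """Fill None entries of y from a least-squares line y = a + b*x fitted on complete pairs."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if xi is not None and yi is not None]
    mx = mean(px for px, _ in pairs)
    my = mean(py for _, py in pairs)
    b = (sum((px - mx) * (py - my) for px, py in pairs)
         / sum((px - mx) ** 2 for px, _ in pairs))
    a = my - b * mx
    # A missing y can only be predicted when its x is known.
    return [a + b * xi if yi is None and xi is not None else yi
            for xi, yi in zip(x, y)]
```

For example, `fill_numeric([25, None, 35, 40])` fills the gap with the median of the remaining values, 35. In practice, pandas (`fillna`, `dropna`) and scikit-learn (`SimpleImputer`) provide these operations on real datasets.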
Besides the techniques above, the K-NN algorithm, the Random Forest algorithm, the Naive Bayes algorithm, and the Last Observation Carried Forward (LOCF) method can also be used to handle missing values in Big Data.
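Two of these methods, LOCF and K-NN imputation, are simple enough to sketch directly. This is a minimal standard-library illustration, assuming `None` marks a missing value and each row is a list of numbers; the names `locf` and `knn_impute` are hypothetical. The K-NN variant shown fills a missing target value with the mean of the k nearest rows by Euclidean distance on the other columns, which is one common formulation, not the only one. Random Forest and Naive Bayes imputation are not sketched here.

```python
import math

# Sketch only: None marks a missing value, as in a plain Python list or row.

def locf(values):
    """Last Observation Carried Forward: each None takes the most recent
    non-missing value; leading None values (nothing to carry) stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def _distance(row, donor, target):
    """Euclidean distance over the non-target columns present in both rows."""
    shared = [j for j in range(len(row))
              if j != target and row[j] is not None and donor[j] is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((row[j] - donor[j]) ** 2 for j in shared))

def knn_impute(rows, target, k=2):
    """Fill a missing rows[i][target] with the mean target value of the k
    nearest rows whose target value is known."""
    donors = [r for r in rows if r[target] is not None]
    result = []
    for row in rows:
        filled = list(row)
        if row[target] is None and donors:
            nearest = sorted(donors, key=lambda d: _distance(row, d, target))[:k]
            filled[target] = sum(d[target] for d in nearest) / len(nearest)
        result.append(filled)
    return result
```

LOCF suits ordered data such as time series, where the previous reading is a reasonable stand-in; K-NN instead borrows values from similar records regardless of order. For larger datasets, pandas `DataFrame.ffill` and scikit-learn `KNNImputer` implement these ideas.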