1.

Suppose a dataset has variables in which more than 30% of the values are missing. How will you deal with such a dataset?

Answer»

Depending on the size of the dataset, we can follow one of the approaches below:

  • For small datasets, the missing values are substituted with the mean of the remaining data. In pandas, mean = df.mean() computes the mean of each column of the DataFrame df that holds the dataset, and df.fillna(mean) returns a copy of df in which the missing values in each column are replaced by that column's mean.
  • For larger datasets, the rows with missing values can be removed, for example with df.dropna(), and the remaining data can be used for prediction.
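
The two approaches above can be sketched in pandas as follows. This is a minimal illustration, not a definitive recipe: the toy DataFrame and its column names (age, income) are hypothetical, and numeric_only=True is an added assumption so that the mean is computed only for numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, np.nan, 40.0],
    "income": [50.0, 60.0, np.nan, 80.0, 90.0],
})

# Fraction of missing values in each column, to check against the 30% mark.
missing_share = df.isna().mean()

# Small dataset: fill the gaps in each numeric column with that column's mean.
col_means = df.mean(numeric_only=True)
df_imputed = df.fillna(col_means)

# Larger dataset: drop every row that contains at least one missing value.
df_dropped = df.dropna()

print(missing_share)
print(df_imputed)
print(df_dropped)
```

Both fillna and dropna return new DataFrames here, so the original df is left unchanged and the two results can be compared side by side.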

