1.

Why is data cleaning crucial? How do you clean the data?

Answer:

To gather reliable insights from any algorithm, the data it runs on must be correct, clean, and limited to relevant information. Dirty data often leads to poor or incorrect insights and predictions, which can have damaging consequences.

For example, suppose a company is preparing a large campaign to market a product. If flawed data leads the analysis to recommend a product that in reality has no demand, the campaign is bound to fail once it is launched, and the company loses revenue. This is why proper, clean data matters.

  • Cleaning data that comes from different sources makes it consistent enough to transform and combine, producing a dataset that data scientists can actually work with.
  • Properly cleaned data improves the accuracy of the model and leads to more reliable predictions.
  • On a very large dataset, cleaning is cumbersome and can take up a large share of a project's time (a figure of around 80% is often quoted). This work cannot be folded into running the model, so it is done beforehand, and a model that runs on already-cleaned data is faster and more efficient.
  • Data cleaning helps identify and fix structural issues in the data, such as inconsistent formats or naming. It also removes duplicates and keeps the data consistent.
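The cleanup the bullets describe can be sketched in plain Python with no third-party libraries. This is a minimal illustration, not a standard procedure: it maps records from two sources onto one shared schema, fixes structural inconsistencies (stray whitespace, inconsistent casing, different country spellings), drops rows with missing or unparseable values, and removes duplicates. The sample records, field names, alias table, and the functions `to_common_schema`, `normalize`, and `clean` are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical raw records from two sources that name the same fields differently.
CRM_ROWS = [
    {"customer_id": "101", "name": " Alice Smith ", "country": "usa", "spend": "250.0"},
    {"customer_id": "102", "name": "Bob Jones", "country": "UK", "spend": ""},
]
WEB_ROWS = [
    {"id": "101", "full_name": "alice smith", "country_code": "USA", "total_spend": "250"},
    {"id": "103", "full_name": "Carol  White", "country_code": "us", "total_spend": "80.5"},
]

# Hypothetical per-source mapping from each source's column names to one shared schema.
FIELD_MAPS = {
    "crm": {"customer_id": "customer_id", "name": "name", "country": "country", "spend": "spend"},
    "web": {"id": "customer_id", "full_name": "name", "country_code": "country", "total_spend": "spend"},
}

# Hypothetical alias table mapping the spellings seen in the sources to one code.
COUNTRY_ALIASES = {"US": "US", "USA": "US", "UK": "GB", "GB": "GB"}


def to_common_schema(row: dict, field_map: dict) -> dict:
    """Rename a source row's fields to the shared schema (missing fields become "")."""
    return {common: row.get(source, "") for source, common in field_map.items()}


def normalize(row: dict) -> Optional[dict]:
    """Fix structural issues in one row; return None if the row is unusable."""
    name = " ".join(row["name"].split()).title()  # trim/collapse spaces, consistent casing
    country = COUNTRY_ALIASES.get(row["country"].strip().upper())  # None if code is unknown
    if not name or country is None:
        return None
    try:
        customer_id = int(row["customer_id"])
        spend = float(row["spend"])  # "" or non-numeric text raises ValueError
    except ValueError:
        return None
    return {"customer_id": customer_id, "name": name, "country": country, "spend": spend}


def clean(sources: list) -> list:
    """Combine sources, drop unusable rows, and remove exact duplicates."""
    cleaned, seen = [], set()
    for source_name, rows in sources:
        for raw in rows:
            row = normalize(to_common_schema(raw, FIELD_MAPS[source_name]))
            if row is None:
                continue  # missing or malformed values
            key = tuple(row.values())  # duplicate = identical after normalization
            if key in seen:
                continue
            seen.add(key)
            cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    for record in clean([("crm", CRM_ROWS), ("web", WEB_ROWS)]):
        print(record)
```

Running it keeps two records: Alice Smith (ID 101), who appears only once even though both sources list her, and Carol White (ID 103). Bob Jones is dropped because his spend value is missing. Normalizing before deduplicating is the key design choice here: the two Alice rows only match once whitespace, casing, and country codes agree. Dropping incomplete rows is just one option; depending on the data, filling in missing values may be more appropriate. For larger datasets, pandas offers equivalent operations such as `DataFrame.drop_duplicates` and `DataFrame.dropna`.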

The following diagram represents the advantages of data cleaning:


