1.

What Are Some Of The Best Practices For Data Cleaning?

Answer:

Some of the best practices for data cleaning include:

  • Sort the data by different attributes. Out-of-range, inconsistently formatted, or misspelled values tend to end up at the top or bottom of a sorted column, where they are easy to spot.
  • Clean large datasets stepwise, improving the data with each pass until you reach an acceptable level of quality.
  • Break large datasets into smaller samples. Working with less data lets you iterate faster.
  • Create a set of utility functions, tools, or scripts for common cleansing tasks, such as remapping values based on a CSV file or SQL database, regex search-and-replace, and blanking out all values that don't match a regex.
  • When you find several data-cleanliness issues, rank them by estimated frequency and tackle the most common problems first.
  • Analyze the summary statistics for each column (mean, standard deviation, number of missing values).
  • Keep track of every data cleaning operation so that you can modify or undo individual operations if required.
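
Sorting by different attributes can be sketched in a few lines of Python. This is a minimal illustration that assumes the data is a list of dicts; the sample records and the `extremes` helper are hypothetical, not part of any library.

```python
# Hypothetical sample records; real data would come from a file or database.
rows = [
    {"name": "alice", "age": 34, "city": "Boston"},
    {"name": "Bob", "age": -1, "city": "boston"},
    {"name": "carol", "age": 290, "city": "Chicago"},
    {"name": "dave", "age": 41, "city": " Chicago"},
]


def extremes(rows, attribute, n=1):
    """Hypothetical helper: sort rows by one attribute, return the n lowest and n highest.

    Assumes every row has the attribute and its values are mutually comparable.
    """
    ordered = sorted(rows, key=lambda row: row[attribute])
    return ordered[:n], ordered[-n:]


for attribute in ("age", "city"):
    low, high = extremes(rows, attribute)
    print(f"{attribute}: lowest={low[0][attribute]!r} highest={high[0][attribute]!r}")
```

Sorting by age puts the implausible values -1 and 290 at the ends. Sorting by city puts " Chicago" (leading space) and "boston" (lowercase) at the ends, because in Python's default string ordering a space sorts before uppercase letters and uppercase letters sort before lowercase ones.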
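
Breaking a large dataset into smaller pieces can mean two things: drawing a small, reproducible sample to develop cleaning rules quickly, and then processing the full data in fixed-size batches. The sketch below shows both under the assumption that rows fit in a Python list; `sample_rows`, `chunks`, and the generated dataset are hypothetical.

```python
import random


def sample_rows(rows, k, seed=0):
    """Hypothetical helper: return a reproducible random sample of at most k rows."""
    rng = random.Random(seed)  # fixed seed gives the same sample on every run
    return rng.sample(rows, min(k, len(rows)))


def chunks(rows, size):
    """Hypothetical helper: yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical stand-in for a large dataset.
big = [{"id": i, "value": i % 7} for i in range(10_000)]

dev_sample = sample_rows(big, 500)  # iterate on cleaning rules here first
batches = list(chunks(big, 2_000))  # then apply the final rules batch by batch
print(len(dev_sample), len(batches))
```

A fixed seed matters here: if the sample changed on every run, you could not tell whether a change in the output came from your cleaning rule or from different rows.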
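
The utility functions mentioned above (remapping from a CSV file, regex search-and-replace, blanking values that don't match a regex) might look like the following. This is a sketch using only the standard library; the function names are hypothetical, the mapping CSV is assumed to have exactly two columns and no header row, and `io.StringIO` stands in for a real file.

```python
import csv
import io
import re


def load_mapping(csv_file):
    """Hypothetical helper: read a headerless two-column CSV (raw, canonical) into a dict."""
    return {raw.strip(): canonical.strip() for raw, canonical in csv.reader(csv_file)}


def remap(values, mapping):
    """Replace values that appear in the mapping; leave all others unchanged."""
    return [mapping.get(value, value) for value in values]


def regex_replace(values, pattern, replacement):
    """Apply a regex search-and-replace to every value."""
    compiled = re.compile(pattern)
    return [compiled.sub(replacement, value) for value in values]


def blank_non_matching(values, pattern):
    """Replace every value that does not fully match the pattern with an empty string."""
    compiled = re.compile(pattern)
    return [value if compiled.fullmatch(value) else "" for value in values]


# io.StringIO stands in for an open mapping file (e.g. a hypothetical city_map.csv).
mapping = load_mapping(io.StringIO("NY,New York\nnyc,New York\nSF,San Francisco\n"))
cities = remap(["NY", "nyc", "Boston", "SF"], mapping)

phones = regex_replace(["555-1234", "555 1234", "call me"], r"\s+", "-")
phones = blank_non_matching(phones, r"\d{3}-\d{4}")
print(cities)  # ['New York', 'New York', 'Boston', 'San Francisco']
print(phones)  # ['555-1234', '555-1234', '']
```

Remapping from a SQL table would follow the same pattern: build the dict from a query result instead of a CSV reader.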
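
To rank cleanliness issues by frequency, one option is to run a set of checks over a column and count the failures. The checks below are hypothetical examples for a column that should hold prices; real projects would define their own.

```python
import re
from collections import Counter

NUMBER = re.compile(r"-?\d+(\.\d+)?")


def is_missing(value):
    return value is None or value == ""


def has_surrounding_whitespace(value):
    return isinstance(value, str) and value != value.strip()


def is_not_numeric(value):
    return isinstance(value, str) and value.strip() != "" and not NUMBER.fullmatch(value.strip())


# Hypothetical issue checks for a column that should hold prices.
CHECKS = {
    "missing": is_missing,
    "surrounding whitespace": has_surrounding_whitespace,
    "not numeric": is_not_numeric,
}


def rank_issues(values, checks=CHECKS):
    """Count how many values fail each check; return (issue, count) pairs, most common first."""
    counts = Counter()
    for value in values:
        for name, check in checks.items():
            if check(value):
                counts[name] += 1
    return counts.most_common()


prices = ["10", " 12", "", None, "n/a", "7.5 ", "", "abc"]
ranked = rank_issues(prices)
print(ranked)
```

Here missing values are the most frequent problem (3 of 8 values), so they would be the first thing to address.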
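
Per-column summary statistics can be computed with the standard `statistics` module. This sketch assumes rows are dicts and that a missing value is represented as `None`; `column_summary` is a hypothetical helper, and it uses the sample standard deviation (`statistics.stdev`).

```python
import statistics


def column_summary(rows, column):
    """Hypothetical helper: count and missing for any column; mean, stdev, min, max for numeric ones."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]  # assumes None marks a missing value
    summary = {"count": len(values), "missing": len(values) - len(present)}
    if present and all(isinstance(v, (int, float)) for v in present):
        summary["mean"] = statistics.mean(present)
        # Sample standard deviation is undefined for a single value, so report 0.0 there.
        summary["stdev"] = statistics.stdev(present) if len(present) > 1 else 0.0
        summary["min"] = min(present)
        summary["max"] = max(present)
    return summary


rows = [
    {"age": 34, "city": "Boston"},
    {"age": None, "city": "Chicago"},
    {"age": 41, "city": None},
    {"age": 290, "city": "boston"},
]
for column in ("age", "city"):
    print(column, column_summary(rows, column))
```

A maximum age of 290 or a mean far from the typical values is exactly the kind of signal these statistics are meant to surface.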
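
Cleaning stepwise and keeping track of every operation fit together naturally: if each step is a named function and the raw data is never modified, you can log what each step did, then change or remove a step and re-run. The `CleaningPipeline` class and the step functions below are a hypothetical sketch of that idea, not an API from any library.

```python
from dataclasses import dataclass, field
from typing import Callable

Step = Callable[[list], list]


@dataclass
class CleaningPipeline:
    """Hypothetical helper: applies named cleaning steps in order and logs each one.

    The raw input is never modified, so a step can be changed or removed
    and the whole pipeline re-run from the original data.
    """
    steps: list = field(default_factory=list)  # (name, function) pairs
    log: list = field(default_factory=list)

    def add(self, name: str, step: Step) -> "CleaningPipeline":
        self.steps.append((name, step))
        return self

    def remove(self, name: str) -> "CleaningPipeline":
        self.steps = [(n, s) for n, s in self.steps if n != name]
        return self

    def run(self, rows: list) -> list:
        self.log = []
        data = [dict(row) for row in rows]  # shallow copies keep the raw rows untouched
        for name, step in self.steps:
            before = len(data)
            data = step(data)
            self.log.append(f"{name}: {before} -> {len(data)} rows")
        return data


def strip_whitespace(rows):
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()} for row in rows]


def drop_missing_age(rows):
    return [row for row in rows if row.get("age") is not None]


def drop_duplicates(rows):
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # assumes all values are hashable
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


raw = [{"name": "alice", "age": 34}, {"name": " alice ", "age": 34}, {"name": "bob", "age": None}]
pipeline = (
    CleaningPipeline()
    .add("strip whitespace", strip_whitespace)
    .add("drop missing age", drop_missing_age)
    .add("drop duplicates", drop_duplicates)
)
clean = pipeline.run(raw)
print(pipeline.log)
```

The log shows how many rows each step removed, which makes it easy to judge whether a step is worth keeping. Calling `pipeline.remove("drop duplicates").run(raw)` re-runs everything from the untouched raw data without that step.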
