InterviewSolution
| 1. |
Explain the data cleaning process. |
|
Answer» Combining multiple data sources always carries the risk of duplicate or mislabeled data. Incorrect data leads to unreliable outcomes and algorithms, even when the results appear to be correct. Consolidating the different representations of the same data and eliminating duplicates is therefore essential to keeping data accurate and consistent. This is where the data cleaning process comes in. Data cleaning, also called data scrubbing or data cleansing, is the process of removing incomplete, duplicate, corrupt, or incorrect data from a dataset. Its significance grows as more data sources need to be integrated, for example in data warehouses or federated database systems. Because the specific steps in a data cleaning process vary from dataset to dataset, developing a template for your process helps you apply it correctly and consistently. |
|
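As a rough illustration of the reusable template mentioned in the answer, here is a minimal Python sketch that uses only the standard library. The function names (`normalize`, `clean_records`), the field names, and the sample rows are all hypothetical, chosen for this example; a real pipeline would adapt the rules to its own dataset.

```python
def normalize(record):
    """Trim whitespace in string fields and treat empty strings as missing."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                value = None
        cleaned[field] = value
    return cleaned


def clean_records(records, required, key):
    """Drop incomplete records, then drop duplicates based on the key fields.

    required: fields that must be present and non-empty.
    key: fields that together identify a record; compared case-insensitively.
    The first occurrence of each key is kept.
    """
    seen = set()
    result = []
    for raw in records:
        rec = normalize(raw)
        # Incomplete data: skip records missing any required field.
        if any(rec.get(field) is None for field in required):
            continue
        # Duplicate data: skip records whose key was already seen.
        identity = tuple(str(rec.get(field)).lower() for field in key)
        if identity in seen:
            continue
        seen.add(identity)
        result.append(rec)
    return result


# Hypothetical rows, as if merged from two sources.
rows = [
    {"id": 1, "email": "ann@example.com ", "city": "Pune"},
    {"id": 2, "email": "ANN@example.com", "city": "Pune"},   # duplicate email
    {"id": 3, "email": "", "city": "Delhi"},                 # missing email
    {"id": 4, "email": "bob@example.com", "city": "Delhi"},
]
cleaned = clean_records(rows, required=["email"], key=["email"])
```

Keeping the rules in one function with `required` and `key` as parameters is one way to make the process repeatable: the same steps run on each new dataset, and only the field lists change.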