
1.

Explain SCD (Slowly Changing Dimension).

Answer»

SCD (Slowly Changing Dimensions) keeps and manages both current and historical data in a data warehouse over time. Rather than changing on a regular, time-based schedule, an SCD changes slowly and unpredictably over time. Handling SCDs is considered one of the most critical aspects of ETL.
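
The most common way to keep history is a Type 2 SCD, where the old row is expired and a new current row is appended. A minimal sketch in plain Python (the `customer_id`/`city` columns and date handling are illustrative, not a reference implementation):

```python
from datetime import date

def scd2_upsert(dim_rows, business_key, new_record, today):
    """Apply a Type 2 SCD update: expire the current row for the
    business key and append a new current row, preserving history."""
    for row in dim_rows:
        if row["customer_id"] == business_key and row["is_current"]:
            if row["city"] == new_record["city"]:
                return dim_rows          # no change: nothing to do
            row["is_current"] = False    # expire the old version
            row["end_date"] = today
    dim_rows.append({
        "customer_id": business_key,
        "city": new_record["city"],
        "start_date": today,
        "end_date": None,
        "is_current": True,
    })
    return dim_rows

# A customer moves from London to Paris: both versions are kept.
dim = [{"customer_id": 1, "city": "London",
        "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}]
dim = scd2_upsert(dim, 1, {"city": "Paris"}, date(2024, 6, 1))
```

After the update the dimension holds two rows for customer 1: the expired London row and the current Paris row, which is exactly the "current and historical data" behaviour described above.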

2.

What do you mean by factless table?

Answer»

Factless fact tables do not contain any facts or measures. They contain only dimensional keys and deal with event occurrences at the informational level rather than the calculational level. As the name implies, factless fact tables capture relationships between dimensions but lack any numerical or textual measures. Factless fact tables fall into two categories: those that describe events and those that describe conditions. Both can have a significant impact on your dimensional modeling.
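
A small sketch using SQLite (table and column names are hypothetical): the fact table holds only dimension keys, and "facts" are obtained by counting event rows rather than summing a measure:

```python
import sqlite3

# A factless fact table recording which student attended which class
# on which date -- dimension keys only, no measure columns.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_class   (class_id   INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE fact_attendance (          -- factless: keys only
        student_id INTEGER REFERENCES dim_student(student_id),
        class_id   INTEGER REFERENCES dim_class(class_id),
        date_key   TEXT
    );
    INSERT INTO dim_student VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO dim_class   VALUES (10, 'ETL 101');
    INSERT INTO fact_attendance VALUES (1, 10, '2024-06-01'),
                                       (2, 10, '2024-06-01');
""")
# The only meaningful aggregate is COUNT of event occurrences.
attendance = con.execute(
    "SELECT COUNT(*) FROM fact_attendance WHERE class_id = 10").fetchone()[0]
```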

3.

What is the benefit of using a DataReader destination adapter?

Answer»

An ADO recordset holds a collection of records (rows and columns) from a database table. The DataReader destination adapter is very useful for populating such recordsets in a simple manner. Using the ADO.NET DataReader interface, it exposes the data in a data flow so that other applications can consume it.

4.

What do you mean by schema objects?

Answer»

Generally, a schema comprises a set of database objects such as tables, views, indexes, clusters, database links, and synonyms. It is a logical description, or structure, of the database. Schema objects can be arranged in various ways in schema models designed for data warehousing; star and snowflake schemas are two examples of data warehouse schema models.
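
A quick illustration of schema objects using SQLite (names are illustrative): a table, an index, and a view are all created in the same schema and can be listed from the catalog:

```python
import sqlite3

# A schema is a collection of database objects: here a table, an
# index, and a view in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
con.execute("CREATE INDEX idx_sales_region ON sales (region)")
con.execute("CREATE VIEW v_sales_by_region AS "
            "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

# sqlite_master is SQLite's catalog of schema objects.
objects = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master ORDER BY name")]
```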

5.

Explain what you mean by Bus Schema.

Answer»

An important part of ETL is dimension identification, and this is largely done by the bus schema. A bus schema comprises a suite of verified dimensions and uniform definitions, and can be used to handle dimension identification across all businesses. In other words, the bus schema identifies the common dimensions and facts across all the data marts of an organization, much like identifying conformed dimensions (dimensions that carry the same meaning when referred to by different fact tables). Using the bus schema, information is delivered in a standard format with precise dimensions in ETL.

6.

Explain the Snowflake schema.

Answer»

Adding additional dimension tables to a star schema turns it into a snowflake schema. In the snowflake schema model, multiple hierarchies of dimension tables surround a central fact table. Alternatively, a dimension table is said to be snowflaked if its low-cardinality attributes have been segmented into separate normalized tables. These normalized tables are then joined to the original dimension table with referential (foreign key) constraints. Snowflake schema complexity increases linearly with the depth of hierarchy in the dimension tables.

Advantages  

  • Data redundancy is reduced because the data is normalized, which improves data integrity.
  • Data is highly structured, so it requires little disk space.
  • Updating and maintaining snowflaked dimension tables is easy.

Disadvantages 

  • Although snowflaking reduces the space consumed by dimension tables, the space saved is usually insignificant compared with the entire data warehouse.
  • Because of the number of tables added, queries may require complex joins, which reduces query performance.
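
The join trade-off above can be sketched with SQLite (table and column names are illustrative): the low-cardinality `category` attribute is split out of the product dimension into its own normalized table, so querying by category needs an extra join:

```python
import sqlite3

# Snowflaked dimension: product -> category is normalized into a
# separate table joined back through a foreign key.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT,
                               category_id INTEGER
                                 REFERENCES dim_category(category_id));
    CREATE TABLE fact_sales   (product_id  INTEGER
                                 REFERENCES dim_product(product_id),
                               amount REAL);
    INSERT INTO dim_category VALUES (1, 'Books');
    INSERT INTO dim_product  VALUES (100, 'SQL Guide', 1);
    INSERT INTO fact_sales   VALUES (100, 25.0), (100, 15.0);
""")
# Filtering by category now costs two joins -- the snowflake trade-off.
total = con.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_id  = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    WHERE c.name = 'Books'
""").fetchone()[0]
```
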
7.

What do you mean by a staging area, and what is its main purpose?

Answer»

During the extract, transform, and load (ETL) process, a staging area (or landing zone) is used as an intermediate storage area. It serves as temporary storage between the data sources and the data warehouse. Staging areas are primarily used to extract data quickly from the respective data sources, thereby minimizing the impact on those sources. After data has been loaded into the staging area, it is combined from multiple sources, transformed, validated, and cleaned before being loaded into the warehouse.
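
A minimal sketch of the flow, with all names and rules hypothetical: the extract is a fast, untransformed copy into staging, and cleansing happens there before the load:

```python
# Source rows arrive as raw strings; one row is invalid on purpose.
source_rows = [
    {"id": "1", "amount": " 10.5 "},
    {"id": "2", "amount": "bad"},      # rejected during staging cleanup
    {"id": "3", "amount": "7"},
]

staging = list(source_rows)            # 1. extract: fast, untransformed copy

def clean(row):                        # 2. transform/validate in staging
    """Return a typed row, or None if the row fails validation."""
    try:
        return {"id": int(row["id"]), "amount": float(row["amount"])}
    except ValueError:
        return None

# 3. load: only clean rows reach the warehouse table.
warehouse = [r for r in (clean(row) for row in staging) if r is not None]
```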

8.

What do you mean by ODS (Operational data store)?

Answer»

Between the staging area and the data warehouse, an ODS serves as a repository for data. After data is inserted into the ODS, it is loaded into the EDW (enterprise data warehouse). The benefits of an ODS mainly pertain to business operations, as it presents current, clean data from multiple sources in one place. Unlike other databases, an ODS database is read-only; customers cannot update it.

9.

Define Grain of Fact.

Answer»

Grain of fact refers to the level of detail at which fact information is stored in a fact table. It is also known as fact granularity.
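
A quick illustration with made-up figures: the same sales data at daily grain can be rolled up to the coarser monthly grain, but not the other way around, which is why the grain is chosen at design time:

```python
# Daily grain: one fact value per day (figures are illustrative).
daily_sales = {
    "2024-06-01": 100.0,
    "2024-06-02": 150.0,
    "2024-07-01": 200.0,
}

# Rolling up to monthly grain: one value per month.
monthly_sales = {}
for day, amount in daily_sales.items():
    month = day[:7]                   # "YYYY-MM" part of the date key
    monthly_sales[month] = monthly_sales.get(month, 0.0) + amount
```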

10.

Explain what a fact is and list its types.

Answer»

An important aspect of data warehousing is the fact table. A fact table represents the measurements, metrics, or facts of a business process. Facts are stored in fact tables, which are linked to a number of dimension tables via foreign keys. Facts are usually detailed and/or aggregated measurements of a business process that can be calculated and grouped to answer a business question. Data schemas such as the star schema and snowflake schema consist of a central fact table surrounded by several dimension tables. Measures or numbers like sales, cost, and profit and loss are some examples of facts.

Fact tables have two types of columns: foreign key columns and measure columns. Foreign key columns store references to dimension tables, while measure columns contain numeric facts. Other attributes can be added, depending on business needs.

Types of Facts 

Facts can be divided into three basic types, as follows: 

  • Additive: Fully additive facts are the most flexible and useful; they can be summed across any dimension associated with the fact table.
  • Semi-additive: Semi-additive facts can be summed across some dimensions associated with the fact table, but not all.
  • Non-additive: Non-additive facts cannot be summed across any dimension. A ratio is an example of a non-additive fact.
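
The additive/non-additive distinction can be shown in a few lines (column names and figures are illustrative): a revenue amount sums correctly across rows, while a ratio such as margin does not and must be recomputed from its additive components:

```python
rows = [
    {"region": "EU", "amount": 100.0, "profit": 20.0},
    {"region": "EU", "amount": 300.0, "profit": 30.0},
    {"region": "US", "amount": 100.0, "profit": 40.0},
]

# Additive fact: summing 'amount' across any dimension is valid.
total_amount = sum(r["amount"] for r in rows)

# Non-additive fact: summing per-row margins gives a meaningless 0.7...
wrong_margin = sum(r["profit"] / r["amount"] for r in rows)

# ...the correct overall margin is rebuilt from the additive parts.
overall_margin = sum(r["profit"] for r in rows) / total_amount
```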
11.

Can you define cubes and OLAP cubes?

Answer»

Data processing relies heavily on cubes. In their simplest form, cubes are just data processing units that contain dimensions and fact tables from the data warehouse. They provide clients with a multidimensional view of data, along with querying and analysis capabilities.

Online Analytical Processing (OLAP), on the other hand, is software that allows you to analyze data from several databases at the same time. For reporting purposes, an OLAP cube can be used to store data in multidimensional form. Cubes make creating and viewing reports easier, smoothing and improving the reporting process. The end users who manage and maintain these cubes have to update their data manually.
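
The core idea of a cube can be sketched in pure Python (data and dimension names are illustrative): pre-aggregate a fact table over every combination of dimension values, with `"*"` standing for "all values of that dimension", so any slice is answered by a lookup:

```python
from itertools import product

# Fact rows: (region, product, amount).
facts = [
    ("EU", "book", 10.0),
    ("EU", "pen",   5.0),
    ("US", "book", 20.0),
]

regions  = {"EU", "US", "*"}     # "*" = aggregated over that dimension
products = {"book", "pen", "*"}

# Pre-compute every (region, product) cell of the "cube".
cube = {}
for region, prod in product(regions, products):
    cube[(region, prod)] = sum(
        amount for r, p, amount in facts
        if region in (r, "*") and prod in (p, "*"))
```

A query such as "total EU sales across all products" is then just `cube[("EU", "*")]`, with no scan of the fact rows at query time. This is the design trade-off OLAP cubes make: more storage and precomputation in exchange for fast multidimensional reads.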

12.

Mention some of the ETL bugs.

Answer»

The following are a few common ETL bugs:

  • User Interface Bug: GUI bugs include issues with color selection, font style, navigation, spelling check, etc.
  • Input/Output Bug: This type of bug causes the application to accept invalid values in place of valid ones.
  • Boundary Value Analysis Bug: These bugs appear when minimum and maximum boundary values are mishandled.
  • Calculation Bugs: These bugs are usually mathematical errors that cause incorrect results.
  • Load Condition Bugs: These bugs prevent the application from handling multiple users or accepting valid user data under load.
  • Race Condition Bugs: This type of bug interferes with your system's ability to function properly and causes it to crash or hang.
  • ECP (Equivalence Class Partitioning) Bug: A bug of this type results in invalid types being accepted.
  • Version Control Bugs: These bugs normally surface during regression testing, when version details are not provided.
  • Hardware Bugs: This type of bug prevents the device from responding to an application as expected.
  • Help Source Bugs: This bug causes the help documentation to be incorrect.
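
Two of these classes, input/output bugs and boundary-value bugs, are typically caught by row-level validation checks during ETL testing. A sketch with hypothetical column names and limits:

```python
# Hypothetical limits for the 'age' column.
AGE_MIN, AGE_MAX = 0, 120

def validate_row(row):
    """Return a list of error strings for one extracted row."""
    errors = []
    if not str(row.get("id", "")).isdigit():
        errors.append("input/output bug: non-numeric id")
    age = row.get("age")
    if not isinstance(age, int) or not AGE_MIN <= age <= AGE_MAX:
        errors.append("boundary bug: age outside [0, 120]")
    return errors

# One valid row and one row that trips both checks.
bad_rows = [r for r in
            [{"id": "7", "age": 30},
             {"id": "x", "age": 130}]
            if validate_row(r)]
```
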
13.

State the difference between ETL testing and manual testing.

Answer»
| ETL Testing | Manual Testing |
| --- | --- |
| It is an automated process, so no special technical knowledge is needed beyond understanding the software. | It is a manual process, so it requires technical expertise in SQL and shell scripting. |
| It is extremely fast and systematic, and it delivers excellent results. | It is time-consuming and highly prone to errors. |
| Databases and their record counts are central to ETL testing. | Manual testing focuses on the program's functionality. |
| Metadata is included and can easily be altered. | It lacks metadata, and changes require more effort. |
| It covers error handling, log summaries, and load progress, which eases the developer's and maintainer's workload. | From a maintenance perspective, it requires maximum effort. |
| It is very good at handling historical data. | As data increases, processing time increases. |
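
A typical automated ETL test of the kind compared above is a source-to-target reconciliation: row counts and a column total are checked between the source table and the warehouse table. A sketch with SQLite and illustrative table names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount REAL);
    CREATE TABLE dwh_orders (id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0);
    INSERT INTO dwh_orders VALUES (1, 10.0), (2, 20.0);
""")

def reconcile(con, source, target):
    """Compare row counts and SUM(amount) between two tables."""
    q = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {}"
    return (con.execute(q.format(source)).fetchone()
            == con.execute(q.format(target)).fetchone())

ok = reconcile(con, "src_orders", "dwh_orders")
```

Such checks run after every load and flag a failed or partial load automatically, which is the systematic, repeatable quality the table above attributes to ETL testing.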