Explore topic-wise InterviewSolutions in .

This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.

1.

What are the advantages of a cloud based data warehouse?

Answer»

Following are the advantages of a cloud-based data warehouse:

  • Total cost of ownership is low: The low cost of cloud data warehouses is one of the reasons they are becoming more popular. On-premises data warehouses necessitate high-cost technology, lengthy UPGRADES, ongoing maintenance, and outage management.
  • Increased performance and speed: To keep up with the expanding number of data sources, cloud data warehouses are crucial. Cloud data warehouses can easily and quickly integrate with additional data sources as NEEDED, and deploy the updated solution to production. Cloud data warehouses significantly improve speed and performance, allowing IT to focus on more innovative projects.
  • Enhanced Security: Cloud security engineers can CREATE and iterate on precise data-protection measures. FURTHERMORE, cloud encryption technologies such as multi-factor authentication make data transfer between regions and resources extremely safe.
  • Improved Disaster Recovery: Physical assets are not required to prepare cloud data warehouses for disasters. Instead, almost all cloud data warehouses offer asynchronous data duplication and execute automatic snapshots and backups. This data is kept across multiple nodes, allowing duplicate data to be accessed at any time without stopping present activity.
Conclusion:

In this article, we have covered the most frequently asked interview questions on data warehousing. ETL tools are often required in a data warehouse and so one can expect interview questions on ETL tools as WELL in a data warehouse interview. 

References and Resources:

Data Warehouse Concepts

Data Engineer

ETL Testing

Azure

DBMS

Data Warehouse Tools

Tableau Interview Questions

Highest Paying Jobs

2.

Differentiate between Agglomerative hierarchical clustering and Divisive clustering.

Answer»

Agglomerative hierarchical clustering : Flat clustering returns an unstructured set of clusters. On the other hand, this structure is more informative. We don't have to define the number of clusters in advance with this clustering procedure. Bottom-up algorithms start by treating each PIECE of data as a singleton cluster, then AGGLOMERATE pairs of clusters until all of them are merged into a single cluster that contains all of the data.

Divisive Clustering : This approach also eliminates the need to define the number of clusters ahead of time. It necessitates a method for breaking a cluster that contains all of the data and then recursively splitting clusters until all of the data has been split into singletons.

Following are the differences between the two :

  • When compared to agglomerative clustering, divisive clustering is more complicated since we require a flat clustering algorithm as a "SUBROUTINE" to split each cluster until each data has its own singleton cluster.
  • If we don't create a complete HIERARCHY all the way down to individual data leaves, divisive clustering is more efficient.
  • A divisive algorithm is also more precise. Without first examining the global distribution of data, agglomerative clustering makes judgments based on local patterns or neighbour points. These early DECISIONS are irreversible. When generating top-level dividing decisions, divisive clustering takes into account the global distribution of data.
3.

Differentiate between star schema and snowflake schema in the context of data warehousing.

Answer»

Following table enlists the difference between the star schema and the snowflake schema:

Star Schema Snowflake Schema 
The fact TABLES and DIMENSION tables are both contained in the star schema.The fact tables, dimension tables, and sub dimension tables are all contained in the snowflake schema.
It is a top-down model.It is, however, a bottom-up model.
The star schema takes up more room.It takes up less space.
Star schema has a low query complexity.Snowflake schema has a higher query complexity than star schema.
It is really simple to comprehend.It is tough to comprehend.
It contains less foreign keys.It has a greater number of foreign keys.
It has a lot of redundancy in its data.It has a low level of data redundancy.
The execution of QUERIES takes less time.The execution of queries takes longer than star schema.
Normalization is not EMPLOYED in the star schema.Both normalisation and denormalization are used in this.
It has a PRETTY simple design.It has a complicated design.
4.

What do you understand by data lake in the context of data warehousing? Differentiate between data lake and data warehouse.

Answer»

A Data LAKE is a large-scale storage repository for structured, semi-structured, and unstructured data. It's a LOCATION where you can save any type of data in its original format, with no restrictions on account SIZE or file size. It provides a significant amount of data for improved analytical performance and native integration.

A data lake is a huge container that looks a lot like a lake or a river. Similar to how a lake has various tributaries, a data lake has structured data, unstructured data, machine-to-machine communication, and logs flowing through in real-time.

The following table enlists the differences between data lake and data warehouse:

Data Lake Data Warehouse 
All data is stored in the data lake, regardless of its source or structure. The data is stored in its unprocessed state. When it is ready to be used, it is converted.Data extracted from transactional systems or data consisting of quantitative measures and their properties will be stored in a data warehouse. The information has been cleansed and changed.
Captures semi-structured and unstructured data in their original form from source systems.Captures structured data and ORGANISES it according to defined standards for data warehouse purposes.
The data lake is appropriate for those that perform in-depth analysis. Data scientists, for example, require advanced analytical techniques that include predictive modelling and statistical analysis.Because it is highly structured, easy to use, and understand, the data warehouse is perfect for operational users.
The cost of storing data in big data technology is less than that of storing data in a data warehouse.Data warehouse storage is more expensive and time-consuming.
The schema is usually developed after the data has been stored. This provides a great LEVEL of flexibility and convenience of data collecting, but it necessitates labour at the end of the process.Schema is usually defined before data is saved. Work is required at the start of the process, but performance, security, and integration are all advantages.
Users can access data in data lakes before it has been transformed, cleansed, or structured. In comparison to a traditional data warehouse, it allows consumers to get to their results faster.Pre-defined inquiries for pre-defined data kinds are answered by data warehouses. As a result, any updates to the data warehouse take longer.
5.

What do you mean by dimensional modelling in the context of data warehousing?

Answer»

Dimensional Modelling (DM) is a data structure technique that is specifically designed for data storage in a data warehouse. The goal of dimensional modelling is to optimise the database so that data can be retrieved more quickly. In a data warehouse, a dimensional model is used to read, summarise, and analyse numeric data such as values, balances, counts, weights, and so on. Relation models, on the other hand, are designed for adding, modifying, and deleting data in a real-time Online Transaction System.

Following are the steps that should be followed while creating a dimensional model:

  • Identifying the business process : The first step is to identify the specific business processes that a data warehouse should address. This might be Marketing, Sales, or Human Resources, depending on the organization's data ANALYTIC needs. The quality of data available for that process is also a factor in deciding which business process to use. It is the most crucial step in the Data Modeling process, and a failure here would result in a cascade of irreversible flaws.
  • Identifying the grain : The level of detail for the business problem/solution is described by the grain. It's the procedure for DETERMINING the lowest level of data in any table in your data warehouse. If a table contains sales data for each day, the granularity should be daily. Monthly granularity is defined as a table that contains total sales data for each month.
  • Identifying the dimension : Date, shop, inventory, and other nouns are EXAMPLES of DIMENSIONS. All of the data should be saved in these dimensions. The date dimension, for example, could include information such as the year, month, and weekday.
  • Identifying the fact : This stage is linked to the system's business users because it is here that they gain access to data housed in the data warehouse. The majority of the rows in the fact table are numerical values such as PRICE or cost per unit.
  • Building the schema : The Dimension Model is implemented in this step. The database structure is referred to as a schema (arrangement of tables).
6.

What do you mean by data purging in the context of data warehousing?

Answer»

Data PURGING is a term that DESCRIBES techniques for permanently erasing and removing data from a storage space. Data purging, which is typically contrasted with data deletion, involves a variety of procedures and techniques. 

Purging removes data permanently and frees up memory or storage space for other purposes, whereas deletion is commonly THOUGHT of as a temporary preference. Automatic data purging features are one of the methods for data cleansing in database administration. Some Microsoft products, for example, feature an automatic purge strategy that uses a circular buffer mechanism, in which older data is purged to create room for fresh data. Administrators must manually REMOVE data from the database in other CIRCUMSTANCES.

7.

Differentiate between a data warehouse and a data mart.

Answer»

FOLLOWING table enlists the difference between a data warehouse and a data mart:

Data Warehouse Data Mart 
A data warehouse is a huge collection of data gathered from several departments or groups inside a company.A data mart is a Data Warehouse's single subtype. It is created to fulfill the requirements of a certain user group.
It aids in STRATEGIC decision-making.It aids in the making of tactical business decisions.
The process of designing a Data Warehouse is fairly challenging.The Data Mart design procedure is simple.
Data warehousing involves a big portion of the company, which is why it takes so LONG to process.Because they can only handle tiny amounts of data, data marts are simple to use, create, and install.
The primary goal of a Data Warehouse is to create a unified environment and a CONSISTENT view of the business at any given moment in TIMEA data mart is primarily used at the department level in a business division.
When opposed to data mart, the data kept in the Data Warehouse is always detailed. Data Marts are designed for certain user groups. As a result, the data is brief and limited.
Data is collected from a variety of sources in a data warehouse.Data in Data Mart comes from a limited number of sources.
The Data Warehouse might be anywhere from 100 GB to 1 TB+ in size.Data Mart is less than 100 GB in size.
The time it takes to implement a Data Warehouse might range from months to years.The Data Mart implementation process is only a few months long.
From the perspective of the end-users, the data stored is read-only.The transaction data is provided straight from the Data Warehouse.
8.

What are the advantages and disadvantages of the bottom up approach of data warehouse architecture?

Answer»

Following are the advantages of the bottom up approach :

  • The reports are GENERATED quickly since the data marts are CREATED first.
  • We can fit a GREATER number of data marts here, allowing us to expand our data warehouse.
  • In addition, the cost and effort required to build this MODEL are quite minimal.

Because the dimensional view of data marts is not consistent as it is in the top-down approach, this model is not as strong as the top-down approach and this is a DISADVANTAGE of the bottom up approach.

9.

What are the advantages and disadvantages of the top down approach of data warehouse architecture?

Answer»

Following are the advantages of the TOP down APPROACH :

  • Because data marts are formed from data warehouses, they have a consistent dimensional perspective.
  • This methodology is also thought to be the most effective for corporate reforms. As a result, large corporations CHOOSE to take this method.
  • It is simple to create a data MART from a data warehouse.

The disadvantage of the top down approach is that the COST, time, and effort required to design and maintain it are all very expensive.

10.

Explain the architecture of a data warehouse.

Answer»

A data warehouse is a single schema that organizes a heterogeneous collection of multiple data sources. There are two TECHNIQUES to building a data warehouse. They are as follows:

Top-Down Approach in Data Warehouse:

Following are the major components :

  • External Sources - An external source is a location from which data is collected, REGARDLESS of the data format. Structured, semi-structured, and UNSTRUCTURED data are all possibilities.
  • Stage Area - Because the data gathered from external sources does not follow a specific format, it must be validated before being loaded into the data warehouse. ETL tool is used for this purpose in the stage area.
  • Data-warehouse - After data has been cleansed, it is kept as a central repository in the data warehouse. The meta data is saved here, while the real data is housed in data marts. In this top-down approach, the data warehouse stores the data in its purest form.
  • Data Marts - A data mart is a storage component as well. It maintains information about a single organization's function that is managed by a single authority. Depending on the functions, an organization can have as many data marts as it wants. 
  • Data Mining - Data mining is the process of ANALYZING large amounts of data in a data warehouse. With the use of a data mining algorithm, it is used to discover hidden patterns in databases and data warehouses.

Bottom Up Approach in Data Warehouse:

Following are the steps involved in the bottom up approach:

  • The data is first gathered from external sources (same as happens in top-down approach).
  • The data is then imported into data marts rather than data warehouses after passing through the staging area (as stated above). The data marts are built first, and they allow for reporting. It focuses on a specific industry.
  • After that, the data marts are incorporated into the data warehouse.