Explore topic-wise InterviewSolutions in .

This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.

1.

What are some of the common mistakes encountered in data modelling?

Answer»

Some of the mistakes encountered in data modelling are:

  • Building massive data MODELS: As good practices, a data model is recommended to have <=200 tables. This is because large data models are more likely to have design faults.
  • Lack of purpose: When the purpose of the business solution is not clear, then the data model generated would be incorrect as there is no means of validating the correctness of the model against the ORGANIZATION objectives.
  • Unnecessary de-normalization: Denormalization should not be done unless we have a solid business reason as it contributes to data redundancy which might increase the cost of maintenance.
  • Unnecessary surrogate KEYS: Surrogate keys are generated artificially for identifying the records. Too much use of these keys is not recommended when the natural keys can serve the purpose.
Conclusion

Data modelling is the process of developing a data model for storing in the database as per the business requirements. It ensures consistency and ensures NAMING conventions, semantics and security is followed along with ensuring the data quality. In the current era of technologies, data has become the new oil. It is very much important to ensure that the data is structured, developed, utilised and PRESENTED effectively for the growth of the organization. Due to this, data modelling has gained immense popularity and importance among database architects, software developers and business analysts.

Useful Resources:

  • Data Science Interview Questions
  • Big Data Interview Questions
  • Data Analyst Interview Questions
  • Power BI Interview Questions
2.

What is the importance of the third normal form?

Answer»

The third NORMAL form (3NF) is used for preventing data duplication and data abnormalities. A relation is said to have met the third normal form when there are no transitive dependencies for non-prime ATTRIBUTES as present in the second normal form. A 3NF relation is said to have achieved if any one of the below conditions are met for every non-trivial functional dependency A->B:

  • A- super key
  • B- prime attribute where every attribute is a part of candidate key

For a table to achieve 3NF form, it should first be in 1NF (First Normal Form) and achieve 2NF (Second Normal Form). The rows in the table should be dependent only on the KEYS. If the contents of fields apply to more than 1 primary key, then they should be put in another table.

For instance, if a patient’s records have a doctor’s phone number STORED with the patient’s details, it doesn’t meet the 3NF standards. Because the doctor’s number should be part of the doctor table for avoiding duplication in the patient table because a doctor can have more than one patient.

3.

How can you perform reverse engineering in data modelling?

Answer»

Reverse ENGINEERING is the process of building a data model from an EXISTING database. This is more beneficial in cases when an organization has an existing outdated database solution, then while performing tech modernization, instead of developing solutions from SCRATCH, it becomes more efficient to reconstruct the data model from an existing system and make tweaks to that data model as per the vision and APPLY that tweak to the current system. The STEPS in reverse engineering are summarised as follows:

4.

What are the types of dimensions in data modelling?

Answer»

There are 5 types of DIMENSIONS, they are:

  • Conformed dimension: These are the dimensions that are used in MULTIPLE tables in a database by allowing measures and FACTS to be categorized and described consistently across other facts or data marts.
  • Junk dimension: Here the ATTRIBUTES do not belong to the fact table or any existing dimension table. They are USUALLY text or are like flags indicating ‘Y’ or ‘N’ (yes or no) or true or false.
  • Role-playing dimension: Here, there are multiple relationships valid between itself and other tables. It is used for deciding how the roles can be used with other facts which do not share the same concepts.
  • Slowly changing dimension: These have attributes that change over time and also include dimensions where data is preserved or replaced in history.
  • Degenerated dimension: These dimensions do not have their dimension tables but are part of the primary key in the fact table.
5.

What is dimensional modelling?

Answer»

Dimensional Modeling is a technique of data structuring performed for optimizing the data storage and FASTER data retrieval in the data warehouse. It consists of 2 types of TABLES - FACT tables and dimension tables. The fact tables are used for storing different transactional data along with the foreign keys from the dimension tables qualifying the data.

The purpose of Dimensional modelling is to achieve faster data retrieval and not the HIGHEST degree of normalization.

6.

What Are Recursive Relationships?

Answer»

Recursive RELATIONSHIPS occur when there is a relationship between an entity and itself. These relationships are complex and would require more complex approaches of mapping the data to a schema. Consider an instance where a doctor is marked as a care provider in the health care database. Now if the doctor himself falls sick, he NEEDS to go to another doctor as a patient which results in recursive relationships. To accommodate this, we add a foreign key to the health CENTRE’s NUMBER in EVERY patient’s record. Care needs to be taken for ensuring that the recursion has an exit path in such entity relations.

7.

How can you relate CAP Theorem to database design?

Answer»

CAP(Consistency-Availability-Partition Tolerance) theorem says that a distributed system cannot GUARANTEE C, A and P simultaneously. It can at MAX provide any 2 of the 3 guarantees.

  • Consistency: This STATES that the data has to remain consistent after the execution of an OPERATION in the database. For example, post database updation, all queries should retrieve the same result.
  • Availability: The databases cannot have downtime and should be available and responsive always.
  • Partition Tolerance: The database system should be functioning despite the communication becoming unstable.

The following image REPRESENTS what databases guarantee what aspects of the CAP Theorem simultaneously. We see that RDBMS databases guarantee consistency and Availability simultaneously. Redis, MongoDB, Hbase databases guarantee Consistency and Partition Tolerance. Cassandra, CouchDB guarantees Availability and Partition Tolerance.

8.

What are the types of visual data modelling techniques?

Answer»

There are two types of visual techniques in data modelling, they are:

  • Entity-Relationship (ER) Model: This is a data modelling methodology that DESIGNS traditional databases and aids in performing data normalization by reducing data redundancy. It helps in database schema visualization of top-level data view. The data modelling architect identifies the FOLLOWING:
    • Entities and their attributes that represent objects/tables in the relational databases.
    • Relationships between the entities.
  • UML (Unified Modelling Language): UML is a general-purpose language used in database modelling, development and visualization in software ENGINEERING. It consists of different diagrams for representing the software SYSTEMS and one of them is Class Diagram. Class diagrams are close similar to ER diagrams and define the following:
    • Classes - equivalent to an entity in the relational world.
    • Class Attributes that represent the properties.
    • Class Methods that REPRESENTS the behaviour of a class.
    • Relationships - between class instances (Association/Composition/Aggregation) or between classes (Inheritance/Realization/Generalization).
9.

What are the critical relationship types in a Data Model?

Answer»

A relationship is normally used for connecting parent and child tables. There are 3 types of CRITICAL relationships in a data model, they are:

  • Identifying: In this type, a reference column of a child’s table these are connected by a thick line is part of the primary key of the parent table which helps in identifying the records. This relationship is usually replaced by a thick line. These are called identifying because the FOREIGN key can never change as it becomes part of the primary key. A child object cannot exist if there is no parent object.
  • Non-identifying: Here, the reference column of a child table is not part of the primary key of the parent. This relationship is represented by a dotted line. This relationship can be optional or mandatory depending on the requirements. It MEANS that the foreign key columns can (if optional) or cannot allow (if mandatory) NULL.
  • Self-recursive: Here, a STANDALONE column is connected to a primary key in the same table. This relationship is between different column instances of the same entity type.
10.

Explain snowflake schema database design.

Answer»

A snowflake schema is a schema that has one or more dimension tables that do not connect directly to the fact table but should be interacting with it via other dimension tables. 

The process of normalizing dimension tables in STAR schema to resemble a snowflake-like structure with the fact table at the centre is called snowflaking. It is represented in the image below:

The tables in the snowflake schema are normalized to 3NF form and each dimension LEVEL in the schema represents one level in the hierarchy. The central fact table is linked to multiple dimension tables that can be linked to other dimension tables via many-to-many or many-to-one relationships. The schema is USEFUL for enhancing the query performance as there are minimum disk requirements and the JOINING of smaller LOOKUP tables.

11.

What do you understand by Star schema?

Answer»

A schema USUALLY represents the data structure illustration and relationships between different structures. A star schema constitutes a central fact table and different DIMENSION tables connected to it. The PRIMARY key of the dimension tables is a foreign key in the central fact table. 

This schema is called a star schema because the entity-relationship diagram can be visualized as a star whose points diverge from the central fact table as shown below:

The central fact table has two column sets- one set belongs to the fact table and the others are foreign keys to the dimension tables. Dimensions symbolize one or more attributes/hierarchies that categorize data. If there are no hierarchies, they are called lists or flat dimensions. The dimensional tables are smaller in size when compared to the fact table.

For EXAMPLE, The fact tables can store the entire sales INFORMATION and the dimension tables store data about clients, products, geographic regions, channels etc that are linked as foreign key references to the Sales information fact table.

12.

What are the phases in the data modelling development cycle?

Answer»

There are 5 phases in data modelling:

  • Gather Business REQUIREMENTS: The first phase is to gather the functional requirements and what are the reporting needs from the business analysts and the end-users.
  • Conceptual Data Modeling(CDM): Then depending on the requirements, come up with a conceptual data model that includes major entities and relationships between them. This model will be used in the INITIAL Planning Phase.
  • Logical Data Modeling(LDM): Post the creation of a conceptual model, create a logical data model that describes the logical representation of the business requirements in the organization.
  • Physical Data Modeling(PDM): Here, we create a complete data model which includes required tables, columns in them, relationships between different entities and PROPERTIES for the implementation of the database.
  • Database: Based on the physical data model created, the SQL codes would be generated with the help of the data modelling TOOLS and these codes (queries) are then executed on the server for creating the database entities.
13.

What will happen if we try inserting two nulls in a column having a unique constraint on it?

Answer»

SINCE a null value is not equal to another null value, inserting two null values on a column with a unique constraint does not throw any ERROR. It SIMPLY inserts the DATA.

14.

How is SQL different from NoSQL?

Answer»
CategorySQLNoSQL
ModelFollows relational model.Follows the non-relational model.
DataDeals with structured data.Deals with semi-structured data
FlexibilitySQL follows a strict schema.NoSQL deals with DYNAMIC schema and is very flexible.
TransactionsFollows ACID (Atomicity, CONSISTENCY, Isolation, Durability) properties.Follows BASE (Basic AVAILABILITY, Soft-state, Eventual consistency) properties.
15.

What are the differences between OLTP and OLAP?

Answer»
OLTPOLAP
Online Transactional System or OLTP DEALS with a huge number of short online transactions.Online Analysis and DATA retrieving Process or OLAP deals with large data volume.
Makes USE of traditional DMBS.Makes use of huge data warehouses.
Tables in the databases are normalized.Tables are not normalized here.
The RESPONSE time is in the range of milliseconds.Takes seconds to minutes to usually get a response.
Designed for real-time business applications.Designed for data analysis measured using category and ATTRIBUTES.