InterviewSolution

1.	What is the difference between an RDBMS and Hadoop?
Answer»

2.	What is the purpose of RecordReader in Hadoop?
Answer»

3.	What is YARN? Explain its components.
Answer»

4.	What are the port numbers for the NameNode, TaskTracker, and JobTracker?
Answer»

5.	What is SequenceFileInputFormat? What is it used for in Hadoop?
Answer»

6.	What is rack awareness in Hadoop?
Answer»

7.	How do you recover a NameNode when it is down?
Answer»

8.	What are the different schedulers available in Hadoop?
Answer»

Here are the different types of schedulers available in Hadoop:

The FIFO Scheduler
The Fair Scheduler
The Capacity Scheduler
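The three schedulers differ mainly in how they divide cluster resources among competing jobs. The following is a toy Python model, not Hadoop's scheduler code: it assumes the cluster is a fixed number of identical "slots", and the function names are hypothetical.

```python
from typing import Dict, List


def fifo_allocate(jobs: List[str], demand: Dict[str, int], slots: int) -> Dict[str, int]:
    """FIFO: serve jobs strictly in submission order; later jobs get only leftovers."""
    alloc: Dict[str, int] = {}
    for job in jobs:
        give = min(demand[job], slots)
        alloc[job] = give
        slots -= give
    return alloc


def fair_allocate(jobs: List[str], demand: Dict[str, int], slots: int) -> Dict[str, int]:
    """Fair: hand out one slot per round to every job that still wants more."""
    alloc = {job: 0 for job in jobs}
    while slots > 0:
        wanting = [j for j in jobs if alloc[j] < demand[j]]
        if not wanting:
            break  # every job is satisfied; leave remaining slots idle
        for j in wanting:
            if slots == 0:
                break
            alloc[j] += 1
            slots -= 1
    return alloc


def capacity_allocate(queue_pct: Dict[str, int], slots: int) -> Dict[str, int]:
    """Capacity: each queue is guaranteed a fixed percentage of the cluster."""
    return {queue: slots * pct // 100 for queue, pct in queue_pct.items()}
```

With 10 slots and jobs A, B, C demanding 8, 4, and 2 slots, FIFO gives A 8, B 2, and C nothing, while the fair policy ends at A 4, B 4, C 2. The model ignores features of the real schedulers such as weights, preemption, and the Capacity Scheduler's ability to let a queue borrow idle capacity.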

9.	What happens if two clients try writing into the same HDFS file?
Answer» In HDFS, when the first client contacts the NameNode to write a file, the NameNode grants that client a lease to write it. When a second client tries to open the same file for writing, the NameNode sees that another client already holds the write lease and therefore rejects the second client's open request.
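The single-writer rule can be sketched as a lease table keyed by file path. This is a simplified Python model under stated assumptions: the `LeaseManager` class and its methods are hypothetical, and real HDFS leases also expire and are renewed, which is omitted here.

```python
from typing import Dict


class LeaseManager:
    """Toy single-writer lease table: at most one writer per file path."""

    def __init__(self) -> None:
        self._holders: Dict[str, str] = {}  # path -> client holding the write lease

    def open_for_write(self, path: str, client: str) -> bool:
        holder = self._holders.get(path)
        if holder is not None and holder != client:
            return False  # another client already holds the lease: reject
        self._holders[path] = client
        return True

    def close(self, path: str, client: str) -> None:
        # Only the current lease holder can release the lease.
        if self._holders.get(path) == client:
            del self._holders[path]
```

Once the first client closes the file, its lease is released and a second client can then open the file for writing.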

10.	What is the difference between HDFS and NFS?
Answer»

Network File System (NFS)	HDFS
It is a protocol that lets clients access files over a standard network.	It is a file system that is distributed across multiple systems or nodes.
It lets users access remote files as if they were stored locally.	It is fault-tolerant, i.e., it stores multiple replicas of each file on different systems.

11.	What is active and passive NameNode in Hadoop?
Answer» Active NameNode: the NameNode that runs in the cluster and serves all client requests. Passive NameNode: a standby NameNode that maintains the same metadata as the active NameNode, so that it can take over if the active NameNode fails.
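The active/standby relationship can be sketched as a pair of nodes that apply the same metadata edits, with a failover that swaps their roles. This toy Python model is an illustration only; the class names and node names `nn1`/`nn2` are hypothetical. In real HDFS high availability, the standby keeps in sync by reading the active NameNode's edit log from shared storage (for example, a quorum of JournalNodes), and the failed node is fenced; both are skipped here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NameNode:
    name: str
    state: str  # "active" or "standby"
    # Toy stand-in for namespace metadata: file path -> size in bytes
    metadata: Dict[str, int] = field(default_factory=dict)


class HAPair:
    """Toy active/standby NameNode pair."""

    def __init__(self) -> None:
        self.active = NameNode("nn1", "active")
        self.standby = NameNode("nn2", "standby")

    def create_file(self, path: str, size: int) -> None:
        # Only the active NameNode serves the client request...
        self.active.metadata[path] = size
        # ...and the same edit is applied on the standby to keep it in sync.
        self.standby.metadata[path] = size

    def fail_over(self) -> None:
        # Promote the standby and demote the old active.
        self.active, self.standby = self.standby, self.active
        self.active.state = "active"
        self.standby.state = "standby"
```

Because the standby already holds the same metadata, it can begin serving requests right after the roles are swapped.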

12.	What is the difference between a distributed file system and the Hadoop Distributed File System?
Answer»

Distributed File System	Hadoop Distributed File System (HDFS)
It is primarily designed to hold a large amount of data while providing access to multiple clients over a network.	It is designed to hold very large amounts of data (terabytes to petabytes) and also supports individual files of large size.
Here, files are stored on a single machine.	Here, files are stored across multiple machines.
It does not provide data reliability.	It provides data reliability by replicating blocks across machines.
If multiple clients access the data at the same time, the server can become overloaded.	HDFS spreads data and client requests across many machines, so concurrent access does not overload a single server.
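To make "stored across multiple machines" concrete, the following Python sketch splits a file into fixed-size blocks and assigns each block to several DataNodes. It assumes the default 128 MB block size (Hadoop 2 and later) and a replication factor of 3. The round-robin placement and the node names are illustrative assumptions; real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default HDFS block size since Hadoop 2
REPLICATION = 3                 # default replication factor


def place_blocks(file_size: int, datanodes: List[str]) -> Dict[int, List[str]]:
    """Map each block index to the DataNodes holding its replicas (toy round-robin)."""
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # integer ceiling division
    replicas = min(REPLICATION, len(datanodes))  # cannot exceed the number of nodes
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        placement[block] = [datanodes[(block + r) % len(datanodes)] for r in range(replicas)]
    return placement
```

A 1 GB file therefore becomes 8 blocks and 24 stored block replicas; losing any single machine still leaves at least two copies of every block.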

13.	What is Hadoop MapReduce used for?
Answer» In Hadoop, MapReduce is a programming framework that lets users perform distributed, parallel processing of large data sets across a cluster in a controlled way.
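Word count is the customary first MapReduce example. The following single-process Python sketch only illustrates the map, shuffle, and reduce phases; in Hadoop these phases run as distributed tasks across the cluster, and the function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all counts for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

For the input lines "the quick fox", "The lazy dog", and "the fox", the result counts "the" three times and "fox" twice.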

14.	What is data serialization in Hadoop?
Answer» Data serialization is the process of converting structured data into a format from which it can later be restored to its original form. It translates data structures into a stream of bytes, which can then be transferred across the network or stored in any database, regardless of the system architecture.
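A round trip through a byte stream shows the idea. The Python sketch below serializes a record with an integer id and a string name and then restores it. The record fields and the byte layout (8-byte signed id, 4-byte length prefix, UTF-8 text, all big-endian) are our own illustrative assumptions, not Hadoop's Writable format.

```python
import struct
from typing import Tuple

HEADER = ">qI"  # big-endian: 8-byte signed id, 4-byte unsigned string length


def serialize(user_id: int, name: str) -> bytes:
    encoded = name.encode("utf-8")
    # Fixed-size header followed by the variable-length string bytes.
    return struct.pack(HEADER, user_id, len(encoded)) + encoded


def deserialize(data: bytes) -> Tuple[int, str]:
    user_id, length = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)  # 12 bytes; no padding with ">"
    name = data[offset:offset + length].decode("utf-8")
    return user_id, name
```

Because the layout is fixed and byte order is explicit, any machine that knows the format can rebuild the original record from the bytes.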

15.	What are the configuration files in Hadoop?
Answer»

Here is a list of Hadoop configuration files with their descriptions:

File	Description
hadoop-env.sh	It contains environment variables used in the scripts that run Hadoop.
core-site.xml	It contains configuration settings for Hadoop core, such as I/O settings common to HDFS and MapReduce.
hdfs-site.xml	It contains configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode, and the DataNodes.
mapred-site.xml	It contains configuration settings for the MapReduce daemons, such as the JobTracker and the TaskTrackers.
masters	It is a list of machines that run the Secondary NameNode.
slaves	It is a list of machines that run the DataNodes and TaskTrackers.
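The *-site.xml files share one layout: a `<configuration>` element holding `<property>` entries, each with a `<name>` and a `<value>`. The Python sketch below reads such a file into a dictionary. `fs.defaultFS` and `dfs.replication` are real Hadoop property names, but the hostname, port, and the `read_properties` helper are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Example contents of core-site.xml (hostname and port are placeholders).
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>"""

# Example contents of hdfs-site.xml.
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""


def read_properties(xml_text: str) -> Dict[str, str]:
    """Collect every <property> under <configuration> as a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}
```

Values come back as strings; Hadoop itself converts them to the expected type when a component reads a property.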