Explore topic-wise InterviewSolutions in Hadoop.

This section includes InterviewSolutions with curated interview questions and answers to sharpen your knowledge and support exam preparation. Choose a question below to get started.

1.

Write down some common differences between Hadoop and Teradata?

Answer»

Below are some of the common differences between Hadoop and Teradata:
(1)Hadoop is an open-source distributed storage and processing framework, while Teradata is a commercial relational data warehouse (an MPP database).
(2)Hadoop runs on clusters of commodity hardware, while Teradata runs on purpose-built, comparatively expensive hardware.
(3)Hadoop can store and process structured, semi-structured and unstructured data, while Teradata is designed for structured data queried through SQL.
(4)Hadoop follows schema-on-read, while Teradata follows schema-on-write.
2.

Different Hadoop commands used in HDFS

Answer»

Below are the different commands used in HDFS. Before listing them, here is the notation used in the commands below:
"<path>" means a file or directory name.
"<path>..." means one or more file or directory names.
"<file>" means any file name.
"<src>" and "<dest>" are path names in a directed operation.
"<localSrc>" and "<localDest>" are paths as above, but on the local file system.
Now the list of commands (a short Java sketch of programmatic equivalents follows the list):

(1)-ls <path>: Lists the contents of the directory specified by path, showing the name, permissions, owner, size and modification date of each entry.

(2)-lsr <path>: Same as -ls, but recursively displays entries in all subdirectories of path.

(3)-du <path>: Shows disk usage, in bytes, for all files that match path; filenames are reported with the full HDFS protocol prefix.

(4)-dus <path>: Same as -du, but prints a summary of disk usage of all files/directories in the given path.

(5)-mv <src> <dest>: Moves the file or directory from src to dest within HDFS.

(6)-cp <src> <dest>: Copies the file or directory from src to dest within HDFS.

(7)-rm <path>: Removes the file or empty directory at the given path.

(8)-rmr <path>: Same as -rm, but also deletes files and subdirectories recursively.

(9)-put <localSrc> <dest>: Copies the file or directory from the local file system identified by localSrc to dest within HDFS.

(10)-copyFromLocal <localSrc> <dest>: Same as -put.

(11)-moveFromLocal <localSrc> <dest>: Copies the file or directory from the local file system identified by localSrc to dest within HDFS, and deletes the local copy on success.

(12)-get [-crc] <src> <localDest>: Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.

(13)-getmerge <src> <localDest>: Retrieves all files that match the path src in HDFS and copies them to a single, merged file in the local file system identified by localDest.

(14)-cat <filename>: Displays the contents of filename on stdout.

(15)-copyToLocal <src> <localDest>: Same as -get (command 12).

(16)-moveToLocal <src> <localDest>: Same as -get, but deletes the HDFS copy on success.

(17)-mkdir <path>: Creates a directory named path in HDFS.

(18)-setrep [-R] [-w] rep <path>: Sets the target replication factor for files identified by path to rep.

(19)-touchz <path>: Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless that file already has size 0.

(20)-test -[ezd] <path>: Checks the path; returns 1 if the path exists, has zero length, or is a directory, and 0 otherwise.

(21)-stat [format] <path>: Prints information about path. Format is a string which accepts file size in blocks (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y).

(22)-tail [-f] <file>: Shows the last 1KB of the file on stdout.

(23)-chmod [-R] mode,mode,... <path>...: Changes the file permissions associated with one or more objects identified by path. Performs changes recursively with -R; mode is a 3-digit octal mode.

(24)-chown [-R] [owner][:[group]] <path>...: Sets the owning user and/or group for files or directories identified by path. Sets the owner recursively if -R is specified.

(25)-chgrp [-R] group <path>...: Sets the owning group for files or directories identified by path. Sets the group recursively if -R is specified.

(26)-help <cmd-name>: Returns usage information for one of the commands listed above. The leading '-' character must be omitted in cmd.
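
These shell commands also have programmatic equivalents in Hadoop's Java FileSystem API. Below is a minimal, illustrative sketch of a few of them; the paths /user/demo and data.txt are hypothetical placeholders, and the snippet assumes a Hadoop client configuration is available on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCommandsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();            // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);                // handle to HDFS (or the local FS in standalone mode)

        fs.mkdirs(new Path("/user/demo"));                   // equivalent of: -mkdir /user/demo
        fs.copyFromLocalFile(new Path("data.txt"),           // equivalent of: -put data.txt /user/demo
                             new Path("/user/demo/data.txt"));

        for (FileStatus st : fs.listStatus(new Path("/user/demo"))) {   // equivalent of: -ls /user/demo
            System.out.println(st.getPath() + "  " + st.getLen() + " bytes");
        }

        fs.delete(new Path("/user/demo/data.txt"), false);   // equivalent of: -rm /user/demo/data.txt
        fs.close();
    }
}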

3.

Define Port Numbers for NameNode, TaskTracker and JobTracker?

Answer»

Below are the port numbers for NameNode, TaskTracker and JobTracker:
(1)Port of NameNode is 50070
(2)Port of TaskTracker is 50060
(3)Port of JobTracker is 50030

4.

Most important InputSplits made by Hadoop Framework?

Answer»

Suppose Hadoop is given three files of sizes 64 KB, 65 MB and 127 MB (with the default 64 MB block size). It makes 5 input splits in total:
(1)One split for the 64 KB file
(2)Two splits for the 65 MB file (one full 64 MB block plus the remaining 1 MB)
(3)Two splits for the 127 MB file (one full 64 MB block plus the remaining 63 MB)

5.

If we store too many small files in a cluster on HDFS, what happens?

Answer»

If we store too many small files in a cluster on HDFS, it generates a lot of metadata. The problem is that storing all this metadata in the NameNode's RAM becomes a challenge, because each file, directory or block needs about 150 bytes of metadata, so the cumulative size of the metadata grows too large. For example, 10 million small files, each represented by at least one file object and one block object, would need roughly 10,000,000 × 2 × 150 bytes ≈ 3 GB of NameNode memory just for metadata.

6.

Seven main differences between Hadoop and Teradata?

Answer»

The seven main differences between Hadoop and Teradata follow the same lines as the comparison given in question 1 above: open-source distributed framework vs. commercial MPP data warehouse, commodity vs. proprietary hardware, structured as well as unstructured data vs. structured data only, and schema-on-read vs. schema-on-write, among others.

7.

Command to find the status of blocks and FileSystem health?

Answer»

Below are the commands to get the status of blocks and FileSystem health:
(1)To check the status of the blocks we use the command below:
hdfs fsck <path> -files -blocks
(2)And to check the health status of the FileSystem we use the command below:
hdfs fsck / -files -blocks -locations > dfs-fsck.log

8.

Name the two types of metadata that a NameNode server holds?

Answer»

Below are the two types of metadata that a NameNode server holds:
(1)Metadata on disk: this type of metadata contains the edit log and the FsImage.
(2)Metadata in RAM: this type of metadata contains the information about the DataNodes (the block-to-DataNode mapping).
9.

Why do we use the Context object in Hadoop?

Answer»

The Context object helps the mapper interact with the rest of the Hadoop system. It includes configuration data for the job and also provides the interfaces that allow the mapper to emit output, as shown in the sketch below.
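
A minimal sketch of a mapper that uses the Context object both to read job configuration and to emit output; the configuration key word.separator is a made-up example parameter.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private String separator;

    @Override
    protected void setup(Context context) {
        // Context gives access to the job configuration
        separator = context.getConfiguration().get("word.separator", " ");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split(separator)) {
            if (!token.isEmpty()) {
                context.write(new Text(token), ONE);   // Context is also how the mapper emits output
            }
        }
    }
}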

10.

Define different data types in Pig Latin?

Answer»

Pig Latin can handle both atomic and complex data types:
(1)Atomic data types: atomic or scalar data types are the basic data types used in almost all languages, such as int, long, float, double, chararray (string) and bytearray.
(2)Complex data types: complex data types are not common to all programming languages; in Pig Latin they are Tuple, Map and Bag.

11.

Name and syntax for core methods of Reducer?

Answer»

Below are the three core methods of the Reducer (see the sketch after this list):
(1)setup(): this method is used to configure various parameters such as input data size and the distributed cache. Its syntax is:
public void setup(Context context)

(2)reduce(): the heart of the Reducer, called once per key with the associated list of values for the reduce task. Its syntax is:
public void reduce(Key key, Iterable<Value> values, Context context)

(3)cleanup(): this method is used to clean up temporary files, and runs once at the end of the task. Its syntax is:
public void cleanup(Context context)
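
A minimal sketch of a Reducer that fills in all three core methods; the configuration key sum.minimum.count and the counter names are made-up illustrations.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private int minimumCount;

    @Override
    protected void setup(Context context) {
        // setup(): read job parameters before any reduce() call
        minimumCount = context.getConfiguration().getInt("sum.minimum.count", 0);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // reduce(): called once per key with all of its values
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        if (sum >= minimumCount) {
            context.write(key, new IntWritable(sum));
        }
    }

    @Override
    protected void cleanup(Context context) {
        // cleanup(): runs once at the end of the task, e.g. to release resources or record counters
        context.getCounter("SumReducer", "tasks-finished").increment(1);
    }
}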

12.

What do you mean by FSCK in Hadoop?

Answer»

FSCK stands for File System Consistency Check. It is a command used to run a summary report in Hadoop that summarizes the status of HDFS. It also identifies the presence of errors, but it does not correct them. One of its great features is that we can run it on the entire file system (for example, hdfs fsck /) or on a selected subset of files.

13.

Which command will help us to find the status of blocks and FileSystem health?

Answer»

Below is the syntax to check the status of the blocks:
hdfs fsck <path> -files -blocks

Below is the syntax to check the health status of the FileSystem and log it to a file:
hdfs fsck / -files -blocks -locations > dfs-fsck.log

14.

Which interfaces need to be implemented to create a Mapper and Reducer in Hadoop?

Answer»

Below are the two base types used to create a Mapper and Reducer in Hadoop (in the new API they are classes to extend; the older org.apache.hadoop.mapred API exposed them as interfaces to implement), as sketched below:
(1)org.apache.hadoop.mapreduce.Mapper
(2)org.apache.hadoop.mapreduce.Reducer
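
A minimal word-count-style sketch of custom classes built on these two base types; the class and field names are illustrative only.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountClasses {

    // Extends org.apache.hadoop.mapreduce.Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
    public static class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String word : line.toString().split("\\s+")) {
                if (!word.isEmpty()) {
                    context.write(new Text(word), new IntWritable(1));
                }
            }
        }
    }

    // Extends org.apache.hadoop.mapreduce.Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable count : counts) {
                total += count.get();
            }
            context.write(word, new IntWritable(total));
        }
    }
}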

15.

In which language is Hadoop written, and which hardware is best for Hadoop?

Answer»

Hadoop is a distributed computing platform written in Java. It consists of features inspired by the Google File System and MapReduce, which make it powerful. Hadoop runs well on dual-processor or dual-core machines with 4-8 GB of RAM, preferably using ECC memory.

16.

Syntax to restart NameNode and for all other daemons in Hadoop?

Answer»

The syntax below is used to restart the NameNode and other daemons in Hadoop:
(1)To stop the NameNode we use the command below:
./sbin/hadoop-daemon.sh stop namenode
To start the NameNode we use:
./sbin/hadoop-daemon.sh start namenode

(2)To stop all the daemons we use the command below:
./sbin/stop-all.sh
To start all the daemons we use:
./sbin/start-all.sh
17.

Explain the different modes in which Hadoop runs

Answer»

Below are the three different modes in which Hadoop runs:
(1)Standalone Mode (Local): By default Hadoop uses this mode, which is also called Local Mode; it runs locally, i.e. as a non-distributed, single process. We use Hadoop in Standalone Mode mainly for learning, testing and debugging, and it is the fastest of the three modes. Input and output operations use the local file system. Although HDFS is one of the major components of Hadoop, this mode does not use HDFS. Here we don't need to configure the files below:
(i)hdfs-site.xml
(ii)mapred-site.xml
(iii)core-site.xml
for the Hadoop environment; in other words, no custom configuration is needed in the configuration files for this mode. All processes run in a single JVM, and this mode is only suitable for small development work.

(2)Pseudo-Distributed Mode: Like Standalone Mode, this mode also runs on a single node, but here every daemon (NameNode, DataNode, Secondary NameNode, Resource Manager and Node Manager) runs as a separate process in its own JVM; in short, each daemon runs in a separate Java process. Because all daemons run on a single node, the same node acts as both master and slave. Since the daemons run in different Java processes, it is called pseudo-distributed. The NameNode and Resource Manager act as masters, and the DataNode and Node Manager act as slaves; the Secondary NameNode also acts as a master, and its purpose is simply to keep an hourly backup (checkpoint) of the NameNode. This mode is used for both development and debugging, and HDFS is used for managing input and output. We need to change the configuration files below:
(i)mapred-site.xml
(ii)core-site.xml
(iii)hdfs-site.xml
to set up the environment.

(3)Fully Distributed Mode: In this mode all the daemons run on separate individual nodes, thus forming a multi-node cluster, and there are different nodes for the master and slave roles. This is the most important mode: multiple nodes are used, a few of them run the master daemons (NameNode and Resource Manager) and the rest run the slave daemons (DataNode and Node Manager). Here Hadoop runs on a cluster of machines, and the data is distributed across the different nodes.

18.

Name the top Commercial Hadoop Vendors?

Answer»

Below are the top 10 commercial Hadoop vendors:
1) Amazon Elastic MapReduce
2) Cloudera CDH Hadoop Distribution
3) Hortonworks Data Platform (HDP)
4) MapR Hadoop Distribution
5) IBM Open Platform
6) Microsoft Azure's HDInsight - cloud-based Hadoop distribution
7) Pivotal Big Data Suite
8) Datameer Professional
9) Datastax Enterprise Analytics
10) Dell-Cloudera Apache Hadoop Solution

19.

Define different Hadoop configuration files?

Answer»

Below are the different Hadoop configuration files (see the sketch after this list):
(1)hadoop-env.sh: specifies the environment variables that affect the JDK used by bin/hadoop (the Hadoop daemons). Since the Hadoop framework is written in Java and uses the JRE, these variables affect aspects of daemon behaviour such as where log files are stored. The only variable you usually need to change in this file is JAVA_HOME, which specifies the path to the Java installation used by Hadoop.

(2)mapred-site.xml: one of the important configuration files required for the runtime environment settings of Hadoop. It contains site-specific settings for the Hadoop Map/Reduce daemons and jobs. Here we specify a framework name for MapReduce by setting mapreduce.framework.name. This file is empty by default; we use it to tailor the behaviour of Map/Reduce on the site.

(3)core-site.xml: one of the most important configuration files, required for the runtime environment settings of a Hadoop cluster. It informs the Hadoop daemons where the NameNode runs in the cluster, i.e. which IP address and port the NameNode should bind to.

(4)hdfs-site.xml: also an important configuration file required for the runtime environment settings of Hadoop. It contains the configuration settings for the NameNode, DataNode and Secondary NameNode, and is also used to specify the default block replication; the actual number of replicas can also be specified when a file is created.

(5)yarn-site.xml: this file supports an extensible resource model. By default YARN tracks CPU and memory for all nodes, applications and queues, but the resource definition can be extended to include arbitrary countable resources. A countable resource is a resource that is consumed while a container is running but released afterwards; CPU and memory are both countable resources, and other examples include GPU resources and software licenses.

(6)masters and slaves: the masters file is used to determine the master nodes in a Hadoop cluster; it informs the Hadoop daemons about the location of the Secondary NameNode. The masters file on a slave node is blank.
slaves: this file is used to determine the slave nodes in a Hadoop cluster. The slaves file at the master node contains a list of hosts, one per line; the slaves file on a slave server contains the IP address of the slave node itself.
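
As a rough sketch of how these XML files are consumed, Hadoop's Configuration class loads them as classpath resources and exposes the properties by name; the property keys below are standard, but the values printed depend entirely on your own site files.

import org.apache.hadoop.conf.Configuration;

public class ShowConfig {
    public static void main(String[] args) {
        // new Configuration() loads core-default.xml and core-site.xml from the classpath
        Configuration conf = new Configuration();
        // hdfs-site.xml and mapred-site.xml can be added explicitly as extra resources
        conf.addResource("hdfs-site.xml");
        conf.addResource("mapred-site.xml");

        System.out.println("fs.defaultFS             = " + conf.get("fs.defaultFS"));
        System.out.println("dfs.replication          = " + conf.get("dfs.replication"));
        System.out.println("mapreduce.framework.name = " + conf.get("mapreduce.framework.name"));
    }
}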

20.

Difference between NFS and HDFS in BigData

Answer»

Below are the four main differences between NFS and HDFS:
(1)NFS: with the help of NFS we can store and process small volumes of data.
(1)HDFS: HDFS is designed for storing and processing Big Data.

(2)NFS: in NFS the data is stored on a single piece of dedicated hardware.
(2)HDFS: here the data is divided into blocks that are distributed across the local drives of the cluster hardware.

(3)NFS: in NFS, when the system fails we cannot access the data.
(3)HDFS: in HDFS, in case of a system failure the data can still be accessed.

(4)NFS: NFS runs on a single machine, so there is no chance of data redundancy.
(4)HDFS: as HDFS runs on a cluster of machines, the replication protocol may lead to redundant data.

21.

Answer the Port Numbers for NameNode, Task Tracker and Job Tracker

Answer»

Below are the Port Numbers for NameNode, Task Tracker and Job Tracker:
(1)NameNode Port 50070
(2)Task Tracker Port 50060
(3)Job Tracker Port 50030
22.

List all daemons required to run Hadoop cluster

Answer»

Below are the 4 daemons required to run a Hadoop cluster:
(1)NameNode
(2)DataNode
(3)JobTracker
(4)TaskTracker

23.

Name the important components of Hadoop

Answer»

Below are the 3 main components of Hadoop:
(1)Hadoop Distributed File System (HDFS)
(2)Yet Another Resource Negotiator (YARN)
(3)Hadoop Common

24.

What are the most common input formats in Hadoop

Answer»

There are 3 most common input formats defined in Hadoop:
(1)TextInputFormat
(2)KeyValueTextInputFormat
(3)SequenceFileInputFormat
Of these three, TextInputFormat is the default input format, as illustrated in the sketch below.
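
A minimal job-driver sketch showing where the input format is chosen; the identity mapper/reducer and the paths are placeholders, and if setInputFormatClass is never called, TextInputFormat is used by default.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InputFormatDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "input-format-demo");
        job.setJarByClass(InputFormatDriver.class);

        // TextInputFormat is the default input format; KeyValueTextInputFormat or
        // SequenceFileInputFormat could be set here instead.
        job.setInputFormatClass(TextInputFormat.class);

        // Identity mapper and reducer, just to keep the sketch self-contained.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);   // TextInputFormat keys are byte offsets
        job.setOutputValueClass(Text.class);         // TextInputFormat values are the lines

        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));     // hypothetical paths
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}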

25.

Typical size of an HDFS block?

Answer»

The typical HDFS block size is 64 MB (the default in Hadoop 1.x), but it can be changed to a custom value; 128 MB is a common choice and is the default in Hadoop 2.x.
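
A small sketch, assuming a reachable cluster configuration on the classpath, showing how the effective default block size can be read through the FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/");                       // any HDFS path; "/" is just an example
        long blockSize = fs.getDefaultBlockSize(path);   // default block size used for new files at this path
        System.out.println("Default block size: " + (blockSize / (1024 * 1024)) + " MB");
        fs.close();
    }
}
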
26.

Do you have any idea about the Real Time Industry applications for Hadoop?

Answer»

As we know from the last couple of questions, Hadoop is an open-source software platform, also known as Apache Hadoop. Its very useful features, such as high performance, cost effectiveness, and analytics over structured and unstructured data created on digital platforms, make it a helping hand in real-time industry applications. These days it is used in almost every industry and in many departments within them. Below are different instances where Hadoop is used:
(1)It helps in managing traffic on streets.
(2)It helps in streaming processing.
(3)It helps in content management and e-mail archiving.
(4)With the help of a Hadoop cluster, it helps in processing rat brain neuronal signals.
(5)It helps to manage content, posts, images and videos on social media platforms.
(6)Advertisement targeting platforms use Hadoop to capture and analyze click stream, transaction, video and social media data.
(7)It helps to analyze customer data in real time to improve business performance.
(8)It also helps in public sector fields such as defense, cyber security, scientific research and intelligence.
(9)It also helps to analyze output from medical devices, doctor notes, lab results, reports, medical correspondence and clinical data, and in handling financial data.

27.

What are the eight main differences between Hadoop and Spark?

Answer»

Below are the most important differences between Hadoop and Spark:
(1)Hadoop MapReduce reads and writes intermediate results to disk, while Spark processes data in memory, which makes Spark considerably faster for many workloads.
(2)Hadoop MapReduce is batch-oriented, whereas Spark also supports interactive queries, streaming and machine-learning workloads.
(3)Both are fault tolerant, but in different ways: Hadoop relies on data replication in HDFS, while Spark recomputes lost partitions from lineage information.
(4)Spark has no file system of its own and is commonly run on top of HDFS and YARN, so the two are often used together.

28.

Which minimum Java version is required to run Hadoop?

Answer»

The minimum version required to run Hadoop is Java 1.6.x; higher versions are also declared good for Hadoop by Sun. Linux and Windows are the supported operating systems for Hadoop, but BSD, Mac OS/X and Solaris are also known to work.

29.

What happens when two clients access the same file in the HDFS?

Answer»

When the first client contacts the NameNode to open the file for writing, the NameNode grants a lease to that client to create or update the file. When a second client then tries to open the same file for writing, the NameNode sees that the lease for the file has already been granted to another client, so it rejects the open request of the second client.

30.

Which of the following writables can be used when we do not want to emit a value from a mapper/reducer?

Answer»

Choose the correct answer from the list below:
(1)Text
(2)IntWritable
(3)NullWritable
(4)String

Answer:-(3)NullWritable
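
A small sketch of how NullWritable is typically used when only keys matter; the deduplication mapper below is purely illustrative.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DistinctLineMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // NullWritable.get() is a singleton placeholder: it carries no data,
        // so it is the natural choice when a value is not needed.
        context.write(line, NullWritable.get());
    }
}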

31.

Name the most famous companies that use Hadoop

Answer»

Below are the most famous companies that use Hadoop:
(1)Facebook
(2)Twitter
(3)Amazon
(4)Netflix
(5)Adobe
(6)eBay
(7)Hulu
(8)Spotify
(9)Rubikloud

32.

Who will initiate the mapper?

Answer»

Choose the correct answer from the list below:
(1)Sqoop
(2)Task tracker
(3)Mapper
(4)Reducer

Answer:-(2)Task tracker

33.

Different commands for starting and shutting down Hadoop Daemons.

Answer»

One of the most frequently asked Hadoop interview questions; the commands are given below:
(1)The command below will start all the daemons:
./sbin/start-all.sh

(2)The command below will shut down all the daemons:
./sbin/stop-all.sh

34.

Which of the following are true for Hadoop Pseudo Distributed Mode?

Answer»

Choose the correct answer from the list below:
(1)It runs on multiple machines
(2)Runs on multiple machines without any daemons
(3)Runs on Single Machine with all daemons
(4)Runs on Single Machine without all daemons

Answer:-(3)Runs on Single Machine with all daemons

35.

How is HDFS fault tolerant?

Answer»

Fault tolerance in Hadoop HDFS refers to how the system behaves in unfavourable conditions and how it handles such situations. HDFS is fault tolerant because it replicates data on different DataNodes: by default every block of data is replicated on three DataNodes, and these replicas are stored on different machines.
So if one node crashes, the data can still be retrieved from the other DataNodes where the block is replicated, as sketched below.
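
A small sketch, with a hypothetical file path, showing how the replication factor behind this fault tolerance can be inspected and changed through the FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/data.txt");          // hypothetical HDFS file

        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current replication: " + status.getReplication());

        // Raise the replication factor of this one file to 5 (equivalent of: -setrep 5 <path>)
        fs.setReplication(file, (short) 5);
        fs.close();
    }
}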

36.

What is the use of JPS command in Hadoop?

Answer»

To test whether all the Hadoop daemons are working, we use the JPS (JVM Process Status) command. jps lists the running Hadoop daemons, such as NodeManager, DataNode, ResourceManager, NameNode and many more.