50+ Hadoop Interview Questions (InterviewSolution)

1.	Why is Checkpointing Important in Hadoop?
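Checkpointing matters because the Namenode keeps the namespace as an on-disk image (fsimage) plus an edit log of every change made since that image was written, and on restart it must replay the whole edit log. A checkpoint, performed by the Secondary Namenode (or a Standby Namenode in HA setups), merges the two into a fresh fsimage so the log stays short. The toy Python model below sketches only that merge. The dictionary namespace and the `(operation, path, value)` record format are illustrative assumptions, not the real fsimage or edits formats.

```python
from typing import Dict, List, Tuple

# Hypothetical edit-log record: (operation, path, value). For "create" the
# value is a block id; for "rename" it is the destination path.
Edit = Tuple[str, str, str]


def apply_edits(image: Dict[str, str], edits: List[Edit]) -> Dict[str, str]:
    """Replay edit-log records on top of a namespace image (toy model)."""
    namespace = dict(image)
    for op, path, value in edits:
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "rename":
            namespace[value] = namespace.pop(path)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return namespace


def checkpoint(image: Dict[str, str], edits: List[Edit]) -> Tuple[Dict[str, str], List[Edit]]:
    """Fold the edit log into a fresh image and start a new, empty edit log."""
    return apply_edits(image, edits), []


fsimage = {"/data/a.csv": "blk_1"}
edits = [
    ("create", "/data/b.csv", "blk_2"),
    ("rename", "/data/a.csv", "/data/a_old.csv"),
    ("delete", "/data/b.csv", ""),
]
fsimage, edits = checkpoint(fsimage, edits)
print(fsimage, edits)  # {'/data/a_old.csv': 'blk_1'} []
```

Without checkpoints, a restart would have to replay every edit since the cluster was formatted; after one, only the edits logged since the last merge need replaying.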

2.	Can Hadoop handle streaming data?

3.	What is speculative execution in Hadoop?

4.	What problems do small files cause in HDFS?
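The core problem is Namenode memory: every file and every block is an object held in the Namenode's heap, however little data it contains (a 1 MB file does not use a full 64 MB block on disk, but it still costs a block object). The Python sketch below compares the same 1 TB stored as a few large files and as a million small ones. The figure of about 150 bytes per object is a commonly quoted rule of thumb, used here as an assumption; real usage varies by Hadoop version.

```python
# Rule-of-thumb assumption: every file and every block is an object of
# roughly 150 bytes in Namenode memory. Real figures vary by version.
BYTES_PER_OBJECT = 150
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB


def namenode_objects(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """File plus block objects the Namenode tracks for `num_files` equal-sized files."""
    blocks_per_file = (file_size + block_size - 1) // block_size  # ceiling division
    return num_files * (1 + blocks_per_file)


def estimated_heap_bytes(num_files: int, file_size: int) -> int:
    return namenode_objects(num_files, file_size) * BYTES_PER_OBJECT


# The same 1 TB of data, stored as a few large files or as many small ones.
big = estimated_heap_bytes(num_files=16, file_size=64 * 1024 * MB)  # 16 x 64 GB
small = estimated_heap_bytes(num_files=1024 * 1024, file_size=MB)   # 1,048,576 x 1 MB
print(f"large files: ~{big / MB:.1f} MB of Namenode heap")
print(f"small files: ~{small / MB:.1f} MB of Namenode heap")
```

Under these assumptions the small-file layout needs over a hundred times more Namenode heap for the same data, and each small file also tends to become its own map task.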

5.	Suppose Hadoop spawns 100 tasks for a job and one of the tasks fails. What will Hadoop do?
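By default a failed task attempt is rescheduled, preferably on a different node, while the other tasks keep running; the job is only marked failed once a single task has failed `mapreduce.map.maxattempts` (or `mapreduce.reduce.maxattempts`) times, which is 4 by default. The Python sketch below simulates that retry loop. `run_with_retries` and its "take the first untried node" rule are hypothetical simplifications, not the scheduler's actual logic.

```python
from typing import Callable, List, Optional

MAX_ATTEMPTS = 4  # default of mapreduce.map.maxattempts / mapreduce.reduce.maxattempts


def run_with_retries(nodes: List[str],
                     attempt: Callable[[str, int], bool],
                     max_attempts: int = MAX_ATTEMPTS) -> Optional[str]:
    """Re-run a failing task, preferring nodes where it has not failed yet.

    Returns the node whose attempt succeeded, or None when every attempt
    failed (at which point the job as a whole would be marked failed).
    """
    failed_on = set()
    for attempt_no in range(max_attempts):
        # Fall back to any node once every node has seen a failure.
        candidates = [n for n in nodes if n not in failed_on] or nodes
        node = candidates[0]
        if attempt(node, attempt_no):
            return node
        failed_on.add(node)
    return None


nodes = ["dn1", "dn2", "dn3"]
# Attempts fail on dn1 (for example, a bad disk) but succeed elsewhere.
print(run_with_retries(nodes, lambda node, n: node != "dn1"))  # dn2
# A task that fails everywhere exhausts its 4 attempts.
print(run_with_retries(nodes, lambda node, n: False))          # None
```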

6.	Consider this scenario: in an M/R system, the HDFS block size is 64 MB

7.	What is a Combiner?
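A combiner is an optional map-side "mini reducer" that aggregates each map task's output locally, so fewer records cross the network during the shuffle. Hadoop may run it zero, one, or several times, so it must not change the final result. The word-count sketch below only simulates this in memory in Python; the function names are made up for illustration, and it counts how many intermediate records would be shuffled with and without a combiner.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def map_words(line: str) -> List[Pair]:
    # Mapper: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.lower().split()]


def combine(pairs: List[Pair]) -> List[Pair]:
    # Combiner: pre-sum counts on the map side, before data is shuffled.
    # Safe here because addition is associative and commutative.
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def word_count(splits: List[List[str]], use_combiner: bool) -> Tuple[Dict[str, int], int]:
    """Return the final counts and the number of records sent through the shuffle."""
    shuffled: List[Pair] = []
    for split in splits:  # one map task per input split
        pairs = [pair for line in split for pair in map_words(line)]
        shuffled.extend(combine(pairs) if use_combiner else pairs)
    totals = Counter()
    for word, count in shuffled:  # reducer: sum per word
        totals[word] += count
    return dict(totals), len(shuffled)


splits = [["the cat sat", "the cat ran"], ["the dog sat"]]
print(word_count(splits, use_combiner=False))  # second value: 9 records shuffled
print(word_count(splits, use_combiner=True))   # second value: 7 records shuffled
```

Both runs produce identical counts; only the shuffle volume changes, which is the whole point of a combiner.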

8.	What does "file could only be replicated to 0 nodes, instead of 1" mean?

9.	What happens when two clients try to write into the same HDFS file?
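HDFS allows only one writer per file: the Namenode grants a write lease to the client that opens the file for writing, and a second client trying to write the same file is rejected until the lease is released or expires. The toy `LeaseTable` below is a hypothetical Python illustration of that single-writer rule, not the Namenode's lease manager, and it ignores lease expiry entirely.

```python
from typing import Dict


class LeaseConflict(Exception):
    """Raised when a second client asks to write a file that is already leased."""


class LeaseTable:
    """Toy single-writer lease table (hypothetical; not the HDFS API)."""

    def __init__(self) -> None:
        self._holders: Dict[str, str] = {}  # path -> client holding the write lease

    def open_for_write(self, path: str, client: str) -> None:
        holder = self._holders.get(path)
        if holder is not None and holder != client:
            raise LeaseConflict(f"{path} is already being written by {holder}")
        self._holders[path] = client

    def close(self, path: str, client: str) -> None:
        # Only the current holder can release the lease.
        if self._holders.get(path) == client:
            del self._holders[path]


leases = LeaseTable()
leases.open_for_write("/logs/app.log", "client-A")
try:
    leases.open_for_write("/logs/app.log", "client-B")
except LeaseConflict as err:
    print(err)  # /logs/app.log is already being written by client-A
leases.close("/logs/app.log", "client-A")
leases.open_for_write("/logs/app.log", "client-B")  # succeeds once A has closed the file
```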

10.	Can we search for files using wildcards?

11.	How do you make a large cluster smaller by taking some of its nodes out?

12.	What happens if one Hadoop client renames a file, or a directory containing this file, while another client is still writing to it?

13.	Does the Namenode stay in safe mode until all under-replicated files are fully replicated?
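As we understand the default behaviour, it does not: the Namenode leaves safe mode once the fraction of blocks that have reached minimal replication (`dfs.namenode.replication.min`, 1 by default) meets `dfs.namenode.safemode.threshold-pct` (0.999 by default), after a short extension period, and re-replicates any remaining under-replicated blocks afterwards. The Python sketch below checks only that threshold condition; `can_leave_safe_mode` is a hypothetical helper and it ignores the extension period and the minimum-DataNodes setting.

```python
from typing import Sequence

THRESHOLD_PCT = 0.999  # default of dfs.namenode.safemode.threshold-pct
MIN_REPLICATION = 1    # default of dfs.namenode.replication.min


def can_leave_safe_mode(replica_counts: Sequence[int],
                        threshold: float = THRESHOLD_PCT,
                        min_replication: int = MIN_REPLICATION) -> bool:
    """True once a large enough fraction of blocks has minimal replication.

    Blocks only need `min_replication` reported replicas here, not the full
    replication factor. The safe-mode extension period is ignored.
    """
    if not replica_counts:
        return True
    safe = sum(1 for count in replica_counts if count >= min_replication)
    return safe / len(replica_counts) >= threshold


# 1,000 blocks with replication factor 3, but only 1 replica of each reported so far:
print(can_leave_safe_mode([1] * 1000))           # True: under-replicated, still leaves
# 2 of the 1,000 blocks have no reported replica yet:
print(can_leave_safe_mode([0] * 2 + [3] * 998))  # False: 99.8% is below 99.9%
```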

14.	How do you gracefully stop a running job?

15.	To add a new data node to a running Hadoop cluster, how do you start services on just that one data node?

16.	Is there an HDFS command to see the available free space in HDFS?

17.	Which file holds the Hadoop core configuration?

18.	What is rack awareness?
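With rack awareness enabled, the default placement for replication factor 3, as described in the HDFS architecture guide, puts the first replica on the writer's node (when the writer is a DataNode), the second on a node in a different rack, and the third on a different node in that same remote rack. The block survives the loss of a whole rack, while the write pipeline crosses racks only once. The Python sketch below reproduces just that pattern; `place_replicas` and the rack-id dictionary are illustrative assumptions, not Hadoop's placement-policy code, and it picks nodes deterministically where HDFS picks randomly.

```python
from typing import Dict, List


def place_replicas(topology: Dict[str, List[str]], writer_node: str) -> List[str]:
    """Pick 3 DataNodes for a new block in the default HDFS pattern.

    topology maps a rack id to the DataNodes in that rack. Simplified:
    takes the first eligible node instead of a random one and assumes the
    writer runs on a DataNode and the remote rack has at least two nodes.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer_node]
    remote_rack = next(rack for rack in topology if rack != local_rack)
    second = topology[remote_rack][0]                              # different rack
    third = next(n for n in topology[remote_rack] if n != second)  # same remote rack
    return [writer_node, second, third]


topology = {"/rack1": ["dn1", "dn2"], "/rack2": ["dn3", "dn4"]}
print(place_replicas(topology, writer_node="dn2"))  # ['dn2', 'dn3', 'dn4']
```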

19.	The default replication factor for a file is 3.

20.	How do you copy a directory from one node in the cluster to another?

21.	Why is reading done in parallel in HDFS but writing is not?

22.	Explain how ‘map’ and ‘reduce’ work.
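The classic illustration is word count: the map phase turns each input record into intermediate (key, value) pairs, the framework shuffles and sorts those pairs so that all values for one key end up together, and the reduce phase aggregates each key's values. The Python sketch below runs all three steps in memory on one machine; `mapper`, `reducer`, and `run_job` are illustrative names, not the Hadoop MapReduce API.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase: turn one input record into intermediate (key, value) pairs.
    for word in line.lower().split():
        yield word, 1


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce phase: combine every value that was emitted for one key.
    return key, sum(values)


def run_job(lines: List[str]) -> Dict[str, int]:
    intermediate = [pair for line in lines for pair in mapper(line)]
    # Shuffle and sort: bring all pairs with the same key together.
    intermediate.sort(key=itemgetter(0))
    results = {}
    for key, group in groupby(intermediate, key=itemgetter(0)):
        word, total = reducer(key, [value for _, value in group])
        results[word] = total
    return results


print(run_job(["Deer Bear River", "Car Car River", "Deer Car Bear"]))
# {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

In a real cluster the mappers run in parallel on the nodes holding each input block, and the shuffle moves data over the network to the reducers.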

23.	What is a Secondary Namenode? Is it a substitute for the Namenode?

24.	What is a rack?

25.	What is the communication channel between client and namenode/datanode?

26.	Do the job tracker and task trackers run on separate machines?

27.	How is indexing done in HDFS?

28.	What are the benefits of block transfer?

29.	What is a ‘block’ in HDFS?
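A block is the unit in which HDFS stores and replicates a file: the file is cut into fixed-size blocks (64 MB in the Hadoop 1 default assumed throughout this list; later releases default to 128 MB), and a final partial block only uses as much disk as the data it holds. The Python sketch below shows the arithmetic for a 200 MB file with replication factor 3; the helper names are made up for illustration.

```python
from typing import List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 64 * MB) -> List[int]:
    """Sizes of the blocks a file of `file_size` bytes is split into."""
    full, remainder = divmod(file_size, block_size)
    # Every block is full-sized except possibly the last one.
    return [block_size] * full + ([remainder] if remainder else [])


def raw_storage(file_size: int, replication: int = 3) -> int:
    # Each block is stored `replication` times across the cluster.
    return file_size * replication


blocks = split_into_blocks(200 * MB)
print([size // MB for size in blocks])  # [64, 64, 64, 8]
print(raw_storage(200 * MB) // MB)      # 600
```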

30.	What is a heartbeat in HDFS?
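A heartbeat is the periodic signal each DataNode sends the Namenode, every `dfs.heartbeat.interval` seconds (3 by default), to show it is alive; if none arrives for long enough, the Namenode marks the node dead and re-replicates its blocks elsewhere. As we understand the defaults, that timeout is twice `dfs.namenode.heartbeat.recheck-interval` (5 minutes) plus ten heartbeat intervals, or 10.5 minutes in total. The Python sketch below only applies that formula to a made-up table of last-heartbeat timestamps; it is not the Namenode's monitoring code.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 3       # default of dfs.heartbeat.interval (seconds)
RECHECK_INTERVAL_MS = 300_000  # default of dfs.namenode.heartbeat.recheck-interval


def dead_node_timeout_ms(heartbeat_s: int = HEARTBEAT_INTERVAL_S,
                         recheck_ms: int = RECHECK_INTERVAL_MS) -> int:
    # 2 recheck intervals plus 10 heartbeat intervals: 630,000 ms with defaults.
    return 2 * recheck_ms + 10 * 1000 * heartbeat_s


def dead_datanodes(last_heartbeat_ms: Dict[str, int], now_ms: int) -> List[str]:
    """DataNodes whose last heartbeat is older than the timeout."""
    timeout = dead_node_timeout_ms()
    return sorted(node for node, last in last_heartbeat_ms.items()
                  if now_ms - last > timeout)


now = 1_000_000
last_seen = {"dn1": now - 1_000, "dn2": now - 700_000, "dn3": now - 600_000}
print(dead_datanodes(last_seen, now))  # ['dn2']
```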

31.	What is a task tracker?

32.	What is a job tracker?

33.	Why do we use HDFS for applications with large data sets but not when there are lots of small files?

34.	What is a Datanode?

35.	Is the Namenode also commodity hardware?

36.	What is a Namenode?

37.	Since the data is replicated three times in HDFS, does that mean any calculation done on one node is also replicated on the other two?

38.	Replication causes data redundancy, so why is it used in HDFS?

39.	What is Fault Tolerance?

40.	What is the basic difference between traditional RDBMS and Hadoop?

41.	Why do we need Hadoop?

42.	How is analysis of Big Data useful for organizations?

43.	What are the four characteristics of Big Data?

44.	What is Big Data?

45.	What happens to the job tracker when the Namenode is down?

46.	What if a Namenode has no data?

47.	What does /etc/init.d do?

48.	What are the three modes in which Hadoop can be run?

49.	How do you restart the Namenode?

50.	What does the ‘jps’ command do?