49+ Sqoop Interview Questions

1.	Where can the metastore database be hosted?

2.	Which database the sqoop metastore runs on?

3.	Give the sqoop command to see the content of the job named myjob?

4.	How can you see the list of stored jobs in sqoop metastore?

5.	What is the purpose of sqoop-merge?

6.	What is a sqoop metastore?

7.	Give a command to execute a stored procedure named proc1 which exports data to from MySQL db named DB1 into a HDFS directory named Dir1.

8.	Give a sqoop command to import data from all tables in the MySql DB DB1.

9.	Give a Sqoop command to import all the records from employee table divided into groups of records by the values in the column department_id.

10.	What does the following query do?

11.	Give a sqoop command to run only 8 mapreduce tasks in parallel

12.	Give a sqoop command to import the columns employee_id,first_name,last_name from the MySql table Employee

13.	What are the two file formats supported by sqoop for import?

14.	How to import only the updated rows form a table into HDFS using sqoop assuming the source has last update timestamp details for each row?

15.	How can you control the mapping between SQL data types and Java types?

16.	What happens when a table is imported into a HDFS directory which already exists using the –apend parameter?

17.	What does this sqoop command achieve?

18.	What is the importance of --split-by clause in running parallel import tasks in sqoop?

19.	In a sqoop import command you have mentioned to run 8 parallel Mapreduce task but sqoop runs only 4. What can be the reason?

20.	How can you force sqoop to execute a free form Sql query only once and import the rows serially.

21.	What do you mean by Free Form Import in Sqoop?

22.	Give a sqoop command to show all the databases in a MySql server.

23.	Sqoop imported a table successfully to HBase but it is found that the number of rows is fewer than expected. What can be the cause?

24.	How can you schedule a sqoop job using Oozie?

25.	How can we load to a column in a relational table which is not null but the incoming value from HDFS has a null value?

26.	How can you export only a subset of columns to a relational table using sqoop?

27.	How can you sync a exported table with HDFS data in which some rows are deleted?

28.	How will you update the rows that are already exported?

29.	How do you clear the data in a staging table before loading it by Sqoop?

30.	How will you implement all-or-nothing load using sqoop?

31.	What is the difference between the parameters sqoop.export.records.per.statement and sqoop.export.statements.per.transaction

32.	Before starting the data transfer using mapreduce job, sqoop takes a long time to retrieve the minimum and maximum values of columns mentioned in –split-by parameter. How can we make it efficient?

33.	How can you choose a name for the mapreduce job which is created on submitting a free-form query import?

34.	How can we slice the data to be imported to multiple parallel tasks?

35.	How do you fetch data which is the result of join between two tables?

36.	Is it possible to add a parameter while running a saved job?

37.	What is the usefulness of the options file in sqoop.

38.	When the source data keeps getting updated frequently, what is the approach to keep it in sync with the data in HDFS imported by sqoop?

39.	How can you avoid importing tables one-by-one when importing a large number of tables from a database?

40.	How can you control the number of mappers used by the sqoop command?

41.	What is a disadvantage of using --direct parameter for faster data load by sqoop?

42.	What is the significance of using --compress-codec parameter?

43.	What is the default extension of the files produced from a sqoop import using the --compress parameter?

44.	What is the advantage of using --password-file rather than -P option while preventing the display of password in the sqoop import statement?

45.	How can we import a subset of rows from a table without using the where clause?

46.	How can you import only a subset of rows form a table?

47.	When to use --target-dir and when to use --warehouse-dir while importing data?

48.	Is JDBC driver enough to connect sqoop to the databases?

49.	What is the role of JDBC driver in a Sqoop set up?
