Interview Solutions
This section contains curated interview questions and answers to sharpen your knowledge and support exam preparation.
51. What is an executor in Spark, and how does it support operations on large volumes of data?
Answer» Executors are worker-node processes in charge of running individual tasks when a Spark job is submitted. They are launched at the beginning of a Spark application and typically run for the entire lifetime of the application. Once they have run their tasks, they send the results to the driver. They also provide in-memory storage, through the Block Manager, for RDDs that are cached by user programs. In other words, an executor is a distributed agent: when a job is launched, Spark starts executors on the worker nodes, and each one runs the individual tasks assigned to it by the Spark driver, as shown in the sketch below.
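As an illustration, here is a minimal sketch, assuming a Spark 2.x+ Scala application, of how executor resources can be requested; the app name and resource values are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative executor sizing (all values are placeholders): four executor
// processes, each running two tasks in parallel with 4 GB of heap, part of
// which the Block Manager uses to cache RDDs. The master URL is assumed to
// be supplied by spark-submit.
val spark = SparkSession.builder()
  .appName("ExecutorDemo")
  .config("spark.executor.instances", "4")
  .config("spark.executor.cores", "2")
  .config("spark.executor.memory", "4g")
  .getOrCreate()
```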
52. Why do we need the driver in Spark?
Answer» The driver is the central point and the entry point of the Spark shell, which supports Scala, Python, and R. The driver follows a sequential process to execute a Spark job: it runs the application's main() function and creates the SparkContext, converts the user program into a DAG of stages and tasks, negotiates resources with the cluster manager, and schedules the tasks on executors, collecting their results. The complete process can be tracked through the cluster manager's user interface, and the driver exposes information about the running Spark application through a web UI at port 4040. A minimal driver program is sketched below.
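The following is a minimal sketch of a driver program, assuming a Scala application compiled against Spark 2.x or later; the object name and the computation are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a driver program. The driver runs main(), builds the DAG
// of stages, and schedules tasks; the executors do the actual work.
object DriverDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DriverDemo").getOrCreate()
    val rdd = spark.sparkContext.parallelize(1 to 1000000)
    // sum() is an action: it triggers job submission and returns the result
    // to the driver. While the app runs, progress is visible in the web UI
    // on port 4040.
    println(rdd.map(_.toLong * 2).sum())
    spark.stop()
  }
}
```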
53. What are the key components of Spark that are internally required to execute a job?
Answer» At runtime, a Spark application consists of a driver program, a cluster manager (Standalone, YARN, Mesos, or Kubernetes), and executors running on worker nodes. The driver splits the job into stages and tasks, the cluster manager allocates resources, and the executors run the tasks and report results back to the driver; how the cluster manager is selected is illustrated below.
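As a small illustration, assuming Scala, the master URL passed when building the session selects the cluster manager; all values shown are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// "local[4]" runs an in-process scheduler with 4 threads; on a real cluster
// this would be e.g. "yarn" or "spark://host:7077" (host is a placeholder).
val spark = SparkSession.builder()
  .appName("ComponentsDemo")
  .master("local[4]")
  .getOrCreate()
```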
54. What does Spark SQL do, and how does it benefit programmers interacting with databases? What is the syntax for creating a SQLContext?
Answer» Spark SQL provides a programmatic abstraction in the form of DataFrames and Datasets and can act as a distributed SQL query engine. It simplifies interaction with large amounts of data through the DataFrame and Dataset APIs. Spark SQL also plays a vital role in optimization via the Catalyst optimizer, and it supports UDFs, built-in functions, and aggregate functions. The syntax for creating a SQLContext is sketched below.
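A minimal sketch of both entry points, assuming Scala and Spark 2.x (where SQLContext still exists but SparkSession is preferred); the app name and the query are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SparkSession}

// Legacy entry point (Spark 1.x): build a SQLContext from a SparkContext.
val conf = new SparkConf().setAppName("SqlDemo").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Since Spark 2.0, SparkSession is the preferred entry point and wraps
// the SQLContext functionality.
val spark = SparkSession.builder().config(conf).getOrCreate()
spark.sql("SELECT 1 AS test").show()
```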
55. What are the benefits of using Spark Streaming for real-time processing instead of other frameworks and tools?
Answer» Spark Streaming is a micro-batch-oriented stream processing engine. Data can be ingested from many sources such as Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join, and window. Its other key benefits include scalability, high throughput, fault tolerance, and the ability to reuse the same code for batch and streaming workloads. Spark Streaming receives an input data stream and divides it into micro-batches, which are then processed by the Spark engine to generate the final stream of results, also in batches; a word-count sketch follows below.
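Here is a minimal word-count sketch using the classic DStream API, assuming Scala; the host and port are placeholders (for a local test, feed the socket with `nc -lk 9999`):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Counts words arriving on a TCP socket in 5-second micro-batches.
// "local[2]" reserves one core for the receiver and one for processing.
val conf = new SparkConf().setAppName("StreamingDemo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))

val lines = ssc.socketTextStream("localhost", 9999)
val counts = lines
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

counts.print()          // print each micro-batch's counts to the driver log
ssc.start()
ssc.awaitTermination()
```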
56. How does Spark Core fit into the picture when solving big data use cases?
Answer» Spark Core provides the fundamental APIs for working with distributed collections, such as map, filter, reduce, and aggregate, which can handle all the use cases where we are dealing with large volumes of data. Although Spark is primarily used as a data processing framework, it can also be used for data analysis and data science; a small example follows below.
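A minimal Spark Core sketch in Scala; the order amounts and the 8% tax factor are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Filter, map, and reduce over a distributed collection.
val spark = SparkSession.builder()
  .appName("CoreDemo")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

val orders = sc.parallelize(Seq(120.0, 35.5, 240.0, 18.25, 99.99))
val bigOrderTotal = orders
  .filter(_ > 50.0)   // keep large orders only
  .map(_ * 1.08)      // apply a hypothetical 8% tax
  .reduce(_ + _)      // aggregate across the cluster

println(f"Total: $bigOrderTotal%.2f")
spark.stop()
```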
57. What features does Spark provide that are not available in MapReduce?
Answer» The Spark API provides various key features that are very useful for real-time processing, and most of them are backed by well-supported libraries. Key features the framework provides over MapReduce include in-memory computation and caching, a DAG execution engine that avoids writing intermediate results to disk between steps, rich high-level APIs, interactive shells, and built-in libraries such as Spark SQL, Spark Streaming, MLlib, and GraphX. Spark Core is the heart of the Spark framework and supports functional programming in languages like Java, Scala, and Python; however, most new releases target the JVM languages first and are introduced for Python later. The sketch below illustrates the in-memory iteration that MapReduce lacks.
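A sketch of in-memory iteration, assuming Scala and a placeholder input path; classic MapReduce would have to re-read the data from disk on every pass:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("CacheDemo")
  .master("local[*]")
  .getOrCreate()

// Parse the input once and keep it in executor memory.
val points = spark.sparkContext
  .textFile("hdfs:///data/points.txt")   // placeholder path
  .map(_.toDouble)
  .cache()

var threshold = 0.0
for (_ <- 1 to 5) {
  // Each pass reuses the cached RDD instead of re-reading the file.
  threshold = points.filter(_ > threshold).mean()
}
println(threshold)
spark.stop()
```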