InterviewSolution
| 1. |
What is an RDD? How does a Spark RDD work? What are the various ways to create an RDD? |
|
Answer» A resilient distributed dataset (RDD) is the core abstraction of the Spark framework: a fault-tolerant collection of elements, partitioned across the nodes of a cluster, that can be operated on in parallel. Below are the key points on RDDs:
We can create an RDD using any of the approaches below:
By referencing a dataset in an external storage system.
By parallelizing an in-memory dataset held by the driver program.
By converting a DataFrame to an RDD.
RDDs support two main types of operations: transformations, which create a new dataset from an existing one, and actions, which return a value to the driver program after running a computation on the dataset. |
|
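The split between transformations and actions can be illustrated with a small pure-Python toy. `ToyRDD` is a hypothetical stand-in written for this sketch; it is not Spark's API or implementation, and it runs on a single machine. It only mirrors the idea that a transformation returns a new dataset description without computing anything, while an action runs the recorded steps and hands a plain value back to the caller.

```python
import functools


class ToyRDD:
    """Hypothetical stand-in for an RDD; NOT Spark's API or implementation."""

    def __init__(self, source, steps=()):
        self._source = source  # the original data, never mutated
        self._steps = steps    # recorded transformations, in order

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def _compute(self):
        # Replay every recorded step over the source data.
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: run the recorded steps and return a plain value to the caller.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())

    def reduce(self, fn):
        return functools.reduce(fn, self._compute())


numbers = ToyRDD([1, 2, 3, 4, 5])

# Two chained transformations; no element has been processed at this point.
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
```

Calling `squares_of_evens.collect()` is what finally runs the filter and the map. In Spark's RDD API, `map` and `filter` are likewise transformations, and `collect`, `count`, and `reduce` are actions.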