1. How can an RDD be created in Spark?
Answer» RDDs, or Resilient Distributed Datasets, are the fundamental data structure in Spark. They are immutable and fault-tolerant. There are multiple ways to create RDDs in Spark:
RDDs can be created by taking an existing collection in the driver program and passing it to SparkContext's parallelize() method. Here's an example:
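A minimal sketch in Scala, assuming a local master; the application name and the sample collection are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical local Spark setup for illustration
    val conf = new SparkConf().setAppName("ParallelizeExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // A collection that lives in the driver program
    val numbers = Seq(1, 2, 3, 4, 5)

    // parallelize() distributes the collection across the cluster as an RDD
    val numbersRdd = sc.parallelize(numbers)

    println(numbersRdd.count())  // prints 5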
In production systems, users most often generate RDDs from files by simply reading the data from them. Let us see how:
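A short sketch, again in Scala, assuming a local master and a hypothetical file path (replace the path with your own; textFile() also accepts HDFS and S3 URIs):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical local Spark setup for illustration
    val conf = new SparkConf().setAppName("TextFileExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // textFile() reads the file and returns an RDD with one element per line
    val linesRdd = sc.textFile("/path/to/input.txt")  // hypothetical path

    println(linesRdd.count())  // number of lines in the file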
You can easily convert any DataFrame or Dataset into an RDD using the rdd method. Here's how:
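A minimal sketch assuming a local SparkSession and a small in-memory DataFrame (the column names and rows are illustrative); in Scala the conversion is exposed as the rdd method on both DataFrame and Dataset:

    import org.apache.spark.sql.SparkSession

    // Hypothetical local SparkSession for illustration
    val spark = SparkSession.builder().appName("DfToRddExample").master("local[*]").getOrCreate()
    import spark.implicits._

    // Illustrative DataFrame built from an in-memory collection
    val df = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")

    // df.rdd converts the DataFrame into an RDD[Row]
    val rowRdd = df.rdd

    rowRdd.collect().foreach(println)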