InterviewSolution
| 1. |
What are the ways to create an RDD? |
|
Answer» Spark provides three ways to create an RDD:
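By parallelizing a local collection: an RDD can be created from an in-memory collection such as an Array, List, or Seq.

    import org.apache.spark.sql.SparkSession
    val spark = SparkSession.builder().getOrCreate()
    val sc = spark.sparkContext
    val collection = Array(1, 2, 4, 6, 9)
    val rdd = sc.parallelize(collection)

From text files: an RDD can also be created from a text file or CSV file. Each line of the file becomes one String element of the RDD.

    val spark = SparkSession.builder().getOrCreate()
    val sc = spark.sparkContext
    val rdd = sc.textFile("path/to/textfile")

From existing DataFrames or Datasets: calling .rdd on a DataFrame or Dataset returns the underlying data as an RDD.

    val spark = SparkSession.builder().getOrCreate()
    val df = spark.range(10)
    val rdd = df.rdd

In the last snippet, spark.range(10) returns a Dataset of the numbers 0 to 9, so rdd holds ten elements. |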
|
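The three approaches in the answer can be combined into one small program, which may help when trying them outside spark-shell. This is only a minimal sketch under some assumptions that are not in the original answer: it runs in local mode (`master("local[*]")`), the object name `RddCreationSketch`, the helper `counts`, and the app name are hypothetical, and the text file is a temporary file created on the spot so that the path actually exists.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object RddCreationSketch {

  // Builds one RDD per approach and returns the element count of each.
  def counts(spark: SparkSession): (Long, Long, Long) = {
    val sc = spark.sparkContext

    // 1. From a local collection.
    val fromCollection = sc.parallelize(Seq(1, 2, 4, 6, 9))

    // 2. From a text file. Assumption: a throwaway two-line file stands in
    //    for "path/to/textfile"; the file URI avoids relying on the default file system.
    val tmp = Files.createTempFile("rdd-sketch", ".txt")
    Files.write(tmp, "first line\nsecond line\n".getBytes("UTF-8"))
    val fromFile = sc.textFile(tmp.toUri.toString)

    // 3. From an existing Dataset.
    val fromDataset = spark.range(10).rdd

    (fromCollection.count(), fromFile.count(), fromDataset.count())
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so no cluster is needed to run the sketch.
    val spark = SparkSession.builder()
      .appName("rdd-creation-sketch")
      .master("local[*]")
      .getOrCreate()

    println(counts(spark)) // prints (5,2,10)
    spark.stop()
  }
}
```

On a real cluster the `master` setting would normally be supplied by spark-submit rather than hard-coded, and the text file path would point at storage that every executor can reach.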