1.

What are the ways to create an RDD?

Answer»

Spark provides three ways to create an RDD:

  1. By parallelizing a local collection
  2. From data sources or text files
  3. From existing DataFrames or Datasets

By parallelizing a local collection:

We can create an RDD from a local collection. The collection can be an Array, a List, or any other Seq.

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().getOrCreate()
val sc = spark.sparkContext
val collection = Array(1, 2, 4, 6, 9)
val rdd = sc.parallelize(collection)
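As a minimal sketch of the same idea, the example below also passes the optional second argument of parallelize, numSlices, to control how many partitions the RDD gets. The object name ParallelizeSketch, the helper toRdd, the local[2] master, and the app name are assumptions made for illustration so the code can run on a single machine.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ParallelizeSketch {
  // Distribute a local Seq across an explicit number of partitions.
  def toRdd(spark: SparkSession, data: Seq[Int], partitions: Int): RDD[Int] =
    spark.sparkContext.parallelize(data, numSlices = partitions)

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption: run Spark in-process with two threads.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("parallelize-sketch")
      .getOrCreate()

    val rdd = toRdd(spark, List(1, 2, 4, 6, 9), partitions = 2)
    println(s"partitions: ${rdd.getNumPartitions}")
    println(rdd.collect().mkString(", "))

    spark.stop()
  }
}
```

If numSlices is omitted, Spark falls back to the context's default parallelism.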

From text files:

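We can also create an RDD from a text file or a CSV file. Each line of the file becomes one String element of the RDD.

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().getOrCreate()
val sc = spark.sparkContext
val rdd = sc.textFile("path/to/textfile")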
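Because textFile yields whole lines, a CSV file still has to be split into fields. The sketch below shows one naive way to do that, assuming a simple comma-separated file with no quoted fields. The object name TextFileSketch, the helper readCsv, the temporary file and its contents, and the local[2] master are all hypothetical and exist only to make the example self-contained.

```scala
import java.nio.file.Files
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object TextFileSketch {
  // Read the file line by line, then split each line on commas.
  // Naive on purpose: quoted fields and escaped commas are not handled.
  def readCsv(spark: SparkSession, path: String): RDD[Array[String]] =
    spark.sparkContext.textFile(path).map(_.split(","))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]") // assumption: single-machine run
      .appName("textfile-sketch")
      .getOrCreate()

    // Write a small sample file so the example does not depend on external data.
    val tmp = Files.createTempFile("people", ".csv")
    Files.write(tmp, "alice,30\nbob,25\n".getBytes("UTF-8"))

    val rows = readCsv(spark, tmp.toUri.toString)
    rows.collect().foreach(fields => println(fields.mkString(" | ")))

    spark.stop()
  }
}
```

For real CSV data with headers or quoting, reading through spark.read.csv and converting the result to an RDD is usually the safer route.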

From existing DataFrames or Datasets:

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().getOrCreate()
val df = spark.range(10)
val rdd = df.rdd
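The element type of the resulting RDD depends on where it comes from. The sketch below illustrates this: spark.range returns a Dataset of java.lang.Long, so calling rdd on it gives an RDD of java.lang.Long, while converting to a DataFrame first gives an RDD of Row. The object name DfToRddSketch, the helper idsViaRows, and the local[2] master are assumptions for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

object DfToRddSketch {
  // Go through a DataFrame, then unpack each Row back into a Scala Long.
  def idsViaRows(spark: SparkSession, n: Long): RDD[Long] = {
    val rows: RDD[Row] = spark.range(n).toDF().rdd
    rows.map(_.getLong(0)) // column 0 is the "id" column produced by range
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]") // assumption: single-machine run
      .appName("df-to-rdd-sketch")
      .getOrCreate()

    // Typed Dataset: the RDD keeps the Dataset's element type.
    val typed: RDD[java.lang.Long] = spark.range(10).rdd
    println(typed.count())

    // Untyped DataFrame: the RDD holds generic Row objects.
    println(idsViaRows(spark, 10).collect().mkString(", "))

    spark.stop()
  }
}
```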

