1.

What is an RDD?


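RDD stands for Resilient Distributed Dataset, Spark's core data abstraction. It is a fault-tolerant collection of elements that can be operated on in parallel. An RDD's data is partitioned across the nodes of a cluster and is immutable: transformations produce new RDDs instead of modifying existing ones, and a lost partition can be recomputed from the lineage of transformations that built it. There are two types of RDDs, distinguished by how they are created: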

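  • Parallelized collections: created by distributing an existing collection from the driver program across the cluster, so its elements can be processed in parallel.
  • Hadoop datasets: created from files in HDFS or any other storage system supported by Hadoop; operations are applied to each record of the file.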
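
To make "partitioned, immutable and fault-tolerant" concrete, here is a toy, single-process model written in plain Python. It is not Spark's API: `ToyRDD`, `lose_partition` and the other names are hypothetical. The sketch assumes that each RDD stores its lineage (a function that computes partition i from its parent) plus a cache of partitions already materialized. `map` returns a new object instead of changing the old one, and a "lost" partition is simply recomputed when it is next needed.

```python
from typing import Any, Callable, Dict, Iterable, List


class ToyRDD:
    """Hypothetical single-process model of an RDD (not Spark's API).

    An instance stores lineage (how to compute partition i) rather than
    owning mutable data, so any partition can be rebuilt after it is lost.
    """

    def __init__(self, compute: Callable[[int], List[Any]], num_partitions: int):
        self._compute = compute  # lineage: partition index -> partition data
        self.num_partitions = num_partitions
        # Stands in for partitions held in memory by executors.
        self._materialized: Dict[int, List[Any]] = {}

    @classmethod
    def parallelize(cls, data: Iterable[Any], num_partitions: int) -> "ToyRDD":
        # Copy the driver-side collection, then slice it into contiguous partitions.
        items = list(data)
        n = len(items)

        def compute(i: int) -> List[Any]:
            return items[i * n // num_partitions:(i + 1) * n // num_partitions]

        return cls(compute, num_partitions)

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Transformation: returns a NEW ToyRDD whose partitions are derived
        # from this one; self is never modified.
        parent = self
        return ToyRDD(lambda i: [f(x) for x in parent.partition(i)], self.num_partitions)

    def partition(self, i: int) -> List[Any]:
        # Compute partition i from lineage if it is not materialized yet.
        if i not in self._materialized:
            self._materialized[i] = self._compute(i)
        return list(self._materialized[i])  # hand out a copy so callers cannot mutate the cache

    def lose_partition(self, i: int) -> None:
        # Simulate a failed node by discarding a materialized partition.
        self._materialized.pop(i, None)

    def collect(self) -> List[Any]:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


nums = ToyRDD.parallelize(range(10), num_partitions=3)
squares = nums.map(lambda x: x * x)
before = squares.collect()

# Lose partition 1 of both RDDs; collect() rebuilds it from lineage.
nums.lose_partition(1)
squares.lose_partition(1)
after = squares.collect()
print(before == after)  # True
```

In real PySpark, the two creation routes described above correspond to `SparkContext.parallelize(data, numSlices)` for parallelized collections and `SparkContext.textFile(path)` for file-based (Hadoop) datasets. The toy model only imitates the first route.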

