1.

What are the other notable features of an RDD, and what are the ways to create one?

Answer»
Notable features of an RDD:
  • In-memory: Spark can perform operations in primary memory (RAM) rather than on disk.
  • Immutable (read-only): An RDD cannot be changed once created; a transformation produces a new RDD instead.
  • Lazily evaluated: Spark computes the records only when an action is performed, not when a transformation is declared.
  • Cacheable: Records can be cached in memory for faster repeated processing.
  • Parallel: Spark can parallelize operations on the data stored in an RDD.
  • Partitioned: Spark splits the records into partitions. When reading from HDFS, the partition size typically follows the HDFS block size, which is 128 MB by default.

Ways to create an RDD:
  • Parallelizing an existing collection in your driver program.
  • Referencing a dataset in an external storage system, such as a shared file system, HDFS, or HBase.
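
The points above can be illustrated with a small toy model in plain Python. This is only a sketch, not Spark's API: the `ToyRDD` class, its method names, and the `square` helper are hypothetical and exist only to make immutability, lazy evaluation, caching, and partitioning observable without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Toy stand-in for an RDD. Hypothetical teaching class, not Spark's API."""

    def __init__(self, partitions, funcs=()):
        # Tuples keep the stored partitions and the lineage immutable.
        self._partitions = tuple(tuple(p) for p in partitions)
        self._funcs = tuple(funcs)  # recorded transformations, not yet applied
        self._cache_enabled = False
        self._cached = None

    @classmethod
    def parallelize(cls, data, num_partitions):
        """Split a local collection into at most num_partitions chunks."""
        data = list(data)
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls(data[i:i + size] for i in range(0, len(data), size))

    def map(self, func):
        """Transformation: record func and return a NEW ToyRDD; compute nothing."""
        return ToyRDD(self._partitions, self._funcs + (func,))

    def cache(self):
        """Ask for the result of the first action to be kept in memory."""
        self._cache_enabled = True
        return self

    def _compute_partition(self, partition):
        out = []
        for record in partition:
            for func in self._funcs:
                record = func(record)
            out.append(record)
        return out

    def collect(self):
        """Action: run the recorded transformations and return all records."""
        if self._cached is not None:
            return list(self._cached)
        # Partitions are independent, so they can be processed in parallel.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(self._compute_partition, self._partitions))
        result = [record for part in parts for record in part]
        if self._cache_enabled:
            self._cached = tuple(result)
        return result


calls = []  # records every evaluation so laziness and caching are observable


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=3)
squares = numbers.map(square).cache()
calls_before_action = len(calls)  # map() alone has computed nothing
first = squares.collect()         # the action triggers the computation
second = squares.collect()        # served from the cache; square() is not re-run
print(calls_before_action, first, len(calls))
```

In PySpark itself, the corresponding calls would be `sc.parallelize(...)` for an in-driver collection and `sc.textFile(...)` for a file in external storage, followed by `map`, `cache`, and `collect` on the resulting RDD; here `sc` is assumed to be an existing `SparkContext`.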

