| 1. |
What Is RDD? |
|
Answer» RDDs (Resilient Distributed Datasets) are the basic abstraction in Apache Spark; they represent the data coming into the system as collections of objects. RDDs are used for in-memory computations on large clusters in a fault-tolerant manner. An RDD is a read-only, partitioned collection of records with two key properties. Immutable: an RDD cannot be altered once it is created; a transformation produces a new RDD instead. Resilient: if a node holding a partition fails, another node takes over, and the lost partition is recomputed from the RDD's lineage. |
|
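The properties in the answer above (partitioned, immutable, resilient) can be sketched with a toy model in plain Python. This is not Spark's API: `ToyRDD`, `lose_partition`, and the chunking scheme are hypothetical names and choices made only for illustration, and everything runs in a single process rather than on a cluster.

```python
class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not Spark's API)."""

    def __init__(self, source=None, parent=None, fn=None):
        self._source = source    # tuple of tuples: partitions of a base dataset
        self._parent = parent    # lineage: the ToyRDD this one was derived from
        self._fn = fn            # lineage: the function applied to each record
        self.num_partitions = (
            len(source) if source is not None else parent.num_partitions
        )
        # Materialized partitions; None means "not computed yet" or "lost".
        self._cache = [None] * self.num_partitions

    @classmethod
    def parallelize(cls, records, num_partitions):
        # Split the records into contiguous chunks, one per partition.
        records = list(records)
        size = -(-len(records) // num_partitions)  # ceiling division
        parts = tuple(
            tuple(records[i * size:(i + 1) * size])
            for i in range(num_partitions)
        )
        return cls(source=parts)

    def map(self, fn):
        # Immutability: a transformation returns a new ToyRDD and
        # leaves the parent untouched.
        return ToyRDD(parent=self, fn=fn)

    def _compute(self, i):
        if self._source is not None:
            return self._source[i]
        return tuple(self._fn(x) for x in self._parent.partition(i))

    def partition(self, i):
        # Resilience: a missing partition is rebuilt from lineage on demand.
        if self._cache[i] is None:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i):
        # Simulate the failure of the node that held partition i.
        self._cache[i] = None

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=3)
squares = numbers.map(lambda x: x * x)

before = squares.collect()
squares.lose_partition(1)     # pretend one node went down
after = squares.collect()     # partition 1 is recomputed from `numbers`
```

In this sketch `after` equals `before`, because the lost partition is rebuilt by reapplying the recorded function to the parent's partition. In Spark the same role is played by the lineage of transformations that each RDD keeps.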