Answer» The most common workflow followed by a Spark program is:
- The first step is to create input RDDs from external data. The data can come from different data sources.
- After the RDDs are created, transformation operations such as filter() or map() are run to create new RDDs according to the business logic.
- If any intermediate RDDs need to be reused later, they can be persisted.
- Lastly, when action operations such as first() or count() are invoked, Spark launches them to initiate the parallel computation. A minimal sketch of this workflow is shown below.
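The following Scala sketch illustrates the four steps above. It is only a minimal example, assuming a local SparkContext and a hypothetical input file "events.txt"; the file name, the "ERROR" filter condition, and the tab-separated format are illustrative assumptions, not details from the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddWorkflowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-workflow").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // 1. Create an input RDD from external data (hypothetical file path).
    val lines = sc.textFile("events.txt")

    // 2. Run transformations such as filter() and map() to build new RDDs
    //    according to the business logic (here: keep error lines, extract message).
    val errors   = lines.filter(_.contains("ERROR"))
    val messages = errors.map(_.split("\t").last)

    // 3. Persist an intermediate RDD that will be reused by multiple actions.
    messages.persist()

    // 4. Actions like count() and first() trigger the actual parallel computation.
    println(s"error count: ${messages.count()}")
    println(s"first error message: ${messages.first()}")

    sc.stop()
  }
}
```

Note that because messages is persisted in step 3, the second action (first()) reuses the cached RDD instead of recomputing the whole lineage from the input file.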