Answer» The most common workflow followed by a Spark program is:
- The first step is to create input RDDs from external data. The data can come from different data sources.
- After the RDDs are created, transformation operations such as filter() or map() are run to create new RDDs according to the business logic.
- If any intermediate RDDs need to be reused later, they can be persisted.
- Lastly, when action operations such as first() or count() are invoked, Spark launches them to initiate the parallel computation. A minimal sketch of this workflow is shown below.
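The following Scala sketch illustrates the four steps above. It is only a minimal example, assuming a local SparkContext and a hypothetical input file "events.txt"; the file name, the "ERROR" filter condition, and the tab-separated format are illustrative assumptions, not details from the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddWorkflowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-workflow").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // 1. Create an input RDD from external data (hypothetical file path).
    val lines = sc.textFile("events.txt")

    // 2. Run transformations such as filter() and map() to build new RDDs
    //    according to the business logic (here: keep error lines, extract message).
    val errors   = lines.filter(_.contains("ERROR"))
    val messages = errors.map(_.split("\t").last)

    // 3. Persist an intermediate RDD that will be reused by multiple actions.
    messages.persist()

    // 4. Actions like count() and first() trigger the actual parallel computation.
    println(s"error count: ${messages.count()}")
    println(s"first error message: ${messages.first()}")

    sc.stop()
  }
}
```

Note that because messages is persisted in step 3, the second action (first()) reuses the cached RDD instead of recomputing the whole lineage from the input file.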