1. What is the common workflow of a Spark program?

Answer»

The most common workflow followed by a Spark program is:

  • The first step is to create input RDDs from external data. The data can be obtained from different data sources (for example, a text file, HDFS, or an existing collection).
  • After the RDDs are created, transformation operations such as filter() or map() are run on them to create new RDDs according to the business logic.
  • If any intermediate RDD needs to be reused later, it can be persisted so that it is not recomputed for every action.
  • Lastly, action operations such as first() or count() trigger Spark to launch the actual parallel computation, since transformations are evaluated lazily (a minimal sketch of this workflow is shown below).
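The following is a minimal sketch of these four steps, assuming a local Spark setup and a hypothetical input file "logs.txt"; the app name, file path, and the "ERROR" filter condition are illustrative choices, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

object WorkflowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("workflow-sketch") // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // 1. Create an input RDD from an external data source
    //    ("logs.txt" is a hypothetical local file path).
    val lines = sc.textFile("logs.txt")

    // 2. Run transformations (lazy): keep error lines, map to lengths.
    val errors = lines.filter(_.contains("ERROR"))
    val lengths = errors.map(_.length)

    // 3. Persist an intermediate RDD that more than one action will
    //    reuse, so it is not recomputed each time.
    errors.persist()

    // 4. Actions trigger the actual parallel computation.
    val numErrors = errors.count()
    println(s"Number of error lines: $numErrors")
    println(s"Total error characters: ${lengths.sum()}")

    spark.stop()
  }
}
```

Note that nothing is computed until count() and sum() run; the persist() call on the shared intermediate RDD keeps it cached across those two actions.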

