1.

How can data transfers be minimized when working with Spark?

Answer»

Data transfers in Spark mostly come from shuffling, where records are redistributed across partitions and often sent over the network between executors. Minimizing these transfers makes Spark applications run faster and more reliably. Common ways to do this are:

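  • Use broadcast variables: Broadcasting the small dataset to every executor makes a join between a large and a small RDD more efficient, because the large RDD does not have to be shuffled.
  • Use accumulators: These let tasks update a shared value (such as a counter or a sum) in parallel during execution, with the result read on the driver.
  • Avoid operations that trigger shuffles where possible, such as repartition, groupByKey, and join. For example, reduceByKey combines values within each partition before shuffling, so it usually moves less data than groupByKey.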
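
To make the broadcast point concrete, here is a minimal sketch in plain Python, not Spark code. It models partitions as lists and counts how many records must leave their partition under each join strategy. The function names (partition_of, shuffle_join, broadcast_join) and the sample data are hypothetical, chosen only for illustration. In Spark itself, the small side would typically be shipped with SparkContext.broadcast and looked up inside a map over the large RDD.

```python
from collections import defaultdict


def partition_of(key, num_partitions):
    # Stand-in for a hash partitioner; assumes integer keys.
    return key % num_partitions


def shuffle_join(large_parts, small_parts):
    """Join by repartitioning both sides on the key (models a shuffle join)."""
    n = len(large_parts)
    large_buckets = defaultdict(list)
    small_buckets = defaultdict(dict)
    moved = 0
    for parts, is_large in ((large_parts, True), (small_parts, False)):
        for src, part in enumerate(parts):
            for key, value in part:
                dst = partition_of(key, n)
                if dst != src:
                    moved += 1  # record crosses a partition boundary
                if is_large:
                    large_buckets[dst].append((key, value))
                else:
                    small_buckets[dst][key] = value
    rows = [
        (key, value, small_buckets[p][key])
        for p in range(n)
        for key, value in large_buckets[p]
        if key in small_buckets[p]
    ]
    return sorted(rows), moved


def broadcast_join(large_parts, small_parts):
    """Join by copying the small side to every partition of the large side."""
    lookup = {key: value for part in small_parts for key, value in part}
    # One copy of the small table is sent to each large partition.
    moved = len(lookup) * len(large_parts)
    rows = [
        (key, value, lookup[key])
        for part in large_parts  # large records never leave their partition
        for key, value in part
        if key in lookup
    ]
    return sorted(rows), moved


# Hypothetical data: 4 partitions of 1000 orders each, and 3 country codes.
large = [[(i % 3, f"order-{p}-{i}") for i in range(1000)] for p in range(4)]
small = [[(0, "DE"), (1, "FR")], [(2, "US")]]

shuffle_rows, shuffle_moved = shuffle_join(large, small)
broadcast_rows, broadcast_moved = broadcast_join(large, small)
print("records moved with shuffle join:  ", shuffle_moved)
print("records moved with broadcast join:", broadcast_moved)
```

Both strategies produce the same joined rows. The difference is that the broadcast version only moves copies of the small table, while the shuffle version moves most of the large dataset. This only pays off when the small side fits comfortably in memory on each executor.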

