1.

How Can You Minimize Data Transfers When Working With Spark?

Answer»

Minimizing data transfers and avoiding shuffles helps you write Spark programs that run quickly and reliably.

The various ways in which data transfers can be minimized when working with Apache Spark are:

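  1. Using broadcast variables: a broadcast variable makes joins between a small and a large RDD more efficient, because a read-only copy of the small dataset is cached on each executor instead of being shuffled across the network.
  2. Using accumulators: accumulators let tasks running in parallel update a shared value (such as a counter or a sum) whose total is read on the driver.
  3. Avoiding shuffles: the most common approach is to avoid ByKey operations, repartition, and any other operations that trigger shuffles.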
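
The first two techniques can be sketched in PySpark roughly as follows. This is a minimal illustration rather than a definitive implementation: the function names `enrich` and `enrich_orders`, and the `orders` and `countries` data, are hypothetical, and `sc` is assumed to be an existing `SparkContext` supplied by the caller.

```python
def enrich(record, countries):
    """Attach a country to a (user_id, amount) record via a local dict lookup."""
    user_id, amount = record
    return (user_id, amount, countries.get(user_id, "unknown"))


def enrich_orders(sc, orders, countries):
    """Tag each order with a country without joining two RDDs.

    sc        -- an existing pyspark.SparkContext (assumed, not created here)
    orders    -- hypothetical list of (user_id, amount) pairs, the "large" side
    countries -- hypothetical dict of user_id -> country, the "small" side
    """
    # Ship the small lookup table to the executors once, instead of
    # shuffling both datasets for a join.
    lookup = sc.broadcast(countries)

    # Count unmatched records on the executors; the driver reads the total.
    # Updates made inside a transformation can be applied more than once
    # if a task is re-executed, so treat this count as diagnostic only.
    unmatched = sc.accumulator(0)

    def tag(record):
        if record[0] not in lookup.value:
            unmatched.add(1)
        return enrich(record, lookup.value)

    # map() is a narrow transformation, so this step does not shuffle.
    tagged = sc.parallelize(orders).map(tag).collect()
    return tagged, unmatched.value
```

With a running context, a call such as `enrich_orders(sc, [("u1", 10.0), ("u9", 5.0)], {"u1": "US"})` would perform the lookup on the executors with a plain `map`, so the join itself needs no shuffle.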
