1.

What are the performance techniques in Spark?

Answer»

There are many optimization techniques that can help a Spark job run faster.

Some of them are listed below. Apply them only where your code actually needs better performance; which ones help depends on the functionality you are implementing.

  • Good design: a well-designed Spark application is easier to write and runs in a more stable, consistent manner over time.
  • Using Kryo object serialization.
  • Dynamic allocation of cluster resources.
  • Choosing splittable, compressed file types.
  • Increasing parallelism.
  • Bucketing: bucketing your data allows Spark to pre-partition it.
  • Tuning garbage collection.
  • Configuring Spark's external shuffle service.
  • Using filters.
  • Using repartition and coalesce.
  • Using as few UDFs as possible.
  • Caching/persisting.
  • Using shared variables (broadcast variables and accumulators).
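
Several of the items above can be sketched in PySpark. This is a minimal illustration, not a recommended configuration: the property keys are standard Spark settings, but every value (400 shuffle partitions, 8 output partitions, the G1 collector flag) is a placeholder, and the function names, column names (status, country_code, country_name) and Parquet inputs are hypothetical. The join uses a broadcast join hint, which is related to, but not the same thing as, a broadcast variable.

```python
# Illustrative settings only: the keys are standard Spark properties,
# but every value is a placeholder to tune per cluster and workload.
TUNING_CONF = {
    # Kryo object serialization instead of default Java serialization
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Dynamic allocation of executors
    "spark.dynamicAllocation.enabled": "true",
    # External shuffle service (must also be running on the cluster nodes)
    "spark.shuffle.service.enabled": "true",
    # Parallelism of shuffles in DataFrame/SQL jobs
    "spark.sql.shuffle.partitions": "400",
    # Garbage collection tuning: pick the G1 collector for executors
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
}


def build_session(app_name="tuning-sketch", conf=None):
    """Create a SparkSession with the tuning settings applied."""
    from pyspark.sql import SparkSession  # lazy import: needs pyspark installed

    builder = SparkSession.builder.appName(app_name)
    for key, value in (conf or TUNING_CONF).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def shipped_orders_with_country(spark, orders_path, countries_path):
    """Hypothetical job: filter early, broadcast-join, avoid UDFs, cache."""
    from pyspark.sql import functions as F

    # Filter right after the read so less data flows through later stages.
    orders = (
        spark.read.parquet(orders_path)
        .filter(F.col("status") == "SHIPPED")
    )
    countries = spark.read.parquet(countries_path)

    # Hint that the small lookup table should be broadcast to executors.
    joined = orders.join(F.broadcast(countries), on="country_code")

    # Built-in column function rather than a Python UDF.
    joined = joined.withColumn("country_name", F.upper(F.col("country_name")))

    # Reduce the partition count without a full shuffle, then mark the
    # result for caching because it is assumed to be reused downstream.
    return joined.coalesce(8).cache()
```

A call might look like shipped_orders_with_country(build_session(), "/data/orders", "/data/countries"), where both paths are made up for the example. Whether any of these settings actually helps depends on the job, so measure before and after.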

