InterviewSolution
| 1. |
How many types of transformations are there in Spark? Explain them. |
|
Answer» Transformations are the core of how you express your business logic in Spark. There are two types of transformations in Spark:
Narrow transformations: A narrow transformation is one in which each input partition contributes to at most one output partition, so no data has to move between partitions. Examples include map(), flatMap(), filter(), mapPartitions() and union(). Example: val rdd1 = rdd.map(x => x + 1). This logic runs independently within each partition and needs no data from any other partition.
Wide transformations: A wide transformation is one in which each input partition may contribute to many output partitions. This is often referred to as a shuffle, in which Spark exchanges data between partitions across the cluster. Examples include distinct(), reduceByKey(), groupByKey(), join() and repartition(). Note that coalesce() shuffles only when called with shuffle = true; by default it merges partitions without a shuffle and so behaves as a narrow transformation. Example: val rdd1 = rdd.distinct(). This transformation cannot produce correct results by looking at one partition alone, because equal values may sit in different partitions; data from many partitions must be brought together to find the distinct values in the RDD. |
|
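One way to check whether a given transformation is narrow or wide is to walk the RDD's lineage and look for a shuffle dependency. The sketch below is illustrative only: the object name TransformationKinds, the helper hasShuffle, the sample data and the local master setting are all assumptions made for this example, not part of the original answer.

```scala
import org.apache.spark.ShuffleDependency
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object TransformationKinds {
  // Hypothetical helper: true if any step in the RDD's lineage
  // requires a shuffle, i.e. has a wide dependency.
  def hasShuffle(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists {
      case _: ShuffleDependency[_, _, _] => true
      case dep                           => hasShuffle(dep.rdd)
    }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("transformation-kinds")
      .getOrCreate()

    // Made-up sample data spread over 4 partitions.
    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 2, 3, 3, 3), numSlices = 4)

    // Narrow candidates: computed partition by partition.
    println(s"map shuffles:         ${hasShuffle(rdd.map(x => x + 1))}")
    println(s"coalesce(2) shuffles: ${hasShuffle(rdd.coalesce(2))}")

    // Wide candidates: equal values must be brought together.
    println(s"distinct shuffles:    ${hasShuffle(rdd.distinct())}")
    println(s"repartition shuffles: ${hasShuffle(rdd.repartition(2))}")

    spark.stop()
  }
}
```

Run against an RDD created with parallelize, as here, this should report no shuffle for map() and the default coalesce(), and a shuffle for distinct() and repartition(). Calling toDebugString on an RDD gives a similar view of the lineage in text form.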