1.

What do you understand by Shuffling in Spark?

Answer»

Shuffling (also called repartitioning) is the process of redistributing data across partitions. It may or may not move data between JVM processes, or between executors running on separate machines. A partition is simply a smaller logical division of the data.

Note that the partition a record ends up in is not chosen record by record. It is determined by the partitioner in use, which by default hashes the record's key.
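
To make the idea concrete, here is a minimal sketch in plain Python (not Spark's API) of how a hash-based shuffle might redistribute key-value records. The names `hash_partition` and `shuffle` are hypothetical and exist only for this illustration. Real Spark also writes shuffle files and transfers them over the network, which this sketch ignores.

```python
def hash_partition(key, num_partitions):
    """Pick a target partition from the key's hash (illustrative only)."""
    # Python's % returns a non-negative result for a positive modulus.
    return hash(key) % num_partitions


def shuffle(partitions, num_partitions):
    """Redistribute (key, value) records so equal keys share a partition.

    Returns the new partitions and the number of records that changed
    partition, i.e. the records that would have to be moved.
    """
    new_partitions = [[] for _ in range(num_partitions)]
    moved = 0
    for src_index, records in enumerate(partitions):
        for key, value in records:
            dst_index = hash_partition(key, num_partitions)
            if dst_index != src_index:
                moved += 1
            new_partitions[dst_index].append((key, value))
    return new_partitions, moved


# Two input partitions; the key 1 appears in both of them.
before = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
after, moved = shuffle(before, 2)
print(after)   # [[(2, 'b')], [(1, 'a'), (1, 'c'), (3, 'd')]]
print(moved)   # 1
```

After the shuffle, both records with key 1 sit in the same partition, and only one record had to change partition. This shows why a shuffle may or may not involve much data movement: it depends on where the records already are.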


