What do you understand by Shuffling in Spark?

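Answer» Shuffling is the process of redistributing data across partitions. It may or may not move data between JVM processes, or between executors running on separate machines. A partition is simply a smaller logical division of the data. A shuffle is triggered by operations that need records with the same key to end up in the same partition, such as repartition, groupByKey, reduceByKey and join. The target partition of each record is chosen by a partitioner (hash partitioning on the key by default) rather than being placed by hand, although a custom partitioner can be supplied to influence the distribution.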
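
The redistribution described above can be illustrated with a small sketch in plain Python. It does not use Spark; it only models the idea of a hash-partitioned shuffle. The names `hash_partition` and `shuffle` are hypothetical helpers made up for this example, and integer keys are assumed so that the result is the same on every run.

```python
def hash_partition(key, num_partitions):
    # Stand-in for a hash partitioner: the key's hash modulo the partition count.
    # Python's % always yields a non-negative result for a positive modulus.
    return hash(key) % num_partitions


def shuffle(partitions, num_partitions):
    """Redistribute (key, value) records so that equal keys share a partition."""
    out = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            out[hash_partition(key, num_partitions)].append((key, value))
    return out


# Three input partitions with the same keys scattered across them.
before = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (3, "d")],
    [(2, "e"), (4, "f")],
]

# After the shuffle, every record for a given key sits in one partition.
after = shuffle(before, 2)
print(after)
```

In a real cluster the input and output partitions may live on different executors, which is why a shuffle can involve serialization, disk I/O and network transfer.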
