1. What is the difference between coalesce and repartition in Spark?

Answer»
Coalesce | Repartition
Only decreases the number of partitions in a DataFrame. | Can either decrease or increase the number of partitions in a DataFrame.
Merges existing partitions, which minimizes the amount of data shuffled. | Creates new partitions and performs a full shuffle of the data.
The resulting partitions can vary in size. | The resulting partitions are roughly equal in size.
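
The comparison above can be illustrated with a small toy model in plain Python. This is only a sketch of the behaviour, not Spark's implementation: the functions `coalesce`, `repartition` and `sizes` below are hypothetical helpers that act on lists of lists, and the contiguous grouping used in `coalesce` is an assumption made to keep the example short. In Spark itself the corresponding calls are `df.coalesce(n)` and `df.repartition(n)` on a DataFrame.

```python
from typing import List

Partition = List[int]


def sizes(partitions: List[Partition]) -> List[int]:
    """Return the number of rows held by each partition."""
    return [len(p) for p in partitions]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: merge whole existing partitions into n groups.

    No partition is ever split, so rows only move together with their
    original partition. Contiguous grouping is an assumption of this sketch.
    """
    if n >= len(partitions):
        # Asking for more partitions than exist leaves the data unchanged.
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each source partition is appended whole to one target partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: full shuffle, dealing every row round-robin into n partitions."""
    rows = [row for part in partitions for row in part]
    out: List[Partition] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out


# Four unevenly sized partitions holding ten rows in total.
parts = [[1, 2, 3, 4, 5, 6], [7], [8, 9], [10]]

print(sizes(coalesce(parts, 2)))     # [7, 3]  uneven: whole partitions were merged
print(sizes(repartition(parts, 2)))  # [5, 5]  even: every row was redistributed
print(sizes(repartition(parts, 3)))  # [4, 3, 3]  repartition can also change the count freely
print(len(coalesce(parts, 8)))       # 4  coalesce cannot increase the count
```

In this model, `coalesce` keeps the cost low by never splitting a partition, which is why the result is uneven, while `repartition` touches every row to get balanced output.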
