1.

What are the advantages and disadvantages of PySpark?


Advantages of PySpark:

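  • Simple to use: Parallelized code can be written in a simple, high-level way, without managing threads or inter-process communication directly.
  • Error handling: The PySpark framework handles errors well. For example, failed tasks are retried automatically, and lost partitions can be recomputed from their lineage.
  • Inbuilt algorithms: PySpark provides many useful built-in algorithms for machine learning (through MLlib) and for graph processing (through the separate GraphFrames package).
  • Library support: Compared to Scala, Python has a much larger collection of libraries for data science and data visualization.
  • Easy to learn: PySpark is easy to learn, especially for developers who already know Python.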

Disadvantages of PySpark:

  • Sometimes, it is difficult to express a problem using the MapReduce model.
  • Since Spark was originally developed in Scala, PySpark programs can be considerably less efficient than equivalent Scala programs (figures of up to about 10x slower are sometimes quoted). The overhead comes mainly from serializing data between the JVM and Python worker processes, for example in Python UDFs and RDD operations. This can affect the performance of heavy data-processing applications.
  • The Spark Streaming API in PySpark is less mature than its Scala counterpart and still needs improvement.
  • PySpark cannot be used to modify Spark's internal functionality because of the abstractions it provides. In such cases, Scala is preferred.

