1. What are the different cluster manager types supported by PySpark?

Answer:

A cluster manager is a cluster-mode platform that helps run Spark by allocating resources to worker nodes based on application requirements.

The above figure shows the position of the cluster manager in the Spark ecosystem. Consider a master node and multiple worker nodes in a cluster. With the help of the cluster manager, the master node provides the worker nodes with resources such as memory and processor allocation according to each node's requirements.
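In practice, the resources the cluster manager hands to worker nodes are requested when the application is submitted. A minimal sketch using the standard `spark-submit` resource flags (the host name, file path, and resource sizes below are illustrative placeholders, not values from the original text):

```shell
# Submit a PySpark app, asking the cluster manager for specific resources.
# --master selects the cluster manager; the other flags state the request
# that the manager then satisfies on the worker nodes.
spark-submit \
  --master spark://master-host:7077 \
  --num-executors 4 \
  --executor-memory 2g \
  --executor-cores 2 \
  my_app.py
```

The cluster manager decides on which workers to place the requested executors; the application itself never talks to the workers directly to acquire resources.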

PySpark supports the following cluster manager types:

  • Standalone – A simple cluster manager that ships with Spark and makes it easy to set up a cluster.
  • Apache Mesos – A general-purpose cluster manager that can run Hadoop MapReduce as well as PySpark applications.
  • Hadoop YARN – The resource manager introduced in Hadoop 2.
  • Kubernetes – An open-source system for automated deployment, scaling, and management of containerized applications.
  • local – Not a true cluster manager, but a mode for running Spark applications locally on a single machine such as a laptop or desktop.
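The cluster manager is chosen by the master URL passed to `spark-submit --master` or to `SparkSession.builder.master()`. The URL schemes below are the standard Spark ones; the host names and ports are illustrative placeholders. As a pure-Python sketch (no Spark installation needed), each manager in the list above maps to a scheme, and a small helper can classify a master URL:

```python
# Each cluster manager corresponds to a master URL scheme that is passed
# to spark-submit --master or SparkSession.builder.master().
# Host names and ports are illustrative placeholders.
MASTER_URL_EXAMPLES = {
    "local": "local[*]",                        # local mode, all CPU cores
    "standalone": "spark://master-host:7077",   # Spark's built-in manager
    "mesos": "mesos://mesos-host:5050",         # Apache Mesos
    "yarn": "yarn",                             # Hadoop YARN (reads HADOOP_CONF_DIR)
    "kubernetes": "k8s://https://k8s-api:6443", # Kubernetes API server
}

def cluster_manager_for(master_url: str) -> str:
    """Classify a Spark master URL by its scheme."""
    if master_url.startswith("local"):
        return "local"
    if master_url.startswith("spark://"):
        return "standalone"
    if master_url.startswith("mesos://"):
        return "mesos"
    if master_url == "yarn":
        return "yarn"
    if master_url.startswith("k8s://"):
        return "kubernetes"
    raise ValueError(f"Unknown master URL: {master_url}")

for name, url in MASTER_URL_EXAMPLES.items():
    assert cluster_manager_for(url) == name
    print(f"{name:11s} -> {url}")
```

For example, `SparkSession.builder.master("local[*]").getOrCreate()` runs the application in local mode, which is why that mode suits development on a laptop or desktop.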

