1.

How do you create a SparkSession?

Answer»

To create a SparkSession, we use the builder pattern. The SparkSession class from the pyspark.sql module exposes a builder whose getOrCreate() method creates a new SparkSession if none exists and otherwise returns the existing SparkSession object. The following code is an example of creating a SparkSession:

import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[1]") \
    .appName('InterviewBitSparkSession') \
    .getOrCreate()

Here,

  • master() – Sets the master URL, which determines where the application runs. To run on a cluster, pass the cluster manager's master URL (for example, a spark:// URL for a Spark standalone cluster, or yarn). To run locally, pass local[x], where x is the number of worker threads; this value also serves as the default parallelism when data in an RDD, DataFrame or Dataset is partitioned. The value of x is ideally the number of CPU cores available.
  • appName() – Sets the application name.
  • getOrCreate() – Returns a SparkSession object. It creates a new one if none exists; if one already exists, it simply returns that one.

If we want to create a new SparkSession object every time, we can call the newSession() method on an existing SparkSession, as shown below:

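import pyspark
from pyspark.sql import SparkSession
# newSession() is an instance method, so it needs an existing session
spark = SparkSession.builder.getOrCreate()
spark_session = spark.newSession()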

