How can we create DataFrames in PySpark?

Answer»

We can do it by using the createDataFrame() method of the SparkSession.

data = [('Harry', 20), ('Ron', 20), ('Hermoine', 20)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data=data, schema=columns)

This creates the dataframe as shown below:

+-----------+----------+
| Name      | Age      |
+-----------+----------+
| Harry     | 20       |
| Ron       | 20       |
| Hermoine  | 20       |
+-----------+----------+

We can print the schema of the DataFrame by using df.printSchema():

>> df.printSchema()
root
 |-- Name: string (nullable = true)
 |-- Age: long (nullable = true)
