1. How can we create DataFrames in PySpark?

Answer:

We can create a DataFrame with the createDataFrame() method of the SparkSession, passing it the rows of data and the column names.

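from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuse the active session or start one
data = [('Harry', 20), ('Ron', 20), ('Hermoine', 20)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data=data, schema=columns)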

This creates the DataFrame shown below, which can be displayed with df.show():

+-----------+----------+
| Name      | Age      |
+-----------+----------+
| Harry     | 20       |
| Ron       | 20       |
| Hermoine  | 20       |
+-----------+----------+

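We can get the schema of the DataFrame by using df.printSchema(). Because only column names were supplied, the column types are inferred from the data, and Python integers are inferred as long: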

>>> df.printSchema()
root
 |-- Name: string (nullable = true)
 |-- Age: long (nullable = true)

