InterviewSolution
1. How can we create DataFrames in PySpark?
Answer» We can create a DataFrame with the createDataFrame() method of the SparkSession. Here spark is assumed to be an active SparkSession:

    data = [('Harry', 20), ('Ron', 20), ('Hermione', 20)]
    columns = ["Name", "Age"]
    df = spark.createDataFrame(data=data, schema=columns)

This creates the DataFrame shown below:

    +----------+-----+
    | Name     | Age |
    +----------+-----+
    | Harry    | 20  |
    | Ron      | 20  |
    | Hermione | 20  |
    +----------+-----+

We can get the schema of the DataFrame by using df.printSchema(). Because only column names were supplied, Spark infers the types from the data, and Python integers are inferred as long:

    >>> df.printSchema()
    root
     |-- Name: string (nullable = true)
     |-- Age: long (nullable = true)
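When the column types matter, we can pass an explicit schema instead of relying on inference (Spark infers Python integers as long). The sketch below is one way to do that, using a DDL-style schema string with createDataFrame(). The helper names create_people_df and main, the constant PEOPLE_SCHEMA, and the application name are hypothetical and chosen only for illustration; a local PySpark installation is assumed.

```python
# Sample rows and an explicit schema given as a DDL-style string.
PEOPLE = [("Harry", 20), ("Ron", 20), ("Hermione", 20)]
PEOPLE_SCHEMA = "Name string, Age int"  # hypothetical constant name


def create_people_df(spark, rows=PEOPLE, schema=PEOPLE_SCHEMA):
    """Build a DataFrame whose column types come from `schema`, not inference."""
    return spark.createDataFrame(rows, schema=schema)


def main():
    # Imported here so the helper above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("explicit-schema-sketch").getOrCreate()
    df = create_people_df(spark)
    df.printSchema()  # Age should now be reported as integer rather than long
    df.show()
    spark.stop()
```

Calling main() runs the example against a local Spark session. The same schema could also be written with StructType and StructField from pyspark.sql.types, which is more verbose but lets us set nullability per column.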