InterviewSolution
| 1. |
What do you understand by Pyspark’s startsWith() and endsWith() methods? |
|
Answer» These methods belong to the Column class and are used for searching DataFrame rows by checking whether a column value starts with or ends with a given string. They are commonly used to filter data in applications.
Both methods are case-sensitive. Note that in the PySpark Python API they are spelled in lowercase, as startswith() and endswith(), on the Column class. Consider an example of startswith(). We first create a DataFrame with 3 rows:

data = [('Harry', 20), ('Ron', 20), ('Hermione', 20)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data=data, schema=columns)

The following code returns the rows where the value in the Name column starts with "H":

from pyspark.sql.functions import col
df.filter(col("Name").startswith("H")).show()

The output of the code would be:

+--------+---+
|    Name|Age|
+--------+---+
|   Harry| 20|
|Hermione| 20|
+--------+---+

Notice how the record with the Name "Ron" is filtered out because it does not start with "H". |
|
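Because the filtering logic boils down to a case-sensitive prefix/suffix check on each row's column value, the behavior of both methods can be illustrated with a plain-Python sketch that needs no Spark session. This is only an analogue for clarity, not the PySpark implementation; the data and the "H" prefix are taken from the example above, and the "n" suffix is an assumed value to demonstrate endswith().

```python
# Plain-Python analogue of filtering with PySpark's Column.startswith()
# and Column.endswith(); illustrates their case-sensitive behavior.
data = [("Harry", 20), ("Ron", 20), ("Hermione", 20)]

# Rows whose Name starts with "H" -- mirrors
# df.filter(col("Name").startswith("H"))
starts_h = [row for row in data if row[0].startswith("H")]
print(starts_h)  # [('Harry', 20), ('Hermione', 20)]

# Rows whose Name ends with "n" -- mirrors
# df.filter(col("Name").endswith("n"))
ends_n = [row for row in data if row[0].endswith("n")]
print(ends_n)  # [('Ron', 20)]

# A lowercase "h" matches nothing, confirming the check is case-sensitive.
starts_lower_h = [row for row in data if row[0].startswith("h")]
print(starts_lower_h)  # []
```

The same case-sensitivity applies in PySpark itself: filtering on startswith("h") against this DataFrame would return an empty result.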