
What do you understand by Pyspark’s startsWith() and endsWith() methods?

Answer»

These methods belong to the Column class and are used to search DataFrame rows by checking whether a column's value starts with or ends with a given string. They are commonly used for filtering data in applications.

  • startsWith() – returns a Boolean value: True when the value of the column starts with the specified string, and False when it does not.
  • endsWith() – returns a Boolean value: True when the value of the column ends with the specified string, and False when it does not.

Both methods are case-sensitive. Note that in the PySpark API the Column methods are spelled in lowercase, startswith() and endswith(); the camelCase names startsWith() and endsWith() belong to the Scala/Java API.

Consider an example of the startsWith() method here. We have created a DataFrame with 3 rows:

data = [('Harry', 20), ('Ron', 20), ('Hermoine', 20)]
columns = ["Name", "Age"]
DF = spark.createDataFrame(data=data, schema=columns)

If we run the below code, which returns the rows where the value in the Name column starts with "H":

from pyspark.sql.functions import col
DF.filter(col("Name").startswith("H")).show()

The output of the code would be:

+--------+---+
|    Name|Age|
+--------+---+
|   Harry| 20|
|Hermoine| 20|
+--------+---+

Notice how the record with the Name “Ron” is filtered out because it does not start with “H”.
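The endswith() check works symmetrically for suffixes, and both checks are case-sensitive. As a minimal sketch of the same matching semantics, here is the filter logic expressed with plain Python strings (Column.startswith()/endswith() follow the same case-sensitive rules as Python's str.startswith()/str.endswith()), using the same three names as the DataFrame above:

```python
# The same names as in the example DataFrame.
names = ["Harry", "Ron", "Hermoine"]

# Suffix check: only "Hermoine" ends with a lowercase "e".
ends_with_e = [n for n in names if n.endswith("e")]
print(ends_with_e)  # ['Hermoine']

# Case-sensitivity: an uppercase "E" matches none of the names.
ends_with_E = [n for n in names if n.endswith("E")]
print(ends_with_E)  # []
```

In PySpark itself, the equivalent filter would be DF.filter(col("Name").endswith("e")).show(), which keeps only the "Hermoine" row.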


