InterviewSolution
1. How will you create a PySpark UDF?
Answer» Consider an example where we want to capitalize the first letter of every word in a string. PySpark's built-in initcap function covers a similar case, but the task is a simple way to illustrate a user-defined function (UDF): we write a Python function capitalizeWord(str), wrap it as a UDF, and apply it to a DataFrame column. The following steps demonstrate this:
To capitalize the first character of every word, apply the UDF inside a select:

    df.select(col("ID_COLUMN"), convertUDF(col("NAME_COLUMN")).alias("NAME_COLUMN")).show(truncate=False)

The output of the above code would be:

    +----------+-----------------+
    |ID_COLUMN |NAME_COLUMN      |
    +----------+-----------------+
    |1         |Harry Potter     |
    |2         |Ronald Weasley   |
    |3         |Hermoine Granger |
    +----------+-----------------+

UDFs must be designed so that their algorithms are efficient in both time and space. Spark runs a Python UDF on every row and cannot optimize the code inside it, so a careless UDF slows down the DataFrame operations that use it.
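The steps that define capitalizeWord and convertUDF are not shown above, so here is a minimal end-to-end sketch of how they might look. The function body, the sample rows, the app name "udf-sketch", and the run_demo helper are assumptions made for illustration; only the names capitalizeWord, convertUDF, ID_COLUMN and NAME_COLUMN come from the answer. The PySpark imports sit inside run_demo so that the plain Python function can be tried without a Spark installation.

```python
def capitalizeWord(text):
    """Capitalize the first letter of every space-separated word."""
    if text is None:
        # Spark passes SQL NULL to a Python UDF as None
        return None
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def run_demo():
    """Hypothetical driver: build a small DataFrame and apply the UDF."""
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    # Assumed input rows; the original answer shows only the output table
    data = [(1, "harry potter"), (2, "ronald weasley"), (3, "hermoine granger")]
    df = spark.createDataFrame(data, ["ID_COLUMN", "NAME_COLUMN"])

    # Wrap the Python function as a UDF that returns a string column
    convertUDF = udf(capitalizeWord, StringType())

    df.select(
        col("ID_COLUMN"),
        convertUDF(col("NAME_COLUMN")).alias("NAME_COLUMN"),
    ).show(truncate=False)

    spark.stop()
```

Calling run_demo() on a machine with PySpark installed should print a table like the one shown in the answer. Handling None explicitly matters because a null value in NAME_COLUMN would otherwise raise an error inside the UDF.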