InterviewSolution
1. What are PySpark serializers?
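Answer: Serialization plays a key role in Spark performance tuning. Any data that is sent or received over the network, or persisted to disk or memory, must first be serialized. PySpark provides serializers for this purpose and supports two types:

- MarshalSerializer: serializes objects with Python's marshal module. It is faster than PickleSerializer but supports fewer data types.
- PickleSerializer: serializes objects with Python's pickle module. It can serialize nearly any Python object and is the default, but it is generally slower than MarshalSerializer.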
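Both serializers are thin wrappers around Python's standard-library `marshal` and `pickle` modules, so the trade-off can be seen without Spark at all. The minimal sketch below uses only the standard library; the `roundtrip` helper and the sample `record` are hypothetical, chosen for illustration. A list of integers survives both modules, but a record containing a `datetime.date` can only be handled by `pickle`.

```python
import datetime
import marshal
import pickle


def roundtrip(module, obj):
    """Hypothetical helper: serialize obj with `module` (marshal or pickle) and load it back."""
    return module.loads(module.dumps(obj))


# Plain built-in data (a list of ints) works with both modules.
numbers = list(range(10))
print(roundtrip(marshal, numbers) == numbers)  # True
print(roundtrip(pickle, numbers) == numbers)   # True

# A record containing a datetime.date: pickle can serialize it...
record = ("alice", datetime.date(2024, 1, 15))
print(roundtrip(pickle, record) == record)     # True

# ...but marshal only supports core built-in types and raises ValueError.
try:
    marshal.dumps(record)
    marshal_ok = True
except ValueError:
    marshal_ok = False
print("marshal can serialize the record:", marshal_ok)  # False
```

In PySpark terms, this is why MarshalSerializer suits jobs whose records are plain numbers, strings, lists, and dicts, while the default PickleSerializer is the safer choice for anything else.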
Consider an example of serialization that uses MarshalSerializer:

    # serializing.py
    from pyspark.context import SparkContext
    from pyspark.serializers import MarshalSerializer

    # Initialize the Spark context in local mode with MarshalSerializer
    sc = SparkContext("local", "Marshal Serialization", serializer=MarshalSerializer())

    # Multiply each number by 3 and fetch the first five results
    print(sc.parallelize(list(range(1000))).map(lambda x: 3 * x).take(5))

    sc.stop()

Run the file with the command:

    $SPARK_HOME/bin/spark-submit serializing.py

The output is a list of the first five numbers multiplied by 3:

    [0, 3, 6, 9, 12]
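To make the data flow of this job more concrete, here is a stdlib-only model of it. It is a simplification under stated assumptions, not PySpark's actual implementation: the `run_job` helper, the default of 4 partitions, and the per-partition batching are hypothetical. Each partition of input numbers is serialized with `marshal` (the module MarshalSerializer wraps), deserialized on the "worker" side, mapped, sent back the same way, and collection stops once five results are available.

```python
import marshal


def run_job(data, func, num_partitions=4, n=5):
    """Hypothetical model of parallelize(data).map(func).take(n) with marshal-serialized data."""
    # Split the input into contiguous partitions (at least one element each).
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    partitions = [data[i:i + size] for i in range(0, len(data), size)]

    results = []
    for part in partitions:
        wire = marshal.dumps(part)                   # driver -> worker: serialize the batch
        records = marshal.loads(wire)                # worker: deserialize the batch
        mapped = [func(x) for x in records]          # worker: apply the map function
        back = marshal.loads(marshal.dumps(mapped))  # worker -> driver: serialize results back
        results.extend(back)
        if len(results) >= n:                        # like take(): stop once enough rows exist
            break
    return results[:n]


print(run_job(list(range(1000)), lambda x: 3 * x))  # [0, 3, 6, 9, 12]
```

Only the first partition needs to be processed here, because it already yields more than five results; this mirrors why `take(5)` is cheap compared with collecting the whole dataset.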