Serialization plays an important role in performance tuning on Apache Spark. Any data that is sent over the network, written to disk, or persisted in memory must first be serialized, so serialization cost weighs directly on expensive operations such as shuffling and caching.
PySpark supports custom serializers for performance tuning. It provides the following two serializers −
Serializes objects using Python's marshal module. This serializer is faster than PickleSerializer, but supports fewer datatypes.
class pyspark.MarshalSerializer
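To see the datatype restriction concretely, here is a minimal sketch using Python's standard marshal module (which MarshalSerializer wraps): built-in containers of primitives round-trip cleanly, while an instance of a user-defined class is rejected.

import marshal

# Built-in types (numbers, strings, lists, dicts, ...) round-trip fine
data = {"counts": [1, 2, 3], "label": "ok"}
assert marshal.loads(marshal.dumps(data)) == data

# Instances of user-defined classes are not a supported datatype
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

try:
    marshal.dumps(Point(1, 2))
except ValueError as err:
    print(err)   # "unmarshallable object"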
Serializes objects using Python's pickle module. This serializer supports nearly any Python object, but may not be as fast as more specialized serializers.
class pyspark.PickleSerializer
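By contrast, pickle handles user-defined classes. A minimal sketch of the difference, assuming the dumps/loads methods that PySpark's framed serializers expose −

from pyspark.serializers import PickleSerializer

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

ser = PickleSerializer()

# The same kind of object that marshal rejects round-trips through pickle
restored = ser.loads(ser.dumps(Point(3, 4)))
print(restored.x, restored.y)   # 3 4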
Let us see an example of PySpark serialization. Here, the data is serialized using MarshalSerializer.
--------------------------------------serializing.py-------------------------------------
from pyspark.context import SparkContext
from pyspark.serializers import MarshalSerializer

# Use MarshalSerializer for all data shipped between the driver and executors
sc = SparkContext("local", "serialization app", serializer=MarshalSerializer())

# Double the numbers 0..999 and fetch the first ten results
print(sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10))
sc.stop()
--------------------------------------serializing.py-------------------------------------
Command − The command is as follows −
$SPARK_HOME/bin/spark-submit serializing.py
Output − The output of the above command is −
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
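Switching the same job to PickleSerializer only requires changing the serializer argument passed to SparkContext; the rest of serializing.py, the command, and the output stay the same −

from pyspark.serializers import PickleSerializer
sc = SparkContext("local", "serialization app", serializer=PickleSerializer())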