1.

What is PySpark?

Answer»

PySpark is the Python interface to Apache Spark. It lets you work with Spark through APIs written in Python and supports Spark features such as Spark DataFrame, Spark SQL, Spark Streaming, Spark MLlib, and Spark Core. It provides an interactive PySpark shell for analyzing structured and semi-structured data in a distributed environment. PySpark can read data from multiple sources and in different formats, and it also supports working with RDDs (Resilient Distributed Datasets). Under the hood, PySpark uses the Py4J library, which lets Python code interact with JVM objects.

PySpark can be installed from PyPI with the command:

pip install pyspark

