1.

Explain the key features of Apache Spark.

Answer»

Apache Spark has the following key features:

  1. Polyglot
  2. Performance
  3. Data sources
  4. Lazy Evaluation
  5. Real-time computation
  6. Hadoop Integration
  7. Machine Learning

Polyglot:

Spark code can be written in Java, Scala, Python, or R. It also provides interactive shells for Scala and Python.

Performance:

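Apache Spark can run workloads up to 100 times faster than Hadoop MapReduce, chiefly because it keeps intermediate data in memory instead of writing it to disk between stages.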

Data sources:

Spark supports multiple data sources, such as Parquet, CSV, JSON, Hive, Cassandra, and HBase.

Lazy Evaluation:

Spark delays execution until it is necessary. Transformations are only added to a DAG (directed acyclic graph) of operations; Spark executes them when an action is performed.
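
The idea of recording transformations and running them only when an action is called can be sketched in plain Python. This is a toy model, not Spark's API: the class name LazyDataset and its methods are hypothetical, and the recorded plan is a simple linear list rather than a real DAG.

```python
# Toy model of lazy evaluation in plain Python. This is NOT Spark's API;
# LazyDataset and its methods are hypothetical names used for illustration.

class LazyDataset:
    def __init__(self, data, plan=()):
        self._data = data   # source records
        self._plan = plan   # recorded transformations, in order

    def map(self, fn):
        # Transformation: record the step and return a new dataset; nothing runs yet.
        return LazyDataset(self._data, self._plan + (("map", fn),))

    def filter(self, pred):
        # Transformation: likewise only recorded.
        return LazyDataset(self._data, self._plan + (("filter", pred),))

    def collect(self):
        # Action: apply every recorded step to the source data and return the result.
        rows = iter(self._data)
        for kind, fn in self._plan:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []  # records each time the mapping function actually runs


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
ran_before_action = len(calls)  # still 0: no transformation has executed
result = pipeline.collect()     # the action triggers execution
print(ran_before_action, result)
```

Building the pipeline does not call double at all; the work happens only inside collect. In Spark, deferring execution in a similar way lets the engine see the whole chain of transformations before running any of it.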

Real-time computation:

Spark offers low-latency, near-real-time computation because it processes data in memory and makes full use of the cluster.

Hadoop Integration:

Spark is compatible with Hadoop. It can run on top of an existing Hadoop cluster using YARN, which makes it a potential replacement for Hadoop's MapReduce functions.

Machine Learning:

Along with its many built-in libraries, Spark includes MLlib, its machine learning library. This gives data engineers and data scientists a powerful, unified engine that is fast and easy to use.


