1.

What are the features of Apache Spark?

Answer:
  • High Processing Speed: Apache Spark achieves high data-processing speed by reducing read-write operations to disk. It is commonly cited as up to 100x faster than Hadoop MapReduce for in-memory computation and up to 10x faster for on-disk computation.
  • Dynamic Nature: Spark provides more than 80 high-level operators, which make it easier to develop parallel applications.
  • In-Memory Computation: Spark keeps intermediate data in memory, and its DAG execution engine optimizes the execution plan; together these increase the speed of data processing. Spark also supports data caching, which reduces the time required to fetch data from disk.
  • Reusability: Spark code can be reused for batch processing, data streaming, ad-hoc queries, and so on.
  • Fault Tolerance: Spark achieves fault tolerance through RDDs (Resilient Distributed Datasets). An RDD records the lineage of transformations that produced it, so partitions lost when a worker node fails can be recomputed instead of being lost for good.
  • Stream Processing: Spark supports near-real-time stream processing. The earlier MapReduce framework, by contrast, could process only data that already existed (batch processing).
  • Lazy Evaluation: Transformations on Spark RDDs are lazy. They do not produce results right away; each one only defines a new RDD from an existing RDD, and the computation runs when an action is called. This lets Spark avoid unnecessary work, which improves system efficiency.
  • Support for Multiple Languages: Spark supports R, Scala, Python, and Java. This gives developers flexibility and overcomes the Hadoop limitation of developing applications mainly in Java.
  • Hadoop Integration: Spark can run on the Hadoop YARN cluster manager, which makes it flexible to deploy alongside existing Hadoop clusters.
  • Rich Libraries: Spark includes GraphX for graph-parallel execution, Spark SQL for structured queries, MLlib for machine learning, and more.
  • Cost Efficiency: Apache Spark is considered more cost-efficient than Hadoop, because Hadoop requires large storage and data centers for data processing and replication.
  • Active Developer Community: Apache Spark has a large developer base involved in its continuous development. It is considered one of the most important projects undertaken by the Apache community.
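
The lazy-evaluation, caching, and fault-tolerance points above can be illustrated with a toy model in plain Python. This is only a sketch and not Spark's API: the `LazyDataset` class and its `lose_cached_data` method are hypothetical names invented for this example, and everything runs in a single process. The methods `map`, `filter`, `cache`, and `collect` are merely named after the RDD operations they imitate.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of Spark)."""

    def __init__(
        self,
        source: Callable[[], Iterable[Any]],
        lineage: Optional[List[Tuple[str, Callable]]] = None,
    ) -> None:
        self._source = source          # how to (re)read the input data
        self._lineage = lineage or []  # recorded transformations, in order
        self._use_cache = False
        self._cache: Optional[List[Any]] = None

    # Transformations: only record the step and return a new dataset.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._lineage + [("map", fn)])

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._lineage + [("filter", pred)])

    def cache(self) -> "LazyDataset":
        # Ask for the result to be kept in memory once it is computed.
        self._use_cache = True
        return self

    def lose_cached_data(self) -> None:
        # Simulate a worker failure that wipes the in-memory result.
        self._cache = None

    # Action: runs the whole recorded lineage (or reuses the cache).
    def collect(self) -> List[Any]:
        if self._use_cache and self._cache is not None:
            return list(self._cache)
        data: Iterable[Any] = self._source()
        for kind, fn in self._lineage:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        result = list(data)
        if self._use_cache:
            self._cache = result
        return result


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)  # record when the function actually runs
    return x * x


numbers = LazyDataset(lambda: range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0).cache()

calls_before_action = len(calls)  # transformations alone run nothing
first = evens_squared.collect()   # action: map and filter execute now
calls_after_first = len(calls)
second = evens_squared.collect()  # answered from the cached result
calls_after_second = len(calls)
```

In this sketch `square` is not called until `collect()` runs, the second `collect()` reuses the cached result, and calling `lose_cached_data()` followed by `collect()` rebuilds the same result from the recorded lineage. Real Spark applies the same ideas per partition across a cluster.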

