1. What are the core components of the Spark ecosystem?

Answer:

Below are the core components of the Spark ecosystem.

  1. Spark Core
  2. Spark SQL
  3. Spark Streaming
  4. MLlib
  5. GraphX

Spark Core:

Spark Core is the basic engine for large-scale parallel and distributed data processing. It performs important functions such as memory management, job monitoring, fault tolerance, job scheduling, and interaction with the storage system.
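
A minimal sketch of Spark Core's RDD API in Scala, assuming a local-mode cluster and a made-up dataset, showing a computation distributed across partitions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoreExample {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark locally with one worker thread per core
    val conf = new SparkConf().setAppName("CoreExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Distribute the data across partitions, square each element,
    // and aggregate the results back on the driver.
    val rdd = sc.parallelize(1 to 100)
    val sumOfSquares = rdd.map(n => n * n).reduce(_ + _)
    println(s"Sum of squares: $sumOfSquares")

    sc.stop()
  }
}
```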

Spark Streaming:

  • Spark Streaming makes it easy to build scalable, fault-tolerant streaming applications.
  • Spark combines streaming with batch and interactive queries, as sketched below.
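
A minimal Spark Streaming word count in Scala; the localhost:9999 socket source (e.g. started with `nc -lk 9999`) and the 5-second batch interval are illustrative assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingExample {
  def main(args: Array[String]): Unit = {
    // At least two threads: one for the receiver, one for processing
    val conf = new SparkConf().setAppName("StreamingExample").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

    // Count words arriving on the socket in each batch
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" "))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```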

Spark SQL:

  • Spark SQL is Apache Spark's module for working with structured data.
  • It lets you seamlessly mix SQL queries with Spark programs, as shown below.
  • It provides uniform data access across data sources.
  • It provides Hive integration: you can run SQL or existing HiveQL queries on an existing warehouse.
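
A minimal Spark SQL sketch mixing a SQL query with a Spark program; the `people.json` file and its name/age columns are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object SqlExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqlExample")
      .master("local[*]")
      // .enableHiveSupport()  // uncomment to run HiveQL against an existing warehouse
      .getOrCreate()

    // Load structured data and expose it to SQL as a temporary view
    val df = spark.read.json("people.json")
    df.createOrReplaceTempView("people")

    // SQL queries and the DataFrame API can be mixed freely
    spark.sql("SELECT name, age FROM people WHERE age > 21").show()

    spark.stop()
  }
}
```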

MLlib:

  • MLlib is Apache Spark's scalable machine learning library, used to perform machine learning in Spark.
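
A minimal MLlib sketch using the DataFrame-based API; the tiny in-memory training set is made up for illustration:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object MllibExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MllibExample").master("local[*]").getOrCreate()
    import spark.implicits._

    // A toy labeled dataset: (label, feature vector)
    val training = Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    ).toDF("label", "features")

    // Train a logistic regression classifier on the toy data
    val model = new LogisticRegression().setMaxIter(10).fit(training)
    println(s"Coefficients: ${model.coefficients}")

    spark.stop()
  }
}
```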

GraphX:

  • GraphX is Spark's API for graphs and graph-parallel computation.
  • It unifies ETL, exploratory analysis, and iterative graph computation within a single system.
  • GraphX's performance is comparable to the fastest specialised graph processing systems.

