Answer» Below are the core components of the Spark ecosystem:
- Spark Core
- Spark SQL
- Spark Streaming
- MLlib
- GraphX
Spark Core: Spark Core is the basic engine for large-scale parallel and distributed data processing. It performs important functions such as memory management, job monitoring, job scheduling, fault tolerance, and interaction with the storage system.
Spark Streaming: - Spark Streaming makes it easy to build scalable, fault-tolerant streaming applications.
- It lets you combine streaming with batch and interactive queries.
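The two components above can be sketched in a few lines of Scala. This is a minimal illustration, not production code: the app name, socket host, and port are made-up placeholders, and it assumes Spark is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object CoreAndStreamingSketch {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark on all local cores; appName is illustrative.
    val spark = SparkSession.builder()
      .appName("core-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Spark Core: RDDs with lazy transformations; fault tolerance comes
    // from recomputing lost partitions via the lineage of these operations.
    val counts = sc.parallelize(Seq("spark core", "spark sql"))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)

    // Spark Streaming (Structured Streaming): the same DataFrame API over an
    // unbounded source. The socket host/port here are hypothetical.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
    // A query would be started with lines.writeStream...start(); omitted here.

    spark.stop()
  }
}
```

Because batch and streaming share one engine, the same transformations written for the RDD/DataFrame batch job can be reused on the streaming DataFrame.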
Spark SQL: - Spark SQL is Apache Spark's module for working with structured data.
- It lets you seamlessly mix SQL queries with Spark programs.
- It provides uniform access to a variety of data sources.
- It provides Hive integration: you can run SQL or existing HiveQL queries against an existing warehouse.
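The "mix SQL with the Spark program" point can be shown with a short Scala sketch. The table name and data below are invented for illustration, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession

object SqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-sketch")       // illustrative name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Build a DataFrame programmatically, then register it as a SQL view.
    val people = Seq(("alice", 34), ("bob", 29)).toDF("name", "age")
    people.createOrReplaceTempView("people")

    // SQL query mixed directly into the Spark program.
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    spark.stop()
  }
}
```

For the Hive integration mentioned above, the builder would additionally call `.enableHiveSupport()`, after which HiveQL queries run against the existing warehouse.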
MLlib: - MLlib is Apache Spark's scalable machine learning library, used to perform machine learning in Spark.
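As a concrete illustration of MLlib, here is a hedged Scala sketch that fits a logistic regression model on a tiny hand-made dataset (the numbers are arbitrary; a local SparkSession is assumed):

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object MLlibSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mllib-sketch")     // illustrative name
      .master("local[*]")
      .getOrCreate()

    // Tiny toy dataset: (label, feature vector) rows, values are arbitrary.
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 1.1)),
      (1.0, Vectors.dense(2.0, 1.0)),
      (1.0, Vectors.dense(2.2, 1.5))
    )).toDF("label", "features")

    // Fit the model; training runs as distributed Spark jobs under the hood.
    val model = new LogisticRegression().setMaxIter(10).fit(training)
    println(model.coefficients)

    spark.stop()
  }
}
```

Because MLlib runs on Spark Core, the same code scales from this toy DataFrame to a cluster-sized dataset without changes.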
GraphX: - GraphX is Spark's API for graphs and graph-parallel computation.
- It unifies ETL, exploratory analysis, and iterative graph computation within a single system.
- GraphX's performance is comparable to that of the fastest specialised graph processing systems.
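A small Scala sketch of GraphX, building a toy graph and running PageRank as an example of iterative graph computation. The vertex names and edges are made up, and a local SparkSession is assumed.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object GraphXSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("graphx-sketch")    // illustrative name
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Vertices are (id, attribute) pairs; edges carry a relationship label.
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(2L, 3L, "follows")
    ))
    val graph = Graph(vertices, edges)

    // Iterative graph computation: PageRank until scores converge.
    val ranks = graph.pageRank(0.001).vertices
    ranks.collect().foreach(println)

    spark.stop()
  }
}
```

The same `graph` value built from ETL'd RDDs can be queried, transformed, and fed to graph algorithms, which is the "single system" unification the bullets above describe.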