1. Can Apache Spark be used along with Hadoop? If yes, then how?

Answer»

Yes. A key strength of Spark is its compatibility with Hadoop, and combining the two makes for a powerful setup: Spark's processing engine can leverage the best of Hadoop's YARN (resource management) and HDFS (distributed storage) features.

Hadoop can be integrated with Spark in the following ways:

  • HDFS: Spark can be configured to read from and write to HDFS, taking advantage of its distributed, replicated storage (see the first sketch after this list).
  • MapReduce: Spark can run alongside MapReduce in the same Hadoop cluster, with the two engines sharing data and resources; a common pattern is to use Spark for real-time or iterative processing and MapReduce for batch processing.
  • YARN: Spark applications can be submitted to YARN, which acts as the cluster manager and schedules Spark's driver and executors (see the second sketch below).
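
For illustration, here is a minimal Scala sketch of a Spark job reading from HDFS. The namenode address (namenode:9000) and the input path are placeholder assumptions, not values from the original answer:

    import org.apache.spark.sql.SparkSession

    object HdfsReadExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HdfsReadExample")
          .getOrCreate()

        // An hdfs:// path resolves against the cluster's distributed,
        // replicated storage; host, port, and path here are hypothetical.
        val lines = spark.read.textFile("hdfs://namenode:9000/data/input.txt")
        println(s"Line count: ${lines.count()}")

        spark.stop()
      }
    }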

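And a minimal sketch of targeting YARN as the cluster manager. In practice the master is usually supplied at launch time (e.g. spark-submit --master yarn) rather than hard-coded, and the cluster's Hadoop configuration must be on the classpath; the master is set in code here purely for illustration:

    import org.apache.spark.sql.SparkSession

    object YarnExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("YarnExample")
          .master("yarn") // YARN schedules the driver and executors
          .getOrCreate()

        // A trivial job, just so the application registers with the ResourceManager
        println(s"Count: ${spark.range(1, 1000).count()}")

        spark.stop()
      }
    }
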
