InterviewSolution
| 1. |
What is Spark MLlib? Intermediate |
|
Answer» Just as Mahout is a machine learning library for Hadoop, MLlib is the machine learning library for Spark. MLlib provides a range of algorithms that scale out across the cluster for data processing, and many data scientists use it. MLlib has the following advantages.
Ease of Use: MLlib can be used from Java, Scala, Python, and R, all widely used programming languages. It fits into Spark's APIs and interoperates with NumPy in Python and with R libraries. You can use any Hadoop data source (for example HDFS, HBase, or local files), which makes it easy to plug into Hadoop workflows.
Performance: As discussed in the earlier questions, MLlib can run up to 100x faster than MapReduce. Spark excels at iterative computation, which is what enables MLlib to run fast. Algorithmic quality matters as well: MLlib contains high-quality algorithms that leverage iteration and can yield better results than the one-pass approximations sometimes used on MapReduce.
Runs Everywhere: Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, against diverse data sources. You can run it in its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes, and access data in HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources. |
|
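The performance point about iteration versus one-pass approximation can be illustrated with a small sketch. This is plain Python, not MLlib code: the function names (`fit_slope`, `mean_squared_error`), the synthetic data, and the learning rate are all assumptions chosen for illustration. It fits a single slope by batch gradient descent, rescanning the same dataset on every pass, and compares the result after one pass with the result after many.

```python
# Illustrative only: plain Python, not the MLlib API.
# fit_slope, mean_squared_error, data and learning_rate are hypothetical names.

def fit_slope(points, passes, learning_rate=0.01):
    """Fit y ~ w * x by batch gradient descent, scanning `points` once per pass."""
    w = 0.0
    for _ in range(passes):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= learning_rate * grad
    return w


def mean_squared_error(points, w):
    """Average squared residual of the model y = w * x over `points`."""
    return sum((w * x - y) ** 2 for x, y in points) / len(points)


# Synthetic data lying exactly on the line y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 6)]

one_pass = fit_slope(data, passes=1)      # a single scan of the data
many_passes = fit_slope(data, passes=50)  # repeated scans of the same data

print("one pass:   ", one_pass, mean_squared_error(data, one_pass))
print("many passes:", many_passes, mean_squared_error(data, many_passes))
```

Every pass reads the full dataset again, so the cost of re-reading the data dominates an iterative algorithm like this one. That repeated-scan access pattern is the kind of iterative computation the answer says Spark handles well.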