1.

Does PySpark provide a machine learning API?

Answer»

Yes. Like Spark itself, PySpark provides a machine learning API known as MLlib, which supports various ML algorithms through modules such as:

  • mllib.classification − This supports methods for binary and multiclass classification, such as logistic regression, linear SVM and Naive Bayes. Tree-based methods such as Decision Tree and Random Forest are provided by the related mllib.tree module.
  • mllib.clustering − This is used for solving clustering problems, which aim to group subsets of entities with one another based on their similarity.
  • mllib.fpm − FPM stands for Frequent Pattern Mining. This library is used to mine frequent items, itemsets, subsequences or other substructures, which is often a first step in analyzing large datasets.
  • mllib.linalg − This provides linear algebra utilities, such as the vector and matrix types that the other modules take as input.
  • mllib.recommendation − This is used for collaborative filtering in recommender systems. It supports model-based collaborative filtering, in which a small set of latent factors is learned with the Alternating Least Squares (ALS) algorithm and then used to predict missing entries.
  • mllib.regression − This is used for solving problems with regression algorithms, which find relationships and dependencies between variables.
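
To show how several of the modules listed above fit together, here is a minimal sketch that trains a Naive Bayes classifier from mllib.classification on points built with mllib.linalg and mllib.regression. The toy data, the helper names (validate_rows, to_labeled_points, train_and_predict, main) and the local[1] master are assumptions made for illustration, not part of the original answer. It assumes PySpark and a Java runtime are installed.

```python
# Toy training data as (label, features) pairs. The values are made up.
RAW_ROWS = [
    (0.0, [1.0, 0.0]),
    (0.0, [2.0, 0.0]),
    (1.0, [0.0, 1.0]),
    (1.0, [0.0, 2.0]),
]


def validate_rows(rows):
    """Check the rows and return the feature dimension.

    MLlib's multinomial Naive Bayes expects non-negative feature values,
    and every row must have the same number of features.
    """
    dims = {len(features) for _, features in rows}
    if len(dims) != 1:
        raise ValueError("all rows must have the same number of features")
    if any(value < 0 for _, features in rows for value in features):
        raise ValueError("feature values must be non-negative")
    return dims.pop()


def to_labeled_points(rows):
    """Convert (label, features) pairs into MLlib LabeledPoint objects."""
    # Imported here so the pure-Python helpers load without Spark.
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.regression import LabeledPoint

    return [LabeledPoint(label, Vectors.dense(features))
            for label, features in rows]


def train_and_predict(sc, rows, query):
    """Train Naive Bayes on `rows` and predict the label of `query`."""
    from pyspark.mllib.classification import NaiveBayes
    from pyspark.mllib.linalg import Vectors

    validate_rows(rows)
    training = sc.parallelize(to_labeled_points(rows))  # RDD of LabeledPoint
    model = NaiveBayes.train(training, lambda_=1.0)     # lambda_ is smoothing
    return model.predict(Vectors.dense(query))


def main():
    """Run the sketch on a local, single-threaded SparkContext."""
    from pyspark import SparkContext

    sc = SparkContext(master="local[1]", appName="mllib-sketch")
    try:
        print(train_and_predict(sc, RAW_ROWS, [0.0, 1.5]))
    finally:
        sc.stop()
```

Calling main() on a machine with PySpark installed starts a local SparkContext, trains the model and prints the predicted label for the query point. The same pattern (build an RDD of LabeledPoint, call train, then predict) applies to the other classifiers in mllib.classification.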

