Does PySpark provide a machine learning API?

Answer

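Yes. Like Spark's Scala and Java APIs, PySpark provides a machine learning library known as MLlib. Its RDD-based API lives in the pyspark.mllib package (in maintenance mode since Spark 2.0, with the DataFrame-based pyspark.ml package as the primary API) and includes modules such as: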

mllib.classification − This supports methods for binary and multiclass classification, such as logistic regression, linear support vector machines and Naive Bayes. Decision trees and random forests, which handle both classification and regression, are provided by mllib.tree.
mllib.clustering − This is used for solving clustering problems, which aim to group subsets of entities with one another based on similarity (for example, k-means).
mllib.fpm − FPM stands for Frequent Pattern Mining. This module is used to mine frequent items, itemsets, subsequences or other substructures, which is usually among the first steps in analyzing large datasets.
mllib.linalg − This provides the vector and matrix types (dense and sparse) used for linear algebra throughout MLlib.
mllib.recommendation − This is used for collaborative filtering in recommender systems. spark.mllib supports model-based collaborative filtering, in which users and products are described by a small set of latent factors; these factors are learned with the Alternating Least Squares (ALS) algorithm and then used to predict missing entries.
mllib.regression − This is used for solving problems with regression algorithms, which model the relationships and dependencies between variables.
