|
Answer» There are various platforms available for Big Data. Some of these are open source and others are licence based. In the open-source category, Hadoop is the biggest Big Data platform. An alternative is HPCC, which stands for High-Performance Computing Cluster. In the licensed category, we have Big Data platform offerings from Cloudera (CDH), Hortonworks (HDP), MapR (MDP), etc. (Hortonworks has since merged with Cloudera.)
- For stream processing, we have tools like Storm.
- The Big Data platform landscape is easier to understand when grouped by usage.
- For example, in the data storage and management category, we have big players like Cassandra, MongoDB, etc.
- In the data cleaning category, we have tools like OpenRefine, DataCleaner, etc.
- In the data mining category, we have IBM SPSS, RapidMiner, Teradata, etc.
- In the data visualization category, the tools are Tableau, SAS, Spark, Chartio, etc.
Features and specialities of these Big Data platforms/tools are as follows:
1) Hadoop:
- Open Source
- Highly Scalable
- Runs on Commodity Hardware
- Has a good ecosystem
2) HPCC:
- Open Source
- Good Alternative to Hadoop
- Parallelism at Data, Pipeline and System Level
- High-Performance Online Query Applications
3) Storm:
- Open Source
- Distributed Stream Processing
- Log Processing
- Real-Time Analytics
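The log-processing and real-time analytics points for Storm can be illustrated with a minimal sketch in plain Python. It does not use Storm's API. The function names `log_spout`, `parse_bolt`, `count_bolt`, and `run_topology` are hypothetical and only mirror Storm's idea of a spout (a source of events) feeding bolts (processing steps), and the sample log lines are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

# Hypothetical sample input; a real deployment would read from a queue or log shipper.
SAMPLE_LINES = [
    "2024-01-01 12:00:00 INFO service started",
    "2024-01-01 12:00:01 ERROR database timeout",
    "2024-01-01 12:00:02 INFO request served",
    "2024-01-01 12:00:03 ERROR database timeout",
]


def log_spout(lines: Iterable[str]) -> Iterator[str]:
    """Source step: emit one raw log line at a time."""
    for line in lines:
        yield line


def parse_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Processing step: extract the log level (third whitespace-separated field)."""
    for line in stream:
        fields = line.split()
        if len(fields) >= 3:
            yield fields[2]


def count_bolt(stream: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Processing step: emit a running count per level after every event."""
    counts: Counter = Counter()
    for level in stream:
        counts[level] += 1
        yield dict(counts)


def run_topology(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the steps together and return the last snapshot of the counts."""
    snapshot: Dict[str, int] = {}
    for snapshot in count_bolt(parse_bolt(log_spout(lines))):
        pass
    return snapshot
```

Calling `run_topology(SAMPLE_LINES)` returns the per-level counts once every line has passed through. Because each step is a generator, a count is updated as soon as an event arrives instead of waiting for a complete batch, which is the property that separates stream processing from batch jobs.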
4) CDH:
- Licence based (Limited Free Version available)
- Cloudera Manager for easy administration
- Easy implementation
- More Secure
5) HDP:
- Licence based (Limited Free Version available)
- Dashboard with Ambari UI
- Data Analytics Studio
- HDP Sandbox available for VirtualBox, VMware, Docker
6) MapR:
- Licence based (Limited Free Version available)
- On-premise and cloud support
- Features AI and ML
- Open APIs
7) Cassandra:
- Open Source
- NoSQL Database
- Log-Structured Storage
- Includes Cassandra Query Language (CQL)
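The "Log-Structured Storage" point can be made concrete with a toy sketch. This is not Cassandra's implementation. The class `TinyLogStructuredStore` and its `memtable_limit` parameter are hypothetical, and the sketch only imitates the general idea of buffering writes in memory (Cassandra calls this a memtable) and flushing them as immutable sorted runs (Cassandra calls these SSTables).

```python
import bisect


class TinyLogStructuredStore:
    """Toy store: writes land in memory, flushes create immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # recent writes, mutable
        self.segments = []                    # flushed runs, oldest first

    def put(self, key, value):
        """Record a write in memory and flush once the threshold is reached."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the in-memory writes as a sorted segment that is never edited."""
        if self.memtable:
            self.segments.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        """Look in memory first, then in segments from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            keys = [k for k, _ in segment]
            i = bisect.bisect_left(keys, key)  # binary search in the sorted run
            if i < len(keys) and keys[i] == key:
                return segment[i][1]
        return None


store = TinyLogStructuredStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # threshold reached, first segment is flushed
store.put("a", 3)   # newer value for "a" stays in memory for now
```

An update never rewrites an old segment: the newer value simply shadows the older one at read time. Real systems also merge old segments in the background (compaction), which this sketch leaves out.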
8) MongoDB:
- Licence based (also Open Source)
- NoSQL Database
- Document Oriented
- Aggregation Pipeline, etc.
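To show what "Document Oriented" and "Aggregation Pipeline" mean in practice, here is a minimal sketch in plain Python. It does not call MongoDB or any driver. The helpers `match`, `group_sum`, and `aggregate` are hypothetical stand-ins that loosely imitate the `$match` and `$group` (with `$sum`) stages, and the `ORDERS` documents are invented sample data.

```python
from collections import defaultdict

# Hypothetical documents; in MongoDB these would be stored in a collection.
ORDERS = [
    {"customer": "alice", "status": "paid", "amount": 30},
    {"customer": "bob", "status": "paid", "amount": 20},
    {"customer": "alice", "status": "cancelled", "amount": 15},
    {"customer": "alice", "status": "paid", "amount": 10},
]


def match(criteria):
    """Build a stage that keeps documents whose fields equal the given values."""
    def stage(docs):
        return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
    return stage


def group_sum(key, field):
    """Build a stage that totals `field` for each distinct value of `key`."""
    def stage(docs):
        totals = defaultdict(int)
        for d in docs:
            totals[d[key]] += d[field]
        return [{"_id": k, "total": v} for k, v in sorted(totals.items())]
    return stage


def aggregate(docs, pipeline):
    """Feed the output of each stage into the next one, in order."""
    for stage in pipeline:
        docs = stage(docs)
    return docs


result = aggregate(ORDERS, [match({"status": "paid"}), group_sum("customer", "amount")])
```

Here `result` holds one summary document per customer, computed only from the paid orders. The point of the pipeline style is that each stage is small and independent, so stages can be reordered or added without touching the others.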
|