1.

What are the different platforms to deal with Big Data?

Answer»

There are various platforms available for Big Data. Some are open source and others are licence-based.

In the open-source category, Hadoop is the biggest Big Data platform. An alternative is HPCC, which stands for High-Performance Computing Cluster.

In the licensed category, we have Big Data platform offerings from Cloudera (CDH), Hortonworks (HDP), MapR (MDP), etc. (Hortonworks has since merged with Cloudera.)

  • The Big Data platform landscape is easier to understand when the tools are grouped by usage.
  • For stream processing, we have tools like Storm.
  • In the data storage and management category, we have big players like Cassandra, MongoDB, etc.
  • In the data cleaning category, we have tools like OpenRefine, DataCleaner, etc.
  • In the data mining category, we have IBM SPSS, RapidMiner, Teradata, etc.
  • In the data visualization category, the tools are Tableau, SAS, Spark, Chartio, etc.

Features and specialities of these Big Data platforms/tools are as follows:

1) Hadoop: 

  • Open Source
  • Highly Scalable
  • Runs on Commodity Hardware
  • Has a good ecosystem
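
Hadoop's classic processing model is MapReduce, which the list above does not name but which helps explain why the platform scales across commodity machines. The following is a minimal single-process sketch of the map, shuffle, and reduce steps for a word count. It does not use Hadoop itself, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map -> shuffle -> reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster each map call and each reduce call could run on a different node, which is the property this sketch is meant to make visible.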

2) HPCC: 

  • Open Source
  • Good Alternative to Hadoop
  • Parallelism at Data, Pipeline and System Level 
  • High-Performance Online Query Applications

3) Storm: 

  • Open Source                
  • Distributed Stream Processing               
  • Log Processing            
  • Real-Time Analytics
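
Storm describes a stream job as a topology of spouts (sources) and bolts (processing steps). The sketch below imitates that shape with plain Python generators for a log-processing case. It is only an illustration of the idea, not Storm's API, and every name in it (log_spout, parse_bolt, count_bolt, run_topology) is hypothetical.

```python
from collections import Counter


def log_spout(lines):
    """Source step ('spout' in Storm terms): emit one log line at a time."""
    for line in lines:
        yield line


def parse_bolt(stream):
    """Processing step ('bolt'): split each line into (level, message)."""
    for line in stream:
        level, _, message = line.partition(" ")
        yield level, message


def count_bolt(stream):
    """Processing step: keep a running count per level, emitting a snapshot per event."""
    counts = Counter()
    for level, _ in stream:
        counts[level] += 1
        yield dict(counts)


def run_topology(lines):
    """Wire spout -> parse -> count and collect every intermediate snapshot."""
    return list(count_bolt(parse_bolt(log_spout(lines))))


if __name__ == "__main__":
    for snapshot in run_topology(["ERROR disk full", "INFO started", "ERROR timeout"]):
        print(snapshot)
```

Each event updates the counts as soon as it arrives, which is what distinguishes stream processing from a batch job that waits for the full input.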

4) CDH: 

  • Licence based (Limited Free Version available)
  • Cloudera Manager for easy administration
  • Easy implementation
  • More Secure

5) HDP: 

  • Licence based (Limited Free Version available)             
  • Dashboard with Ambari UI            
  • Data Analytics Studio          
  • HDP Sandbox available for VirtualBox, VMware, Docker

6) MapR: 

  • Licence based (Limited Free Version available)               
  • On-premises and cloud support
  • Features AI and ML            
  • Open APIs

7) Cassandra: 

  • Open Source                     
  • NoSQL Database                 
  • Log-Structured Storage            
  • Includes Cassandra Query Language (CQL)
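
Log-structured storage means writes are appended and buffered in memory, then flushed to immutable sorted files, instead of being updated in place. The toy class below sketches that write path and the matching read path. It is a simplified illustration, not Cassandra's implementation: the class name TinyLSM and the memtable_limit parameter are assumptions, and compaction, deletes, and on-disk files are left out.

```python
class TinyLSM:
    """Toy log-structured store: commit log + memtable + immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # in-memory buffer of recent writes
        self.sstables = []    # immutable sorted runs, oldest first

    def put(self, key, value):
        """Append to the log, buffer in memory, flush when the buffer is full."""
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        """Freeze the memtable into a sorted, immutable run and start a new one."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        """Check the memtable first, then the runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    db.put("a", 1)
    db.put("b", 2)  # second write fills the memtable and triggers a flush
    db.put("a", 3)  # newer value shadows the flushed one
    print(db.get("a"), db.get("b"), db.get("missing"))
```

Because a flushed run is never modified, a newer value simply shadows the older one at read time, which keeps writes cheap.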

8) MongoDB: 

  • Licence based (also Open Source)           
  • NoSQL Database      
  • Document Oriented           
  • Aggregation Pipeline etc.
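
An aggregation pipeline is an ordered list of stages, each of which transforms the documents produced by the previous one. The sketch below writes a pipeline in MongoDB's stage syntax ($match, then $group with $sum) and runs it through a tiny in-memory evaluator so it works without a server. The evaluator run_pipeline and the ORDERS sample data are hypothetical, and it supports only equality matches and summing a single field.

```python
from collections import defaultdict

# Sample documents (made up for illustration).
ORDERS = [
    {"region": "EU", "status": "shipped", "amount": 10},
    {"region": "EU", "status": "shipped", "amount": 20},
    {"region": "US", "status": "pending", "amount": 7},
    {"region": "US", "status": "shipped", "amount": 5},
]

# Pipeline in MongoDB's stage syntax: filter, then group and sum.
PIPELINE = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
]


def run_pipeline(docs, pipeline):
    """Toy evaluator: equality $match and $group with $sum of one field only."""
    for stage in pipeline:
        op, spec = next(iter(stage.items()))
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            totals = defaultdict(lambda: defaultdict(int))
            for d in docs:
                for out_field, acc in spec.items():
                    if out_field == "_id":
                        continue
                    totals[d[key_field]][out_field] += d[acc["$sum"].lstrip("$")]
            docs = [{"_id": k, **v} for k, v in totals.items()]
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


if __name__ == "__main__":
    print(run_pipeline(ORDERS, PIPELINE))
```

Against a real deployment, the same PIPELINE list would typically be passed to a driver call such as pymongo's collection.aggregate(PIPELINE) rather than to this toy evaluator.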

