Answer» A few features of Hadoop are as follows:
- Open Source: Hadoop is open-source and free to use. Because the source code is available online, anybody can read it and modify it according to their organisation's needs.
- Fault Tolerance: Hadoop runs on commodity hardware (cheap machines) that can fail at any time. Data in Hadoop is replicated across multiple DataNodes in the cluster, ensuring availability even if one of the machines goes down. If a single machine develops a fault, its data can still be read from the other nodes that hold a replica. By default, Hadoop creates three copies of each file block and stores them on different nodes. This replication factor is configurable and can be changed through the replication property in the hdfs-site.xml file (see the configuration sketch after this list).
- Clusters with High Scalability: Hadoop is a highly scalable model. A large volume of data is split across many low-cost machines in a cluster and processed in parallel, and the number of these machines or nodes can be increased or decreased according to the needs of the business. Traditional RDBMS (Relational DataBase Management System) solutions cannot scale to such massive volumes of data.
- Cheap and cost-effective: Unlike traditional relational databases, which require expensive hardware and high-end CPUs to deal with Big Data, Hadoop is open-source and runs on cost-effective commodity hardware, resulting in a cost-effective paradigm. The difficulty with traditional relational databases is that storing large volumes of data is not cost-effective, so businesses often ended up discarding raw data, which is rarely the best outcome for the business. Hadoop offers two key cost advantages: it is open-source, which means it is free to use, and it employs commodity hardware, which is very inexpensive.
- High Availability: In a Hadoop cluster, fault tolerance ensures high availability, which refers to the data on the cluster remaining accessible. Because of fault tolerance, if one of the DataNodes fails, the data can be recovered from any other node where it is replicated. A highly available Hadoop cluster also runs two or more NameNodes: an Active NameNode and a Passive NameNode, also known as the standby NameNode. If the Active NameNode fails, the Passive NameNode takes over and serves the same data, so users can continue working without interruption (a minimal configuration sketch follows this list).
- Flexibility: Hadoop is built in such a way that it can efficiently handle any type of dataset, including structured (e.g. MySQL tables), semi-structured (XML, JSON), and unstructured data (images and videos). This means it can analyse data regardless of its form, making it extremely adaptable. This is very beneficial to organisations because it allows them to process enormous datasets quickly and to extract insights from sources such as social media, email, and other feeds. Hadoop's flexibility allows it to be used for log processing, data warehousing, fraud detection, and other tasks.
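
As a rough illustration of the replication setting mentioned under Fault Tolerance, the default replication factor can be overridden in hdfs-site.xml via the dfs.replication property. The sketch below is minimal and assumes a replication factor of 2 purely as an example; the property name is standard HDFS configuration, but the appropriate value depends on the cluster.

```xml
<!-- hdfs-site.xml: minimal sketch of overriding the default block replication factor -->
<configuration>
  <property>
    <!-- dfs.replication controls how many copies of each file block HDFS keeps (default is 3) -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

The replication factor can also be changed for individual files after they are written, for example with `hdfs dfs -setrep`, without touching the cluster-wide default.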
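
The Active/Passive NameNode pair described under High Availability is likewise configured in hdfs-site.xml. The following is only a minimal sketch, assuming a nameservice called mycluster and two hosts named namenode1 and namenode2 (all hypothetical names); the property names follow standard HDFS HA configuration, but a real deployment needs additional settings such as shared edits storage, client failover, and fencing.

```xml
<!-- hdfs-site.xml: minimal sketch of an Active/Passive (standby) NameNode pair.
     The nameservice "mycluster" and hosts "namenode1"/"namenode2" are illustrative only. -->
<configuration>
  <property>
    <!-- logical name for the HA nameservice -->
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <!-- logical identifiers for the two NameNodes in the nameservice -->
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2:8020</value>
  </property>
</configuration>
```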