Apache Hadoop is a free, open-source software framework, written in Java and licensed under the Apache License 2.0, that allows distributed processing of large data sets across clusters of computers using simple programming models. A Hadoop application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale from a single server to thousands of machines, each offering local computation and storage.
Why use Hadoop?
Hadoop is widely used because of the key features below:
- It is very cost-effective: it needs no specialized hardware, just a cluster of commodity computers or servers. Compared with legacy systems, it generates cost benefits by bringing massively parallel computing to commodity servers, resulting in a substantial reduction in the cost per terabyte of storage.
- Hadoop brings flexibility to data processing. One of the biggest challenges organizations face today is handling unstructured data. Hadoop manages data whether it is structured or unstructured, encoded or formatted, or of any other type, which enhances a company's decision-making process.
- Hadoop is easily scalable. Since it is an open-source platform that runs on industry-standard hardware, new nodes can easily be added to a cluster as data volume grows, without affecting existing systems.
- Hadoop is highly available and fault tolerant. Data in Hadoop is stored in HDFS, where each block is automatically replicated to two other locations by default, so even if one or two systems fail, the file is still available on at least a third system. The replication level is configurable, which makes Hadoop an incredibly reliable storage system. Even if a node goes out of service, the system automatically reassigns work to another copy of the data and continues processing as if nothing had happened.
- Hadoop has a robust ecosystem that is well suited to data analysts and developers. It consists of many tools and technologies that can be combined depending on delivery requirements; for example, the Hadoop ecosystem includes projects such as MapReduce, Hive, HBase, ZooKeeper, HCatalog, and Apache Pig, with more appearing as the market grows.
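The replication level mentioned above is set by the `dfs.replication` property in HDFS. A minimal `hdfs-site.xml` fragment might look like this (3 is the stock default, i.e. the original block plus two replicas):

```xml
<!-- hdfs-site.xml: dfs.replication controls how many copies HDFS
     keeps of each block. The default is 3. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```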
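To make the MapReduce idea concrete, here is a minimal single-process sketch of the classic word-count pattern in plain Java. A real Hadoop job would extend the framework's `Mapper` and `Reducer` classes and run across a cluster; this sketch only illustrates the map → group → reduce flow, and all class and method names here are my own illustrative choices, not Hadoop APIs.

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for every word in a line.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // "Shuffle" + "reduce" phases: group the pairs by word and
    // sum the counts for each word.
    public static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                    .flatMap(WordCountSketch::map)
                    .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("hadoop stores data",
                                     "hadoop processes data");
        System.out.println(wordCount(input));
    }
}
```

On a cluster, the map calls run in parallel on the nodes that hold the data, and the grouping step is what Hadoop's shuffle performs across the network.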
Challenges with Hadoop
Hadoop is not suitable for On-Line Transaction Processing (OLTP) workloads, where structured data is randomly accessed as in a relational database. It is also not suitable for On-Line Analytical Processing (OLAP) or Decision Support System workloads, where structured data is accessed sequentially to generate reports that provide business intelligence. As of Hadoop version 2.6, in-place updates to files are not possible, but appends are.
What are the core components of Hadoop? What lies in the Hadoop ecosystem? Read my next blog!