The open source software Hadoop is creating a lot of buzz in the big data world. It is best known as an open source implementation of MapReduce, the programming model on which Google built its empire. Forrester Research calls it “The Open Source Heart Of Big Data.”
However, as the buzz builds, the term is thrown around recklessly, leaving many confused about how Hadoop works and where it is used. Derrick Harris of GigaOm wrote an article explaining Hadoop terminology and its top uses.
Derrick writes, “Hadoop is an Apache Software Foundation project consisting of two primary subprojects — Hadoop MapReduce and the Hadoop Distributed File System. MapReduce is the parallel-processing engine that allows Hadoop to churn through large data sets in relatively short order. HDFS is the distributed file system that lets Hadoop scale across commodity servers and, importantly, store data on the compute nodes in order to boost performance (and potentially save money).”
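To make the MapReduce idea concrete, here is a toy sketch of the model in Python — the classic word-count example, not Hadoop's actual Java API. A map phase emits (key, value) pairs, the framework groups pairs by key (the “shuffle”), and a reduce phase combines each group; in real Hadoop, the map and reduce calls run in parallel across the cluster's compute nodes.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in one input document.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)

documents = ["big data big buzz", "big data tools"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 3, 'data': 2, 'buzz': 1, 'tools': 1}
```

The appeal of the model is that `map_phase` and `reduce_phase` are independent per document and per key, so Hadoop can fan them out across many commodity servers while HDFS keeps the data local to the machines doing the work.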
Three Ways Hadoop is Used
Management software: These Hadoop products act like an operating system and help you troubleshoot. There aren’t many independent vendors in this space. Derrick writes, “Such products are usually sold or offered by companies peddling Hadoop distributions because even when commercially packaged, Hadoop is still a complex architecture…”
Application software: Application software sits on top of a Hadoop distribution and improves processing and/or performs analytics. Derrick mentions Karmasphere Analyst, Hadapt, and HStreaming. Jeff Kelley from Services Angle also calls out a few other “Hadoop applications of the future,” such as Datameer, Tresata, and Tidemark.
To learn more about how Hadoop is used, read Derrick’s full article on GigaOm. To discover more emerging Hadoop companies transforming the world of big data, come to our Under the Radar Conference on April 25-26 in Mountain View, CA.