There are many pieces to a Big Data infrastructure, but one of the most widely recognized workhorses of the movement is Hadoop. It’s an odd name, but you have quite likely heard it come up in conversations throughout your organization.
At its simplest level, Hadoop consists of two primary components: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce is the parallel-processing engine that allows Hadoop to churn through large data sets very quickly. HDFS is a file system that lets Hadoop scale out across low-cost servers by storing data on multiple compute nodes, which boosts performance (and usually saves money).
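To make the MapReduce idea more concrete, here is a minimal sketch of the pattern in plain Python. It is not the Hadoop API and runs on a single machine; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample data are hypothetical, chosen only for illustration. In a real cluster, the map and reduce steps would run in parallel on many nodes, each working on data stored locally in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data moves to clusters"]
    print(word_count(sample))
```

Because each line is mapped independently and each word is reduced independently, the work can be split across many servers, which is what allows this style of processing to scale to very large data sets.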
Hadoop was created because existing approaches were inadequate for processing and storing huge amounts of data. Its roots lie in the challenge of indexing the entire World Wide Web every day. Google described a paradigm called MapReduce in 2004, and Yahoo! started Hadoop as an implementation of MapReduce in 2005 and released it as an open source project in 2007.
To make Hadoop clusters easier to deploy and manage than downloading the open-source Hadoop code bases and stitching everything together, various companies offer commercial distributions. This article will not discuss all of these vendors, but this is where you’ll hear names such as IBM BigInsights, Cloudera and Hortonworks, among numerous others. These distributions integrate with various data warehouses, databases and other data management products, all with the goal of moving data between Hadoop clusters and other environments.
Is Hadoop the Path to Value From Big Data?
So, now you understand that Hadoop is a popular environment, built on a paradigm developed by Google, for crunching through huge amounts of data. Should your CEO, CFO and other executives even care about this?
There are some things to consider, such as how soon Big Data will affect your company and whether Hadoop is the right way to unlock its value. In some industries, the way that Big Data will create value is not always clear. Only knowledgeable professionals, with an intimate understanding of the business and of the data it generates and can collect, will be able to find the business insights and value in their Big Data. Once you decide Big Data will make an impact, the next questions might be: What is a reasonable amount to spend, and should we consider starting with a proof of concept?
Here are a few key questions to consider:
- How much large-scale data is coming into our business, and from what sources?
- How can we tell what that information is worth to us?