The current use of the term Big Data is the result of the confluence of several industry trends over the last decade: the accumulation of large amounts of data in databases, the advent of large-scale enterprise data warehouses, multiple means of high-speed data transfer, and an explosion in the design and use of sensors for gathering data.
Vendors and software developers responded with many options, including special-purpose hardware and software for data storage and retrieval, as well as complex and powerful analytical software. NoSQL open source database management systems (DBMSes) emerged as part of these trends (see https://en.wikipedia.org/wiki/NoSQL).
Many IT enterprises already have a Big Data implementation in place. Whether it’s a pilot project or a mission-critical application, large data stores and analytics packages are now commonplace.
There are some warning signs, however, that Big Data may be at the peak of inflated expectations. According to Svetlana Sicular, research director at IT research and advisory company Gartner, Inc., Big Data is entering the “trough of disillusionment” phase of the Gartner Hype Cycle (see http://blogs.gartner.com/svetlana-sicular/big-data-is-falling-into-the-trough-of-disillusionment/). According to Gartner:
“Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters” (see www.gartner.com/technology/research/methodologies/hype-cycle.jsp).
How should IT management act in order to avert failure? More specifically, what should database administrators (DBAs) and application designers do to prepare the enterprise for success?
Advent of the Appliance
When Big Data first appeared, it was characterized by the three V’s: volume, velocity and variety; large volumes of multi-structured and unstructured data arriving at the server in a continuous flood. IT architects faced the complex problem of scaling up data stores while also developing methods to store large objects (LOBs), self-describing data (XML) and multi-structured data (images, audio, video and click-streams) for analysis.
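One concrete answer to the self-describing-data part of this problem is DB2’s native XML column type, which lets XML documents sit alongside relational columns and be searched with SQL/XML predicates. The sketch below is only an illustration; the table, column and XPath names are hypothetical, not a recommended design:

```sql
-- Hypothetical table mixing relational and native XML columns
CREATE TABLE customer_events (
    event_id  INTEGER NOT NULL PRIMARY KEY,
    event_doc XML
);

-- Find events whose XML payload matches an XPath predicate,
-- using the SQL/XML XMLEXISTS function
SELECT event_id
FROM   customer_events
WHERE  XMLEXISTS('$d/event[type="click"]' PASSING event_doc AS "d");
```

Because the XML is stored natively rather than as a character LOB, such predicates can be evaluated by the database engine rather than by application code.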
One common approach was to store the largest data tables in a special-purpose Big Data appliance, such as the IBM DB2 Analytics Accelerator (IDAA), while keeping other tables on the main server. The DB2 optimizer then analyzes SQL queries and chooses access paths to either native DB2 tables or tables stored on the appliance, based on access path cost.
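On DB2 for z/OS, whether a query is even eligible for routing to an accelerator is typically governed by the CURRENT QUERY ACCELERATION special register. The fragment below is a sketch of that control point; the table and column names are hypothetical:

```sql
-- Allow eligible queries to be routed to the accelerator;
-- WITH FAILBACK reverts to native DB2 access if acceleration fails
SET CURRENT QUERY ACCELERATION = ENABLE WITH FAILBACK;

-- A scan-heavy analytical query of the kind the optimizer
-- may choose to offload (hypothetical objects)
SELECT region, SUM(sale_amount) AS total_sales
FROM   sales_history
GROUP  BY region;
```

The point of the design is that the application issues ordinary SQL; the decision to run on the appliance or on the mainline DB2 tables stays inside the optimizer.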
Another approach was to store the data in DB2 tables, using the capabilities of the DBMS (for example, the ability of DB2 to store native XML data). In the case of DB2 LUW 10.5, some high-performance options are already available. According to IBM:
“IBM DB2 with BLU Acceleration speeds analytics and reporting using dynamic in-memory columnar technologies. In-memory columnar technologies provide an extremely efficient way to scan and find relevant data. Coupled with innovations such as parallel vector processing and actionable compression ... broader SQL support, I/O and CPU efficiencies, and integration with the DB2 SQL compiler, query optimizer, and storage layer” (see http://www-01.ibm.com/software/data/db2/linux-unix-windows/db2-blu-acceleration/).
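In practice, opting a table into BLU Acceleration in DB2 10.5 is a DDL-level choice via the ORGANIZE BY COLUMN clause. The following is a minimal sketch with hypothetical table and column names:

```sql
-- Hypothetical column-organized (BLU) fact table in DB2 10.5
CREATE TABLE sales_fact (
    sale_date   DATE,
    region      VARCHAR(32),
    sale_amount DECIMAL(12,2)
) ORGANIZE BY COLUMN;
```

A database can also be configured so that column organization is the default for new tables (the DFT_TABLE_ORG database configuration parameter), which lets existing DDL pick up BLU without per-table changes.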
The classic goal of an enterprise’s first Big Data implementation is a restricted program of descriptive analytics: ad hoc querying and reporting on the analytical data to determine trends and patterns. Use is typically restricted for several reasons, including high costs, a lack of personnel experienced with the complex hardware and software environment, and an understanding of the business nature of the data that is limited to a few analysts.
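Descriptive analytics of this kind usually amounts to ad hoc aggregate queries over the analytical store. A hypothetical trend report (all object names invented for illustration) might look like:

```sql
-- Hypothetical ad hoc trend query: monthly sales totals by region
SELECT YEAR(sale_date)  AS sale_year,
       MONTH(sale_date) AS sale_month,
       region,
       SUM(sale_amount) AS total_sales
FROM   sales_fact
GROUP  BY YEAR(sale_date), MONTH(sale_date), region
ORDER  BY sale_year, sale_month, region;
```

It is exactly this style of wide scan-and-aggregate query that both appliances and columnar in-memory technologies are built to accelerate.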