The Appliance Cometh
Appliances such as IBM’s DB2 Analytics Accelerator (IDAA) can store large tables in a proprietary format and access the data at high speed. The DB2 optimizer has been upgraded to consider access paths to tables in the appliance. Application considerations include:
Data load timing. A typical configuration involves a DB2 table that physically exists in two places: on common disk storage and in an appliance. Depending on the query, the optimizer chooses between the two versions of the table based on the cost of the access paths. However, timing is an issue: the table in the appliance must be populated, and there are different ways to do this, including a bulk load and real-time (incremental) update driven by committed transactions read from the DB2 log. Each choice has performance and data availability implications.
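On DB2 for z/OS, routing between the two copies of a table can also be influenced per session through the CURRENT QUERY ACCELERATION special register. The sketch below is illustrative; the exact option names and defaults vary by DB2 version and installation settings:

```sql
-- Let the optimizer send eligible queries to the accelerator,
-- falling back to native DB2 access if the accelerator fails.
SET CURRENT QUERY ACCELERATION = ENABLE WITH FAILBACK;

-- Force native DB2 access, e.g., while the appliance copy of a
-- table is stale because a bulk load has not yet completed.
SET CURRENT QUERY ACCELERATION = NONE;
```

A setting like this is one way applications can cope with the data load timing issue: queries that cannot tolerate a stale appliance copy can pin themselves to native DB2 access until the next refresh.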
Sandbox, development, test. How will newly developed applications be tested? Will there be multiple appliances (or appliance instances) for multiple environments? For current ad hoc analysis, there’s probably only a production appliance. As your implementation matures and lines of business develop requirements for application access, this will change. Consider a Big Data store of customer transactions. If analysis of the data indicates that customers would be better served (and profits would be higher) by implementing a new application for customer service to query the data in real time, performance of the application is now a high priority. To test the application, a test environment with an appliance and a full production data image may be necessary.
Which tables to store. While appliances are designed to store huge amounts of data, the intent isn’t to store all enterprise DB2 tables in them. On the other hand, if performance is a concern for queries that join multiple tables, these tables should be instantiated in the appliance. This probably means storing a set of “core” tables in the appliance, along with selected Big Data tables.
Backup and recovery. Business analytics against Big Data implies querying static data. While the portions of the Big Data tables being queried are probably static, other business data may not be. This data is subject to backup and recovery. If a rogue application or a hardware or software malfunction requires recovery of a DB2 table, what happens in the Big Data store? If the table exists there, how will it be recovered and how long will it take? This may cause issues with your disaster recovery planning.
With Big Data up and running in your shop, what does the future hold? Lines of business will desire access to this data, and application developers must be prepared to develop and follow new best practices in this environment.
To achieve their return on investment, lines of business may push to implement important or mission-critical business analytics (BA) application solutions that query the Big Data store. Initial implementation success may be problematic. In two recent IDC surveys (the IDC Vertical IT & Communications Survey and the IDC Business Intelligence & Analytics Survey), only 28 percent of respondents were very satisfied or satisfied with the performance of their BA solution. Further, 46 percent of respondents noted there would be an “immediate material negative impact on business operations” if the BA solution were out of service for up to six hours.
Of course, different application types will encounter different concerns. Some major application categories are:
Single query. Most Big Data implementations include the desire to issue ad hoc queries against the new data store. Indeed, there are multiple business intelligence and analytics software packages available from vendors. The next logical step is the creation of internally driven business applications that construct and issue such queries based on specified criteria. As the number of applications increases, the load on the Big Data appliance grows. Data access scheduling and performance now become critical elements of an overall data governance plan.
Reporting. Another category of application is the report, either one-time or regularly scheduled. Many of the initial ad hoc queries will provide valuable insight into the data; hence, business areas will desire to execute these queries regularly, perhaps expanding them to include multiple time periods and multiple geographic areas with subtotals and grand totals. Simple, one-time queries may evolve into daily, weekly and monthly reports. These applications will figure prominently in performance and capacity planning considerations.
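As a sketch of how an ad hoc query grows into a recurring report with subtotals and grand totals, the hypothetical table and column names below use standard SQL ROLLUP, which DB2 supports:

```sql
-- Hypothetical sales_history table; ROLLUP produces per-region
-- subtotals (sales_month is NULL) and a grand total (both NULL)
-- in a single pass over the data.
SELECT region,
       sales_month,
       SUM(sales_amount) AS total_sales
FROM   sales_history
GROUP BY ROLLUP (region, sales_month)
ORDER BY region, sales_month;
```

A query of this shape, scheduled daily or monthly across many regions and time periods, is exactly the kind of workload that turns a one-time analysis into a standing capacity planning concern.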
Complex SQL, multi-table access. Some applications will require access to data in multiple locations, including online data stores, data warehouse tables and tables in the appliance. Sometimes this may not prove feasible. Some appliances require that an SQL statement executed in the appliance reference only tables stored within the appliance. In these cases, IT architects may have no choice but to plan a migration of tables to the appliance, again with performance and capacity planning considerations.
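For example, a join such as the following (table and column names hypothetical) can typically be routed to such an appliance only if every referenced table has been loaded into it; if the customer table lives only on native DB2 storage, the whole statement must run natively:

```sql
-- Accelerator-eligible only when BOTH tables reside in the appliance;
-- otherwise the optimizer must choose a native DB2 access path.
SELECT c.customer_id,
       SUM(t.amount) AS total_spend
FROM   customer    c
JOIN   txn_history t ON t.customer_id = c.customer_id
GROUP BY c.customer_id;
```

This residency constraint is what drives the table migration planning mentioned above: the join's smaller dimension-style tables must follow the large fact tables into the appliance.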
IT architects, DBAs and management must look beyond the current hype of Big Data to the future. An expanding number of applications will require Big Data access. In addition, as internal departments become more familiar with the business value of the Big Data, the number of ad hoc queries and regular reports will expand.
IT will need to address application performance, system performance and resource capacity concerns. These include Big Data appliance hardware and software upgrades, software tools for performance monitoring, acquisition of larger and larger amounts of data storage, and backup and recovery processes.
All of these concerns are part of data governance best practices. Developing, documenting, adhering to and reviewing these practices will be critical in determining whether Big Data remains an important part of your IT infrastructure.