Sep 27 ’10
IBM Smart Analytics Optimizer: Revolutionary Technology for Data Warehouse and BI Workloads
IT budgets are shrinking, but businesses are generating more data than ever, and must make sense of it faster than ever—turning information into insight. Business expectations are growing and every company faces the challenge of doing more with less.
How can managers make the right business decision with confidence? Companies must get smarter about themselves, their markets, and their customers. Analytic Business Intelligence (BI) solutions are designed to help them. Successful businesses understand they must use information to drive innovation to change the way they do business and differentiate themselves from competitors.
Many companies have taken a data warehousing approach and stacked data. Of course, this presents some issues, such as dedicated servers and storage, that result in siloed data stores run and managed separately from operational systems. Mostly, this data is wildly distributed in the company. This causes challenges for those who need a clear, enterprisewide view. As these discrete systems populate across an organization, inconsistencies, complexities, and indecisiveness set in.
While the trend to put information in the hands of decision-makers has powerful momentum, successful deployments require data warehouse changes that involve the underlying enterprise information infrastructure. Data warehouses must deliver information into operational transactions, portals, and other tools to provide the necessary insight to the right people, at the right time, in the right context.
The latest system technology is creating a more integrated approach to deliver near-real-time, actionable intelligence in an easy-to-digest format for desktops, laptops, and handheld devices— allowing managers, wherever they are, to make better, smarter business decisions with greater confidence than ever before, faster than ever before. What’s occurring is a paradigm shift from isolated, application-focused systems to an enterprisewide information infrastructure. Instead of continually moving the data to the users, organizations now can locate their business processes next to the operational data to simplify the infrastructure.
Now, companies can eliminate the complexities of existing systems to deliver an integrated environment to meet their data warehouse processing requirements. The inefficiencies inherent in having multiple, distributed copies of data across systems can be eliminated by proposing a new model that streamlines a business infrastructure. The newest offering is the IBM Smart Analytics Optimizer, which lets businesses integrate BI into their environment.
By combining operational and BI workloads, you can bring together your business systems, data, and decision-makers to deliver enterprisewide insights that enable more informed decisions. Consider how much simpler it would be to add a workload optimized resource to a central processing complex that shares data and work in a way that matches the appropriate resource to the characteristics of the work. How much easier would it be to create and manage a central environment designed to integrate, not separate, your business processes?
With IBM Smart Analytics Optimizer, you can deploy a BI environment that manages queries based on their characteristics, routing them to the right resources without having database professionals manually manage the flow. Instead of dispersed data silos, IBM Smart Analytics Optimizer offers a consolidated, centralized view of business data.
Data warehousing and BI queries are typically complex and often ad hoc in nature. There’s a common concern about the cost of running these highly resource-intensive workloads in a native z/OS environment. Additionally, data warehousing and BI applications increasingly require fast response times, regardless of the query complexity. IBM Smart Analytics Optimizer augments IBM DB2 for z/OS into an enterprise data warehouse by accelerating selected queries by orders of magnitude. In addition to providing Online Transaction Processing- (OLTP)-like performance for Online Analytical Processing- (OLAP)-type queries, IBM Smart Analytics Optimizer also significantly alleviates the need for typical performance tuning activities.
What Will Change for DBAs?
Parallel query processing and advanced storage-subsystem technologies, together with optimized database management systems, have significantly improved the speed by which data can be accessed and evaluated. However, physical limits with traditional approaches call for more radical steps to satisfy current and future business requirements. BI queries are complex and sometimes hard to tune because of their unpredictable design. Expression on indexes, Materialized Query Tables (MQTs), de-normalization and aggregation techniques, and a large variety of tools and features are available for DBAs’ use. The IBM Smart Analytics Optimizer reduces the need for complex query tuning. Designed to deploy and scale with ease, IBM Smart Analytics Optimizer is an integrated, sustainable solution that can take your existing BI environment to a whole new level of performance.
In today’s data processing environment, mixed with OLTP and BI workloads, some query patterns can’t be executed immediately because of excessive resource consumption. They’re scheduled for a less busy time interval. With IBM Smart Analytics Optimizer, those patterns can be reconsidered for a production environment. Real-time, slice-and-dice “cube applications” can also benefit from the IBM Smart Analytics Optimizer and speed up analysts’ tasks.
The Technical Solution
OLAP queries typically scan large amounts of data—from gigabytes to terabytes—to come up with answers to business questions. These questions have been transformed to SQL and passed to DB2 for z/OS, typically involving dynamic SQL. In traditional environments, DBAs, application programmers, IT architects, and system engineers have done a tremendous job of tuning their environments. But the challenge is still coming into your system as ad hoc queries scanning huge amounts of data and consuming large amounts of resources, both in CPU and I/O capacity.
Usually, these queries can’t be screened before they’re submitted to the system, resulting in entirely unknown resource consumption. The spinning speed of DASD devices becomes a limiting factor when scanning terabytes of data. For the moment, we can simply look at these limits as dictated by the laws of physics. The Smart Analytics Optimizer addresses this unknown resource consumption and the need for faster ad hoc OLAP queries.
A research project called Blink (see www.almaden.ibm.com/cs/projects/blink/) laid the foundation for development of the Smart Analytics Optimizer. With Blink, IBM developed a solution to provide consistent query response times for a given amount of data, regardless of the structure of a query accessing the data. Achieving that performance-oriented goal required implementing leading technology trends: hybrid row/column stores in main memory and predicate evaluation on compressed data, combined with multi-core and vector-optimized algorithms.
The IBM Smart Analytics Optimizer is like an appliance to the extent that it adds another Resource Manager to DB2 for z/OS, just like the Internal Resource Locking Manager (IRLM), Data Manager (DM), or Buffer Manager (BM). It’s a highly integrated solution and the data continues to be managed and secured by the reliable database platform, DB2 for z/OS. No changes are required to existing applications; the applications don’t have to be aware of its existence to benefit from the capabilities it offers.
Whenever queries are eligible for processing by the IBM Smart Analytics Optimizer, users will immediately benefit from shortened response times without any further actions. Both users and applications continue to connect to DB2 for z/OS while being entirely unaware of the IBM Smart Analytics Optimizer’s presence. Instead, the DB2 for z/OS optimizer is aware of an IBM Smart Analytics Optimizer’s existence in a given environment and can execute a given query either on the IBM Smart Analytics Optimizer or by using the already well-known access paths in DB2 for z/OS. Due to cost-based decisions for any query-routing, all queries are executed in their most efficient way, regardless of their type (OLAP vs. OLTP).
The IBM Smart Analytics Optimizer consists of a specific number of blades attached to a System z. Each IBM Smart Analytics Optimizer consists of a large number of processors and main memory that can hold up to thousands of gigabytes of raw data. The number of blades connected gives you the total real storage and processing capacity for the IBM Smart Analytics Optimizer. There’s no communication with the IBM Smart Analytics Optimizer from anywhere outside your System z machine; it’s fully integrated into System z technology and fenced from the outer world. Even if it’s physically located outside a System z chassis, the solution is designed to be extremely robust and to guarantee the established Service Levels Agreements (SLAs) (see Figure 1).
How It Works With DB2
The first step is identifying long-running queries and the data accessed by those queries in your data warehouse. A set of tables related to each other (either from a logical point or by referential constraints) can be referred to as a mart. A mart is in the sweet spot of the IBM Smart Analytics Optimizer if it consists of one large fact table and more smaller dimension tables, as is typically true for star or snowflake schema data models.
Once a mart is identified, a snapshot of those tables is redundantly stored and loaded in the IBM Smart Analytics Optimizer by using UNIX System Services pipes. The unload process can be controlled either by the IBM Smart Analytics Optimizer studio, a Graphical User Interface (GUI), or IBM Smart Analytics Optimizer’s administrative stored procedures.
From a logical view, there are two different types of blades inside the IBM Smart Analytics Optimizer. From a physical perspective, all blades are identical. Most available blades are referred to as worker nodes, while a smaller number are brought up as coordinator nodes. The worker nodes actually store the data while the coordinator nodes act as an interface to DB2 for z/OS. During the unload process, the data flows to the coordinator nodes and is distributed to available worker nodes.
While being loaded in the IBM Smart Analytics Optimizer, the data is compressed and distributed among available blades, allowing for massive parallel processing of those queries accessing the data. While the data is stored in main memory, it’s also stored on a separate storage unit for failover purposes. Once the offloading process completes, the mart is ready to be queried (see Figure 2).
According to an International DB2 Users Group (IDUG) study, most data warehouses are refreshed once a day or less. Regarding this information, for most OLAP applications, querying data stored on an IBM Smart Analytics Optimizer can deliver breakthroughs in terms of performance and reduce the time needed to complete today’s business processes.
Here’s the interesting question: Are there any business questions or business cases relying on SQL statements that just didn’t complete in an appropriate time and you never moved these applications to production? It might be time to reconsider these applications.
What happens if a query accesses the data that’s also available in a mart? Because of the IBM Smart Analytics Optimizer’s deep integration into DB2 for z/OS, the optimizer makes a cost-based decision if a query will be routed to the IBM Smart Analytics Optimizer for execution or whether it will be processed in the traditional way by accessing the data pages, either if they’re located in the buffer pool or must be retrieved from DASD.
Queries being routed to the IBM Smart Analytics Optimizer are expected to benefit from a significant acceleration. Results from an early beta customer experience, presented at the Information On Demand Conference 2009 in Las Vegas, have shown a query acceleration by a factor of more than 370 for a given query. But mileage will vary, depending on individual data and queries to access the data. Internally, the IBM Smart Analytics Optimizer can be looked at as kind of an MQT. But the biggest differentiator is that MQTs are designed to satisfy a limited set of queries. These queries need to match the MQT definition in regard to possible restriction, projection, grouping, and aggregation operators while the IBM Smart Analytics Optimizer matches a much larger scope of queries because they don’t assume any of these operations.
What happens if most users can work with a snapshot, but some users definitely need to query original data since important updates must be honored? If a defined mart hasn’t been refreshed to contain the most recent data, queries can be routed to DB2 for z/OS traditional data access methods by setting CURRENT REFRESH AGE special register to 0. On the other hand, a value of ANY allows for IBM Smart Analytics Optimizer processing, giving you maximum flexibility.
A query can consist of one or more query blocks. Typical queries eligible for IBM Smart Analytics Optimizer processing access data in a large (fact) table, including inner joins or left outer joins with smaller (dimension) tables. If only one or more query blocks of a given query use the structure previously described, queries can also partially benefit from the IBM Smart Analytics Optimizer since DB2 for z/OS routes one query block at a time to the IBM Smart Analytics Optimizer for execution. To predict the possible amount of queries that can be routed to the IBM Smart Analytics Optimizer for execution in a DB2 for z/OS environment using dynamic SQL, IBM offers an assessment of DB2 for z/OS OLAP workloads (see Figure 3).
Offloading Queries to the IBM Smart Analytics Optimizer
DBAs can identify which query blocks will be routed to the IBM Smart Analytics Optimizer before execution, so they don’t need to apply any of the aforementioned performance practices for those types of query blocks. Additionally, it’s easy to understand if a query will be executed using a mart stored inside the IBM Smart Analytics Optimizer by obtaining the new information that will be available in a new explain table named DSN_QUERYINFO_TABLE. The explain function is enhanced to populate this table. If column TYPE = ‘A’ and the REASON_CODE = 0 after executing explain for a given query, a query block successfully qualifies for offloading with the Accelerated Query Table (AQT) specified in QINAME1 and QINAME2.
To fully protect System z and DB2 for z/OS SLAs, the IBM Smart Analytics Optimizer has its own built-in failover mechanisms. DB2 for z/OS is fully fenced against possible accelerator failures. The main memory of the worker nodes contains data of one or more marts. If a blade fails, a coordinator node takes the content of the failing blade’s memory from the storage where it has also been stored during the offload process and continues processing. After the failing node has been brought up again, it’s now referred to as a coordinator node.
If, for some reason, the entire IBM Smart Analytics Optimizer has been disabled and DB2 for z/OS realizes that the IBM Smart Analytics Optimizer is no longer available for query processing, it simply uses a fallback access path and accesses the data.
System Setup and Availability
The IBM Smart Analytics Optimizer is a combined solution of both new hardware and software. It runs on the new IBM zEnterprise BladeCenter Extension (zBX). The IBM Smart Analytics Optimizer software is shipped as SMP/E installable for the DB2-related stored procedures, and the acceleration software is installed on the zBX via the System z Service Element using a product DVD. Once installed, updates to the software are implemented as PTFs, which are then propagated to the blades by calling a stored procedure. The GUI to identify any marts needs to be installed on a client machine. The software prerequisite for the IBM Smart Analytics Optimizer is DB2 9 for z/OS. IBM Smart Analytics Optimizer will be GA on Nov. 19, 2010.