There are many different ways to solve large data warehousing performance problems. This article covers implementing Materialized Query Tables (MQTs) in a production data warehouse environment; it addresses deployment of MQTs (including analysis steps, configuration settings and system parameters) on the distributed Linux, Unix and Windows (LUW) platforms, and mainframe DB2 Version 8 for z/OS.
At a recent client consulting engagement, extended SQL query times were causing tremendous sort and application I/O problems. These performance problems manifested themselves in unacceptable wait and response times and they needed to be fixed quickly.
The environment had hundreds of users submitting SQL queries through an Online Analytical Processing (OLAP) front-end tool, asking random questions against a large data warehouse of more than 810 million rows spread across 2.5 terabytes of DASD. Research showed that the data warehousing query reporting workload was causing the problem and might be solved by implementing MQTs.
By implementing MQTs, the company avoided upgrading its CPU capacity, saved more than 750 million I/Os per day, and turned query response time from five hours to less than a second.
MQTs were designed and built into DB2 several releases ago on the distributed LUW platforms initially called Automatic Summary Tables (ASTs). MQTs have been such a success on the LUW platform that they’ve been implemented in the new DB2 V8 for z/OS.
MQTs are what their name implies: data from an SQL query stored as a table. Because of the flexibility of SQL, you can write and leverage MQTs for situations where a complex or I/O-intensive query data is used repeatedly. This can be a significant savings, since the I/O-intensive or complex query occurs once and the results are available to everyone to use without doing all the complex or intensive work again.
Our client was using its OLAP tool to repeatedly sum and total sales and product information. These queries happened hundreds of times a day, adding heavy I/O, sort and workload requirements to the system. These queries were targeted for becoming MQTs because that could be done once, eliminating the need for millions of identical SQL I/O activities each day.
You should define MQTs so a wide spectrum of your workload can reference and leverage them. Begin your analysis by taking a DB2 snapshot monitor trace to capture the SQL that’s a representative sample of the overall workload. Be careful with this step because running these DB2 monitor snapshots can be a big overhead.
SQL queries that do any SQL functions or sorting—such as SUM, AVG, GROUP BY, ORDER BY, ROLLUP, etc.—are good candidates for MQTs. Pay close attention to queries and functions that are repeated numerous times when designing your MQTs. Every I/O-intensive monster query should be captured and examined. Avoiding doing that same monster query again can save thousands, if not millions, of I/Os per day; this can be a huge performance improvement.
Once you have a representative sample of your SQL workload, pass it to the DB2 Design Advisor tool and study the recommendations it offers regarding new definitions, indexes, and MQTs. The Design Advisor Tool is included in the DB2 LUW product; it easily analyzes your workload and recommends potential MQT definitions.