Using the performance features of DB2 for z/OS can provide significant advantages, but maximizing performance requires attention to database and application design. Design options such as table partitioning, clustering, and indexing are inherent to the product; an application can use them without coding changes, but they must be implemented with application processing needs in mind. To truly maximize DB2’s potential, you must employ good SQL programming techniques, which can make all the difference in application performance. In addition, application design must take into account ways to minimize the calls made to DB2. Finally, your initial design decisions can make a significant difference in application performance, and only a few simple rules need to be followed. This article examines design choices, database features, and SQL tips for achieving superior database application performance.
Call the Database Only When Needed
The major consumer of application database resources is SQL statements that are executed too often. In an object-based design, standard methods are used to provide data access for basic data components. This often results in those methods being called generically and far too often. There’s no silver bullet for avoiding database calls; you simply must program for it.
DB2 provides features application developers can use to reduce the number of database calls required for a transaction:
- Joins and nested table expressions. Generic application development that contains SQL statements accessing single tables can result in a highly flexible application design. However, such a design could result in an excessive number of SQL calls. It can be extremely beneficial from a performance perspective to analyze important or frequently accessed application components and access as much of the data as possible for that process in as few SQL statements as possible. This means coding specific statements for specific purposes. You can use joins and nested table expressions to create statements that access multiple tables simultaneously, perform intermediate aggregation, and access data in one table based on data in another table. SQL is extremely flexible, and it’s easy to embed one statement within another. Follow this simple rule: When accessing table A, is table B access required, too? Then access them both in the same statement. Our own performance tests on simple two-table joins demonstrated a 30 percent CPU savings over the equivalent execution of two separate statements.
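For example, a customer/order lookup can combine two single-table reads into one join. This is only a sketch; CUSTOMER, ORDERS, their columns, and the host variable are hypothetical names, not taken from the article’s figures:

```sql
-- One statement replaces a SELECT against CUSTOMER followed by
-- a second SELECT against ORDERS for the same customer.
SELECT C.CUST_ID, C.CUST_NAME, O.ORDER_ID, O.ORDER_DATE
  FROM CUSTOMER C
 INNER JOIN ORDERS O
    ON O.CUST_ID = C.CUST_ID
 WHERE C.CUST_ID = :HV-CUST-ID
```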
- Multi-row fetch and block fetching. If you’re running a remote application and executing read-only cursors, there’s a good chance you’re automatically leveraging block fetching and multi-row fetch. These features let DB2 deliver bulk data for cursors returning large numbers of rows. To ensure you’re taking advantage of block fetching and multi-row fetch, just be sure to add a “FOR FETCH ONLY” or “WITH UR” to your cursors. For local applications, you must code for multi-row fetch yourself. This requires a few keyword changes in your cursor declaration and FETCH statements, and some extra programming to deal with the returned arrays and error handling. This programming technique and examples appear on our Website.
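A local multi-row fetch might look like the following sketch (the cursor, the EMP table, and the host-variable arrays are hypothetical; in COBOL the arrays would be declared with OCCURS clauses):

```sql
-- Declare the cursor with rowset positioning.
DECLARE C1 CURSOR WITH ROWSET POSITIONING FOR
  SELECT EMPNO, LASTNAME
    FROM EMP
  FOR FETCH ONLY

-- Each FETCH returns up to 50 rows into the host-variable arrays;
-- the program must check how many rows were actually returned.
FETCH NEXT ROWSET FROM C1
  FOR 50 ROWS
  INTO :HV-EMPNO-ARRAY, :HV-LASTNAME-ARRAY
```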
- SELECT from INSERT, UPDATE, DELETE, and MERGE. Read processing has been extended beyond base tables and temporary tables to something referred to as a “data change table reference.” This means that when you process a DML statement, such as an UPDATE, and want to return data to the application after the UPDATE runs, you can do so in a single SQL statement instead of two. This lets you retrieve the “after image” results of INSERT, UPDATE, and MERGE statements, and the “before image” results of UPDATE and DELETE statements. Figure 1 shows an example of a SELECT from INSERT that retrieves a generated key value and a SELECT from an UPDATE that retrieves data changed as a result of the UPDATE execution. Use of these statement constructs consistently results in dramatically improved performance over the equivalent separate statement executions. These performance improvements are even more significant when your application is remote, since the message traffic is also greatly reduced.
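As a sketch of the general shape of these constructs (ORDERS, INVENTORY, their columns, and the host variables are hypothetical names):

```sql
-- Retrieve a generated key value in the same statement as the INSERT.
SELECT ORDER_ID
  FROM FINAL TABLE
       (INSERT INTO ORDERS (CUST_ID, ORDER_DATE)
        VALUES (:HV-CUST-ID, CURRENT DATE))

-- Retrieve the "after image" of an UPDATE in a single statement.
SELECT QTY_ON_HAND
  FROM FINAL TABLE
       (UPDATE INVENTORY
           SET QTY_ON_HAND = QTY_ON_HAND - :HV-QTY
         WHERE PART_NO = :HV-PART-NO)
```

FINAL TABLE returns the after image; OLD TABLE returns the before image of an UPDATE or DELETE.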
- Multi-row INSERT and MERGE statements. Multi-row operations exist for one reason only, and that’s to save resources. If your application has processing that operates against multiple rows of data in a table, then a multi-row SQL statement may be something you can use. While not available for Java applications (you can try Java batching instead, but we haven’t had much success with that), they can be used for local programs written in host languages such as COBOL. If you’re doing many inserts, then a multi-row INSERT statement may be best. A multi-row MERGE operates like an “upsert,” which can be thought of as a combination of an INSERT and UPDATE.
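A multi-row INSERT from host-variable arrays might be sketched as follows (the ACCOUNT table and the host-variable names are hypothetical):

```sql
-- Insert up to :HV-ROW-COUNT rows in a single statement from
-- host-variable arrays (declared with OCCURS clauses in COBOL).
INSERT INTO ACCOUNT (ACCT_ID, BALANCE)
  VALUES (:HV-ACCT-ARRAY, :HV-BAL-ARRAY)
  FOR :HV-ROW-COUNT ROWS
```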
- Common table expressions and recursion. Common table expressions can provide another means of avoiding multiple database calls when an application requires some complex processing. A common table expression is a temporary table that exists only for the duration of a single SQL statement. Figure 2 shows an example of a simple common table expression. You can define as many common table expressions as you like in a given SQL statement, up to the limit of table references (currently 225). Also, a common table expression can reference another common table expression that was defined earlier. This can enable some fairly complex processing in a single SQL statement, and can be used to replace many statements executed to process a batch of transactions. If you have a batch process that reads through a bunch of data, places interim results in another table, then starts a second process to read the interim results, you could benefit from doing this with common table expressions. In addition to enabling multi-step processing in a single SQL statement, a common table expression can also reference itself. This is called a “recursive” reference. DB2 recursion is useful when processing tables that are self-referencing. One example could be a table that holds nodes of a menu for a Website. Instead of running multiple statements to build the Web menu, a single recursive statement will do the trick.
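A recursive common table expression for the Web-menu example might be sketched as follows (WEB_MENU and its columns are hypothetical; the level check guards against runaway recursion):

```sql
WITH MENU_TREE (NODE_ID, NODE_NAME, LVL) AS
( -- Anchor: top-level menu nodes have no parent.
  SELECT NODE_ID, NODE_NAME, 1
    FROM WEB_MENU
   WHERE PARENT_ID IS NULL
  UNION ALL
  -- Recursive reference: join children to the rows found so far.
  SELECT C.NODE_ID, C.NODE_NAME, P.LVL + 1
    FROM WEB_MENU C, MENU_TREE P
   WHERE C.PARENT_ID = P.NODE_ID
     AND P.LVL < 10
)
SELECT NODE_ID, NODE_NAME, LVL
  FROM MENU_TREE
```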
- Stored procedures. Stored procedures are another key feature that can help programmers reduce the quantity of costly calls to the data server. This only works if you embed multiple SQL statements, as well as program logic, into the stored procedure. The goal is to move multiple SQL calls into the procedure so that only one call is made remotely by the application. Obviously, this is a benefit only to remote applications in most situations. Be aware that using stored procedures to encapsulate only a single SQL call will likely be more of a performance detriment than improvement.
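As a minimal sketch (the procedure, tables, and parameters are hypothetical names), an SQL procedure can bundle two statements so the application issues one remote CALL instead of two remote SQL executions:

```sql
-- One remote CALL replaces two separate remote statement executions.
CREATE PROCEDURE NEW_ORDER
  (IN P_CUST_ID INTEGER,
   IN P_PART_NO CHAR(8),
   IN P_QTY     INTEGER)
  LANGUAGE SQL
BEGIN
  UPDATE INVENTORY
     SET QTY_ON_HAND = QTY_ON_HAND - P_QTY
   WHERE PART_NO = P_PART_NO;

  INSERT INTO ORDERS (CUST_ID, PART_NO, QTY, ORDER_DATE)
  VALUES (P_CUST_ID, P_PART_NO, P_QTY, CURRENT DATE);
END
```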
Cluster Correctly for Batches and Joins
The design of keys, processes, and the impact of random I/O on a batch process are commonly misunderstood. Application developers may not think about the clustering of tables or input because they’re typically addressing processing by way of transactions. In an Online Transaction Processing (OLTP) environment that’s user-driven, you will typically find data access patterns that are fairly random. In these situations, the use of natural keys and clustering isn’t that critical. However, reality has shown that while data can be collected in real-time, it’s more often processed off-hours in batches. When this happens, clustering of both the tables and the input to the process becomes much more critical. Obviously, if you’re dealing with batches in this manner, then the use of natural keys as primary keys in the data definition makes common clustering easier. If you’re using system-generated keys, you can still establish a common cluster across tables if the child table inherits the primary key of the parent as part of its primary key. Likewise, any collected data that will be input to a batch process can also be stored in that common cluster sequence. Then when the batch process executes, all data is processed in the clustering sequence, which can allow the database engine to take advantage of dynamic prefetch and generate significant gains in elapsed time and CPU.
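As an illustration of a common cluster (all table, column, and index names are hypothetical), a child table can inherit the parent’s primary key as the leading part of its own key, and both tables can cluster on that sequence:

```sql
-- Parent clusters on ACCT_ID; the child inherits ACCT_ID as the
-- leading column of its key and clusters on the same sequence, so a
-- batch input sorted by ACCT_ID drives both tables in clustering
-- order and lets DB2 exploit dynamic prefetch.
CREATE INDEX XACCT1 ON ACCOUNT (ACCT_ID) CLUSTER;
CREATE INDEX XTRAN1 ON ACCT_TRAN (ACCT_ID, TRAN_SEQ) CLUSTER;
```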
Data clustering can make a big difference in the performance of a join of two or more tables. If the tables aren’t clustered in the same sequence, then joining them, whether in an SQL statement or programmatically, could result in more random I/O and longer elapsed times. By sharing a common clustering sequence across tables, you can realize a dramatic improvement in the elapsed time of processes that access those tables simultaneously; this benefit applies to both OLTP and batch processing.
Partitioning and Application Parallelism