Aug 16 ’10

How Database Design Affects DB2 for z/OS System Performance

by Editor in z/Journal

A previous article (“DB2 UDB for z/OS Version 8: Performance Strategies,” z/Journal April/May 2006) described one common way DB2 DBAs and systems professionals tune DB2 for z/OS at the subsystem level. We called this resource constraint analysis because it’s based on an analysis of which resources are currently constrained and how those constraints can be mitigated. This led to a DB2 systems tuning strategy to assist the DBA and systems programmer in developing the basic instincts necessary for supporting the IT enterprise.

This article will examine the other side of the equation. How can we design databases so they may be accessed in an efficient manner? Are there designs that permit the DBA more flexibility when tuning is being done?

We will pay specific attention to database design schemes and how they affect the way the DBA or systems professional tunes DB2 for z/OS subsystems. A follow-up article will take a look at another area: how application design affects DB2 system performance.

Resource Constraint Tuning 

Consider your DB2 system (or data-sharing group) along with your applications as a single entity that consumes resources. Applications (including the DB2 subsystem code) use various combinations of resources during execution. These resources may include CPU, I/O (both synchronous and asynchronous), and serialization mechanisms such as locks and claims.

A resource-based approach to DB2 system tuning concentrates on two aspects of the system: the resources that are currently constrained and the resources that remain available (or underutilized) at the same time.

A resource-based approach uses the following procedure:

  1. Identify resource constraints
  2. Identify coincident resource availability
  3. Note critical system characteristics and issues
  4. Analyze 1, 2, and 3, looking for trade-offs.

By trade-offs we mean you’ll be looking to reduce or eliminate a resource constraint by redirecting the application to use a different resource. For example, suppose you determine that you have a CPU constraint during your nightly batch window. Upon further analysis, you note that many applications access a large partitioned table space using CPU parallelism. You can reduce the CPU constraint by inhibiting parallelism for those applications, although the applications may run longer.
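
For instance, one way to inhibit CPU parallelism is at the statement level for dynamic SQL, or at the package level for static SQL. A minimal sketch (the collection and package names are hypothetical):

SET CURRENT DEGREE = '1';

REBIND PACKAGE(BATCHCOL.NIGHTLY01) DEGREE(1)

The first statement affects only the current thread; the REBIND disables parallelism for all executions of that package until it is rebound with DEGREE(ANY).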

Resource constraint analysis can occur at the object, application, or system level. For example, the parallelism trade-off above might be applied to a single table space, to every package in one application, or across the subsystem as a whole.

The Impact of Database Design

Database design will usually affect I/O, object availability, and application throughput. Figure 1 lists these resource constraints, with common root issues and the typical fixes or workarounds implemented.

In Figure 1, notice the presence of partitioning in the typical fixes column. Partitioning schemes greatly influence the options a DBA has for tuning. Some schemes may prevent certain tuning tactics, while others are more accommodating.

The way a table and its indexes are partitioned affects more than just resource constraints. Partitioning usually determines backup, reorg, runstats frequency, and data purge strategy; it can also be used as a performance tool to distribute activity across volatile pagesets to avoid hot spots.

What specific measures can we take during database design or application tuning to minimize I/O, availability, and throughput constraints? The biggest issue will be data distribution and how activity is distributed across the pagesets. There are many different models and examples of this; we’ll consider only a few, but the principles are valid across most models.

Insert Hot Spots

Sometimes activity concentrated at a single point is a good thing. Consider a table with a sequentially ascending clustering key that experiences frequent row inserts. All new rows can be added at or near the physical end of the table space without interfering with each other, since DB2 prefers to ensure that a row insert doesn’t wait (or time out) rather than insist on placing the row on the “best” page. If most of the table’s activity consists of these inserts, then concentrating activity at one point may be acceptable.

When we have a high insert rate of rows that are evenly distributed across a table, we may encounter resource problems. If every row is inserted on a page different from the last, then each row requires a getpage and a lock (usually a page lock). As the number of getpages and locks per transaction increases, the number of physical I/Os tends to increase, and more pages are locked. This can lead to excessive synchronous I/Os, excessive page locking, long transaction execution times, and an increased chance of deadlock or timeout. This is particularly troublesome in an online environment, because the victim of the timeout or deadlock is rolled back by DB2, forcing the application either to fail the transaction or to re-execute it.

Several database designs address various aspects of this situation. The most common is a “rotating” partitioning scheme that focuses new row inserts in the last logical partition, coupled with purge logic to remove or archive old data from earlier partitions. At some point, you can implement a partition rotation process using SQL similar to the following:

ALTER TABLE <table-name>
  ROTATE PARTITION FIRST TO LAST
  ENDING AT (<new-limit-key>) RESET

This assumes implementation of table-based partitioning that provides for reuse of purged partitions. Such a scheme must be organized and coordinated with backup and recovery processes. Rotating partitions causes the physical partition numbers (which correspond to the physical data sets underlying the design) to no longer match the logical partition numbers. This is detailed in the IBM manuals.

Another database design issue relates to insert hot spots in transactional tables that are frequently referenced. Here, you want to avoid insert hot spots by spreading new rows across the table space or partition rather than clustering them at the end of the pageset, while still maintaining a partitioning scheme that allows for data purging based on age.

You can accomplish this using table-based partitioning where the partitioning key is a surrogate key whose value is randomly determined at insert time, either by the application or by an insert trigger. The effect is to spread row inserts evenly across partitions. Try to ensure that data access paths aren’t affected by this scheme. One possible danger involves data-partitioned secondary indexes (DPSIs) used to access the table: if a query doesn’t include a predicate on the partitioning key, DB2 may have to probe every partition of the DPSI to find qualifying rows. Because of this issue, the DBA usually implements this scheme without DPSIs.
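
As a rough sketch of this approach, the following DDL partitions a hypothetical table on a small random surrogate key and uses a before-insert trigger to assign the value. All names are hypothetical, the four-partition count is arbitrary, and the underlying partitioned table space definition is omitted; exact DDL varies by DB2 version.

CREATE TABLE TRAN_HIST
  (PART_KEY   SMALLINT      NOT NULL WITH DEFAULT,
   TRAN_ID    DECIMAL(15,0) NOT NULL,
   TRAN_TS    TIMESTAMP     NOT NULL,
   TRAN_DATA  VARCHAR(200)  NOT NULL)
  PARTITION BY (PART_KEY)
   (PARTITION 1 ENDING AT (1),
    PARTITION 2 ENDING AT (2),
    PARTITION 3 ENDING AT (3),
    PARTITION 4 ENDING AT (4));

CREATE TRIGGER TRAN_PART
  NO CASCADE BEFORE INSERT ON TRAN_HIST
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  SET N.PART_KEY = MOD(INT(RAND() * 100), 4) + 1;

The trigger spreads inserts evenly across the four partitions; the application can omit PART_KEY and let the trigger assign it.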

Input Hot Spots

Schemes that distribute rows to avoid hot spots tend to increase synchronous I/Os while decreasing the ability of DB2 to perform prefetch operations. Data availability may be a problem if transactions (commit-to-commit) are long, since many pages will be locked. You should design transactions to have relatively short units of recovery. Such short transactions will commit proportionally more often than longer transactions, so the amount of work done at commit (such as externalizing log records) may affect system performance.

One additional advantage of relatively short transactions is that they hold claims on pagesets for shorter periods, allowing utilities to drain these claims and to process without failing.

Partitioning where rows are inserted at end-of-partition is friendlier to SQL statements that access blocks of rows with similar key values. Data availability improves because rows with similar keys are physically inserted on the same pages, limiting the total number of pages locked. Throughput is also relatively good because the inserts won’t compete with each other.

Data Purge Issues

You can implement a rotating partition scheme to purge or archive old data. Assuming you have no need to archive data, a partitioning scheme based on the purge criteria (usually date) works well. In our example where the table is partitioned by month, we can easily empty the oldest partition using a LOAD utility statement with no input data, such as the following:

LOAD DATA ... INTO TABLE <table-name> PART <partition number> REPLACE

Some care must be taken, since the LOAD utility must be given the physical partition number. After partition rotation, the physical and logical partition numbers are no longer synchronized, so the DBA must implement a process to generate the LOAD statement with the correct partition number. This can be done programmatically, for example with REXX.
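
One hedged way to do this is to query the catalog for the mapping; this sketch assumes the LOGICAL_PART column of SYSIBM.SYSTABLEPART and hypothetical database and table space names:

SELECT PARTITION
FROM   SYSIBM.SYSTABLEPART
WHERE  DBNAME = 'MYDB'
  AND  TSNAME = 'MYTS'
  AND  LOGICAL_PART = 1;

The result is the physical partition (data set) number of logical partition 1, which after rotation holds the oldest data and is therefore the candidate for the LOAD REPLACE.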

Note that these schemes also require coordination with other DBA processes such as reorg and image copy, since the physical partitions requiring backup (and potential recovery) change over time. Again, you can use a programmatic process to generate correct COPY and REORG statements, or use Real-Time Statistics (RTS) to automate these jobs.
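
For example, here is a sketch of an RTS-driven check that lists partitions with enough changed pages to justify a fresh image copy. It assumes a DB2 9 environment where RTS is externalized in SYSIBM.SYSTABLESPACESTATS; the object names and the 1,000-page threshold are arbitrary:

SELECT PARTITION, COPYUPDATEDPAGES, COPYLASTTIME
FROM   SYSIBM.SYSTABLESPACESTATS
WHERE  DBNAME = 'MYDB'
  AND  NAME   = 'MYTS'
  AND  (COPYLASTTIME IS NULL OR COPYUPDATEDPAGES > 1000);

A job scheduler or REXX exec can turn this result set into COPY control statements.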

For schemes where the purge criteria can’t be conveniently matched with the partitioning scheme, purging must occur through an application. Here, to avoid resource constraints, you must design the purge application to coexist peacefully with other concurrent table activity. For volatile tables that may be accessed at any time of day, the purge application must be coded as restartable, since it may encounter a deadlock or timeout. In addition, the commit frequency should be easily tunable so the DBA can adjust it to minimize the timeouts and deadlocks other applications experience. This is commonly done by placing commit frequency information in a common area (such as a DB2 table) that can be modified when needed, avoiding program logic changes.
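
A minimal sketch of such a control table follows; the table and column names are hypothetical. The purge program reads its commit frequency and restart point at startup, and the DBA can throttle it at any time with a simple UPDATE rather than a program change.

CREATE TABLE PURGE_CONTROL
  (PROCESS_NAME  CHAR(8)      NOT NULL,
   COMMIT_FREQ   INTEGER      NOT NULL,
   RESTART_KEY   VARCHAR(64),
   PRIMARY KEY (PROCESS_NAME));

UPDATE PURGE_CONTROL
   SET COMMIT_FREQ = 500
 WHERE PROCESS_NAME = 'PURGE01';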

What happens if the purge process executes more slowly than rows are added to the table? The usual answer in this case is to partition the table based on other criteria and then execute multiple, simultaneous instances of the purge process. To avoid deadlocks, each instance is designed to purge from a different partition. Again, if a partition rotation scheme is in place, the partition choice must be programmed in some fashion.

Partitioning schemes that favor quick purge (via emptying partitions with old data) tend to lessen the I/O load by limiting transaction activity to partitions with current data. However, by concentrating data access in a small portion of the table, you must beware of causing availability constraints, since row access activity is now confined to relatively few pages. Application throughput is excellent for inserting applications, while queries may deadlock or time out due to page contention. This can be alleviated in several ways, each with advantages and disadvantages. Options include querying with an isolation level of uncommitted read, implementing row-level locking, or shortening transaction length via more frequent commits.
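
For example, two of these relief options look like the following (object names are hypothetical). Each has side effects the DBA must weigh: uncommitted read can return data that is later rolled back, and row-level locking increases locking overhead.

SELECT ORDER_ID, ORDER_STATUS
FROM   ORDER_HIST
WHERE  ORDER_DATE = CURRENT DATE
WITH UR;

ALTER TABLESPACE MYDB.MYTS LOCKSIZE ROW;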

Clustering

There are three common reasons for table clustering: supporting range predicates that retrieve sets of rows in key sequence, supporting joins between tables on the clustering columns, and supporting batch processes that access rows in key order.

In each case, rows with “nearby” keys are expected to be accessed in the same transaction or in the same SQL statement. Increasing their physical proximity tends to decrease table getpages and avoid random table I/Os. However, this may not always be the case.
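
As a reminder of the mechanics, clustering in DB2 is specified by designating one index as the clustering index; a minimal sketch with hypothetical names:

CREATE INDEX ORDIX1
  ON ORDER_HIST (CUSTOMER_ID, ORDER_DATE)
  CLUSTER;

When space allows, DB2 inserts new rows near existing rows with similar (CUSTOMER_ID, ORDER_DATE) values, keeping related rows on the same or nearby pages.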

Another possible benefit of clustering appears when many (hundreds or more) rows with nearby keys are accessed. Clustering may favor either sequential prefetch or dynamic prefetch of pages, reducing I/O wait times and improving application throughput.

Some partitioning schemes will clash with clustering needs. For example, partitioning schemes that distribute row inserts to avoid hot spots may prevent clustering in the preferred manner.

Sometimes the intent of the DBA or database designer is to partition tables similarly to encourage the use of I/O parallelism. This isn’t as common as you might think; most table joins tend to be parent-to-child and each table may have different volumes, hot spots, and clustering needs.

Similar cases are encountered in a data warehouse environment that supports so-called Kimball designs: a single fact table with time-dependent data, usually partitioned by date, joined to several dimension tables. In these cases, the dimension tables are usually partitioned by subject area or by some key in the subject area. This lets DB2 use a star-join access path to join the dimension tables before joining to the fact table. Implementing I/O parallelism for the dimension tables may be possible by partitioning them all in a similar fashion.
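
A sketch of such a query follows (all table and column names are hypothetical). The date predicate limits access to the current partitions of the date-partitioned fact table, while DB2 may choose a star-join access path to combine the dimension tables first:

SELECT D1.REGION_NAME, D2.PRODUCT_NAME, SUM(F.SALE_AMOUNT)
FROM   SALES_FACT  F,
       STORE_DIM   D1,
       PRODUCT_DIM D2
WHERE  F.STORE_ID   = D1.STORE_ID
  AND  F.PRODUCT_ID = D2.PRODUCT_ID
  AND  F.SALE_DATE BETWEEN '2010-07-01' AND '2010-07-31'
GROUP BY D1.REGION_NAME, D2.PRODUCT_NAME;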

Clustering methods that support high-performance data access (joins, I/O parallelism, prefetch) are usually chosen because they tend to avoid I/O and throughput constraints. However, as more data is accessed and locked, data availability issues may arise.

Recovery Issues

Sometimes, partitioning schemes create additional issues for the DBA. A case in point involves the previous example where data is inserted into partitions based on a random surrogate key. Here, the DBA has little choice but to schedule regular image copies of all partitions. Contrast this with a month-based partitioning scheme, where it may be possible to schedule frequent image copies of current data only (perhaps the most recent partition), while other partitions are copied less frequently. Of course, this assumes that old data is changed far less frequently (or not at all) compared to recent data.
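
With a month-based scheme, the current partition can be copied on its own. A sketch, assuming hypothetical object names and that the current month’s data resides in physical partition 12:

COPY TABLESPACE MYDB.MYTS DSNUM 12
     COPYDDN(SYSCOPY) FULL YES SHRLEVEL REFERENCE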

This also applies to a disaster recovery environment. With a month-based partitioning scheme, the DBA can recover recent partitions first, making critical data available as fast as possible. This must be accompanied by the proper indexing scheme, perhaps using DPSIs and including index image copies.
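
A sketch of recovering the critical partition first, again with hypothetical names; if index image copies exist, RECOVER INDEX could replace the REBUILD:

RECOVER TABLESPACE MYDB.MYTS DSNUM 12
REBUILD INDEX (ALL) TABLESPACE MYDB.MYTS PART 12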

Best Practices

Some final notes on partitioning:

Making a change to a partitioning scheme may not be a viable option, or at least may be deemed extremely costly. Such a change usually means a complete data unload/load, rebuilding indexes, changing infrastructure support processes, and more. So, once chosen, the DBA would prefer that the partitioning scheme remain in place.

A corollary is that if multiple partitioning methods are viable for a particular table, you should choose the one deemed most flexible. This requires the DBA to be aware of current system performance constraints to determine which manner of partitioning will yield the best performance.

Finally, there are some disadvantages to partitioning. Partitioning increases the number of physical data sets that must be created, including those for any indexes that are also partitioned. This increase means adjusting the corresponding DB2 configuration parameters (such as DSMAX, which limits the number of concurrently open data sets) to account for managing, opening, and closing a larger number of data sets. There are other factors to take into account, and you should refer to the appropriate IBM DB2 manual for these.

The following is a list of best practices for database design when considering overall system performance:

  1. Where possible, choose a partitioning scheme that supports your purge or archive strategy (usually date-based)
  2. Coordinate the partitioning scheme with backup, recovery, reorg, and statistics processes, especially when partitions rotate
  3. When spreading inserts to avoid hot spots, verify that access paths (particularly those involving DPSIs) aren’t degraded
  4. Design purge and other high-volume applications to be restartable, with commit frequencies that can be tuned without program changes
  5. When several partitioning methods are viable, choose the most flexible one, since changing the scheme later is costly.

There are partitioning methods we haven’t covered, as well as many additional considerations that are beyond the scope of this article. Nevertheless, we hope this article has given you some ideas to consider, especially in the area of good database design. Considering potential resource constraints and designing databases to mitigate or avoid them is a best practice.