Aug 16 ’10
How Database Design Affects DB2 for z/OS System Performance
A previous article (“DB2 UDB for z/OS Version 8: Performance Strategies,” z/Journal April/May 2006) described one common way DB2 DBAs and systems professionals tune DB2 for z/OS on a subsystem level. We called this resource constraint analysis because it’s based on an analysis of what resources are currently constrained and how that can be mitigated. This led to a DB2 systems tuning strategy to assist the DBA and systems programmer in developing the basic instincts necessary for supporting the IT enterprise.
This article will examine the other side of the equation. How can we design databases so they may be accessed in an efficient manner? Are there designs that permit the DBA more flexibility when tuning is being done?
We will pay specific attention to database design schemes and how they affect the way the DBA or systems professional tunes DB2 for z/OS subsystems. A follow-up article will take a look at another area: how application design affects DB2 system performance.
Resource Constraint Tuning
Consider your DB2 system (or data-sharing group) along with your applications as a single entity that consumes resources. Applications (including the DB2 subsystem code) use various combinations of resources during execution. These resources may include:
- CPU cycles
- Central storage (memory)
- Disk storage (I/O)
- Network (message travel time)
- Object access (locking, thread wait, enqueues)
- Application throughput (elapsed times, transactions per unit time).
A resource-based approach to DB2 system tuning concentrates on two aspects of the system:
- Resources as bottlenecks or constraints
- The way resource usage affects key system characteristics, including recoverability, performance, availability, security, and features and functions.
A resource-based approach uses the following procedure:
1. Identify resource constraints
2. Identify coincident resource availability
3. Note critical system characteristics and issues
4. Analyze items 1, 2, and 3, looking for trade-offs.
By trade-offs we mean you’ll be looking to reduce or eliminate a resource constraint by redirecting the application to use a different resource. For example, suppose you determine that you have a CPU constraint during your nightly batch window. Upon further analysis, you note that many applications access a large partitioned table space using CPU parallelism. You can reduce the CPU constraint by inhibiting parallelism for those applications, although the applications may run longer.
Resource constraint analysis can occur at the object, application, or system levels. For example:
- Analyze critical tables and indexes. Consider them to be constraints. Develop ways of using excess CPU, DASD, or other resources to reduce contention.
- Analyze critical applications. Consider them to be constraints. Develop methods of using excess resources to increase application throughput.
- Analyze the DB2 subsystem. Determine the resource bottlenecks. Develop techniques for using excess resources to relieve the constraints.
The Impact of Database Design
Database design will usually affect I/O, object availability, and application throughput. Figure 1 lists these resource constraints, with common root issues and the typical fixes or workarounds implemented.
In Figure 1, notice the presence of partitioning in the typical fixes column. Partitioning schemes greatly influence the options a DBA has for tuning. Some schemes may prevent certain tuning tactics, while others are more accommodating.
The way a table and its indexes are partitioned affects more than just resource constraints. Partitioning usually determines backup, reorg, runstats frequency, and data purge strategy; it can also be used as a performance tool to distribute activity across volatile pagesets to avoid hot spots.
What specific measures can we take during database design or application tuning to minimize I/O, availability, and throughput constraints? The biggest issue will be data distribution and how activity is distributed across the pagesets. There are many different models and examples of this; we’ll consider only a few, but the principles are valid across most models.
Insert Hot Spots
Sometimes activity concentrated at a single point is a good thing. Consider a table with a sequentially ascending clustering key that experiences frequent row inserts. All new rows can be added at or near the physical end of the table space without interfering with each other, since DB2 prefers to ensure that a row insert doesn’t wait (or time out) rather than insisting on placing the row on the “best” page. If most table activity consists of these inserts, then concentrating activity at one point may be acceptable.
When we have a high insert rate of rows that are evenly distributed across a table, we may encounter resource problems. If every row is inserted on a different page from the last, then each row requires a getpage and a lock (usually a page lock). As the number of getpages and locks per transaction increases, the number of physical I/Os tends to increase, and more pages are locked. This can lead to excessive synchronous I/Os, excessive page locking, long transaction execution times, and an increased chance of deadlock or timeout. This is particularly disruptive in an online environment, as DB2 rolls back the victim of the timeout or deadlock, forcing the application to either fail the transaction or re-execute it.
Several database designs address various aspects of this situation. The most common is a “rotating” partitioning scheme that focuses new row inserts in the last logical partition, coupled with purge logic to remove or archive old data from earlier partitions. At some point, you can implement a partition rotation process using SQL similar to the following:
ALTER TABLE <table-name> ROTATE PARTITION FIRST TO LAST
  ENDING AT (<new-limit-key>) RESET
This assumes implementation of table-based partitioning, which provides for reuse of purged partitions. Such a scheme must be organized and coordinated with backup and recovery processes. Rotating partitions causes the physical partition numbers (which correspond to the physical data sets underlying the design) to no longer match the logical partition numbers. This is detailed in the IBM manuals.
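As a sketch, a table-controlled range-partitioning scheme by month might be defined as follows; the table, column, and limit-key values here are hypothetical, not from any particular installation:

```sql
-- Hypothetical month-partitioned table (table-controlled partitioning).
-- New rows with current dates land in the last logical partition;
-- the oldest partition is emptied and reused as data is purged.
CREATE TABLE ORDER_HIST
      (ORDER_ID    INTEGER      NOT NULL,
       ORDER_DATE  DATE         NOT NULL,
       ORDER_DATA  VARCHAR(200))
  PARTITION BY (ORDER_DATE)
      (PARTITION 1 ENDING AT ('2010-06-30'),
       PARTITION 2 ENDING AT ('2010-07-31'),
       PARTITION 3 ENDING AT ('2010-08-31'));
```

With this design, the ROTATE PARTITION statement shown above reuses the oldest partition by assigning it a new, higher limit key.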
Another database design relates to insert hot spots in transactional tables that are frequently referenced. Here, you want to avoid insert hot spots by spreading new rows across the table space or partition rather than clustering them at the end of the pageset, while still maintaining a partitioning scheme that allows for data purging based on age.
You can accomplish this using table-based partitioning, where the partitioning key is a surrogate key whose value is randomly determined at the time of insert, either by the application or by an insert trigger. The effect is to spread row inserts evenly across partitions. Try to ensure that data access paths aren’t affected by this scheme. One possible danger involves Data Partitioned Secondary Indexes (DPSIs): if a query doesn’t include a predicate on the partitioning key, DB2 may have to probe every partition of the DPSI to find qualifying rows. Because of this issue, the DBA usually implements the aforementioned scheme without DPSIs.
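A minimal sketch of such a scheme, with hypothetical table and column names, pairs a random surrogate partitioning column with a before-insert trigger:

```sql
-- Hypothetical: TRAN_TABLE is spread across 4 partitions using a
-- random surrogate partition number assigned at insert time.
CREATE TABLE TRAN_TABLE
      (TRAN_ID    INTEGER      NOT NULL,
       PART_NO    SMALLINT     NOT NULL,
       TRAN_DATA  VARCHAR(100))
  PARTITION BY (PART_NO)
      (PARTITION 1 ENDING AT (1),
       PARTITION 2 ENDING AT (2),
       PARTITION 3 ENDING AT (3),
       PARTITION 4 ENDING AT (4));

-- The trigger randomizes the partitioning key so inserts are
-- distributed evenly, avoiding an end-of-pageset hot spot.
CREATE TRIGGER SET_PART
  NO CASCADE BEFORE INSERT ON TRAN_TABLE
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  SET N.PART_NO = SMALLINT(INT(RAND() * 4) + 1);
```

Note that queries against this table should include a PART_NO predicate where possible; otherwise DB2 may need to search every partition.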
Input Hot Spots
Schemes that distribute rows to avoid hot spots tend to increase synchronous I/Os while decreasing DB2’s ability to perform prefetch operations. Data availability may be a problem if transactions (commit-to-commit) are long, since many pages will be locked. You should design transactions to have relatively short units of recovery. Such short transactions will commit more often than longer transactions, so the amount of work done at commit (such as externalizing log records) may itself affect system performance.
One additional advantage of relatively short transactions is that they hold claims on pagesets for shorter periods, allowing utilities to drain these claims and to process without failing.
Partitioning where rows are inserted at end-of-partition is friendlier to SQL statements accessing blocks of rows with similar key values. Data availability improves because such rows, physically inserted on the same page, limit the total number of pages locked. Throughput is also relatively fast because inserts won’t compete with each other.
Data Purge Issues
You can implement a rotating partition scheme to purge or archive old data. Assuming you have no need to archive data, a partitioning scheme based on the purge criteria (usually date) works well. For example, with a table partitioned by month, we can easily empty the oldest partition using load utility statements with no input data, such as the following:
LOAD REPLACE ... PART <partition number>
Some care must be taken, since the load utility must know the physical partition number. After partition rotation, the physical and logical partition numbers are no longer synchronized, so the DBA must implement some process to generate the load statement with the correct partition number. This can be done with a program or a scripting language such as REXX.
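One hedged approach is to derive the mapping from the DB2 catalog, which records both numbers; the database and table space names below are placeholders:

```sql
-- Map logical partitions to physical partitions after rotation.
-- PARTITION is the physical number (matching the underlying data set);
-- LOGICAL_PART reflects the order after rotation.
SELECT PARTITION, LOGICAL_PART, LIMITKEY
  FROM SYSIBM.SYSTABLEPART
 WHERE DBNAME = 'MYDB'
   AND TSNAME = 'MYTS'
 ORDER BY LOGICAL_PART;
```

A generation program can read this result and substitute the physical PARTITION value of the oldest logical partition into the LOAD statement.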
Note that these schemes also require coordination with other DBA processes such as reorg and image copy since the physical partitions requiring backup (and potential recovery) change over time. Again, you can use a programmatic process to generate correct copy and reorg statements or Real-Time Statistics (RTS) to automate these jobs.
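For example, a generation process might query Real-Time Statistics to find physical partitions with changes since the last image copy. The sketch below follows the DB2 9 catalog names (in V8 the RTS tables reside in a separate RTS database), and the object names are placeholders:

```sql
-- Identify partitions that have changed since their last image copy;
-- a generation job can build COPY statements from this result.
SELECT PARTITION, COPYCHANGES, COPYUPDATEDPAGES
  FROM SYSIBM.SYSTABLESPACESTATS
 WHERE DBNAME = 'MYDB'
   AND NAME   = 'MYTS'
   AND COPYCHANGES > 0
 ORDER BY PARTITION;
```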
For schemes where the purge criteria can’t be conveniently matched with the partitioning scheme, purging must occur using an application. Here, to avoid resource constraints, you must design the purge application to peacefully coexist with other concurrent table activity. For volatile tables that may be accessed at any time of day the purge application must be coded as restartable, since it may encounter a deadlock or timeout. In addition, you should be able to tune the commit frequency easily so the DBA can adjust it to minimize any timeouts and deadlocks other applications experience. This is commonly done by having commit frequency information placed in a common area (such as a DB2 table) that can be modified when needed, avoiding program logic changes.
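A minimal sketch of such a control table, with hypothetical names:

```sql
-- Hypothetical commit-frequency control table. The purge program
-- reads COMMIT_FREQ at startup (or periodically), so the DBA can
-- tune commit scope without changing program logic.
CREATE TABLE PURGE_CONTROL
      (JOB_NAME     CHAR(8)   NOT NULL,
       COMMIT_FREQ  INTEGER   NOT NULL);

-- The DBA adjusts the frequency as contention dictates:
UPDATE PURGE_CONTROL
   SET COMMIT_FREQ = 500
 WHERE JOB_NAME = 'PURGEJOB';
```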
What happens if the purge process executes more slowly than rows are added to the table? The usual answer in this case is to partition the table based on other criteria and then execute multiple, simultaneous instances of the purge process. To avoid deadlocks, each instance is designed to purge from a different partition. Again, if a partition rotation scheme is in place, the partition choice must be programmed in some fashion.
Partitioning schemes that favor quick purge (via emptying partitions with old data) tend to lessen the I/O load by limiting transaction activity to partitions with current data. However, by concentrating data access in a small portion of the table, you risk creating availability constraints. Application throughput is excellent for inserting applications, while queries may deadlock or time out due to page contention. This can be alleviated in several ways, each with advantages and disadvantages. Options include querying with an isolation level of uncommitted read, implementing row-level locking, or shortening transaction length via more frequent commits.
Table Clustering
There are three common reasons for table clustering:
- To support high-performance, multi-row access (such as I/O parallelism)
- To support distribution of volatile pages to avoid hot spots
- To allow some joins to work faster.
In each case, rows with “nearby” keys are expected to be accessed in the same transaction or in the same SQL statement. Increasing their physical proximity tends to decrease table getpages and avoid random table I/Os. However, this may not always be the case.
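As a hedged illustration (table and column names hypothetical), an explicit clustering index tells DB2 to keep rows with nearby key values physically close:

```sql
-- Rows with nearby (ACCT_ID, TRAN_DATE) values are placed on the
-- same or adjacent pages, reducing getpages and random I/O for
-- transactions that read an account's recent activity together.
CREATE INDEX IX_ACCT
    ON ACCT_HIST (ACCT_ID, TRAN_DATE)
    CLUSTER;
```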
Another possible benefit of clustering arises when many (hundreds or more) rows with nearby keys are accessed. Clustering may favor sequential prefetch or dynamic prefetch of pages, reducing I/O wait times and improving application throughput.
Some partitioning schemes will clash with clustering needs. For example, partitioning schemes that distribute row inserts to avoid hot spots may prevent clustering in the preferred manner.
Sometimes the intent of the DBA or database designer is to partition tables similarly to encourage the use of I/O parallelism. This isn’t as common as you might think; most table joins tend to be parent-to-child and each table may have different volumes, hot spots, and clustering needs.
Similar cases are encountered in a data warehouse environment that supports so-called Kimball designs—a single fact table with time-dependent data usually partitioned by date that’s joined to several dimension tables. In these cases, the dimension tables are usually partitioned by subject area or by some key in the subject area. This lets DB2 use a star-join access path to join the dimension tables before joining to the fact table. Implementing I/O-parallelism for the dimension tables may be possible by partitioning them all in a similar fashion.
Clustering methods that exploit high-performance data access (joins, I/O parallelism, prefetch) are usually chosen because they tend to avoid I/O and throughput constraints. However, as more data is accessed and locked, data availability issues may arise.
Sometimes, partitioning schemes create additional issues for the DBA. A case in point involves the previous example where data is inserted into partitions based on a random surrogate key. Here, the DBA has little choice but to schedule regular image copies of all partitions. Contrast this with a month-based partitioning scheme, where it may be possible to schedule frequent image copies of current data only (perhaps the most recent partition), while other partitions are copied less frequently. Of course, this assumes that old data is changed far less frequently (or not at all) compared to recent data.
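With a month-based scheme, the copy job for current data might look like the following hedged sketch of a COPY utility statement; DSNUM identifies the physical partition, and the object names and partition number are placeholders:

```sql
-- Image copy of a single physical partition (here, partition 36)
-- while applications continue to update it (SHRLEVEL CHANGE).
COPY TABLESPACE MYDB.MYTS DSNUM 36
     COPYDDN(SYSCOPY)
     SHRLEVEL CHANGE
```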
This also applies to a disaster recovery environment. With a month-based partitioning scheme, the DBA can recover recent partitions first, making critical data available as fast as possible. This must be accompanied by the proper indexing scheme, perhaps using DPSIs and including index image copies.
Some final notes on partitioning:
- Partitioning schemes such as partition rotation sometimes require coordinating application and database administration efforts to ensure that image copy, reorg, and purge processes access the correct physical and logical partitions.
- Different partitioning methods support data load, old data purging, and infrastructure processes (backup, reorg, etc.) to different degrees.
- Application data access patterns, proposed data distribution, and data volatility may determine a particular partitioning scheme.
Making a change to a partitioning scheme may not be a viable option, or at least may be deemed extremely costly. Such a change usually means a complete data unload/load, rebuilding indexes, changing infrastructure support processes, and more. So, once chosen, the DBA would prefer that the partitioning scheme remain in place.
A corollary of this is that if multiple partitioning methods are viable for a particular table, you should choose the one deemed most flexible. This requires the DBA to be aware of current system performance constraints to determine which manner of partitioning will yield the best performance.
Finally, there are some disadvantages to partitioning. Partitioning increases the number of physical data sets that must be created, including any indexes that are also partitioned. This increase means adjusting corresponding DB2 configuration parameters to account for managing, opening, and closing a larger number of data sets. There are other factors to take into account and you should refer to the appropriate IBM DB2 manual for these.
The following is a list of best practices for database design when considering overall system performance:
- Many of the benefits of partitioning aren’t available for segmented tablespaces in DB2 V8. Universal table spaces, available in DB2 9 for z/OS, combine some of the features of both segmented and partitioned table spaces. For those on DB2 V8, consider partitioning for intermediate and large-size tables. Implementing partitioning isn’t required, but considering it is a good idea if only to ensure that data purge processes are discussed at design time. If you have migrated to DB2 9, consider universal table spaces.
- Partition based on major I/O activity (purge, mass insert, avoiding hot spots), but also consider current system resource constraints and how they may be affected. For example, in a DB2 subsystem where many batch jobs execute in a limited batch window, batch application elapsed time (i.e., throughput) is a constraint. So, you might consider partitioning to relieve potential throughput issues.
- Partitioning to minimize I/Os requires a detailed knowledge of data access patterns. In particular, purge processes are highly I/O-intensive; they usually involve lots of data access during a short period, deletion of rows and index entries, and logging of changes. Partitioning to take the purge process into account may be the most important factor in database design.
- Maximizing data availability is tightly linked to locking. While row-level locking is a potential fix, it has several disadvantages, including increased CPU usage and a greatly increased number of locked objects within the scope of a transaction. Reducing data availability constraints is usually accomplished by improving transaction design: shortening transaction lengths, decreasing the amount of data locked per transaction, avoiding locks (e.g., using uncommitted read), and so forth. It may be useful to consider partitioning that increases availability (avoiding hot spots, minimizing locking) by spreading data throughout multiple partitions. However, this has the disadvantage of increasing I/Os and possibly affecting throughput.
- Throughput constraints can be relieved with good table design once the data access patterns causing the issue are understood. Delays due to mass insert processing can be alleviated with partitioning by ascending key. Delays due to purge processing may be handled with rotating partitions. Of course, each of these methods (and others) may lead to constraints on other resources.
There are partitioning methods we haven’t covered, as well as many additional considerations that are beyond the scope of this article. Nevertheless, we hope this article has given you some ideas to consider, especially in the area of good database design. Considering potential resource constraints and designing databases to mitigate or avoid them is a best practice.