Sometimes the intent of the DBA or database designer is to partition tables similarly to encourage the use of I/O parallelism. This isn’t as common as you might think; most table joins tend to be parent-to-child and each table may have different volumes, hot spots, and clustering needs.
Similar cases are encountered in a data warehouse environment that supports so-called Kimball designs—a single fact table with time-dependent data usually partitioned by date that’s joined to several dimension tables. In these cases, the dimension tables are usually partitioned by subject area or by some key in the subject area. This lets DB2 use a star-join access path to join the dimension tables before joining to the fact table. Implementing I/O-parallelism for the dimension tables may be possible by partitioning them all in a similar fashion.
Clustering methods that take advantage of methods of high-performance data access (joins, I/O parallelism, prefetch) are usually chosen because they tend to avoid I/O and throughput constraints. However, as more data is accessed and locked, data availability issues may arise.
Sometimes, partitioning schemes create additional issues for the DBA. A case in point involves the previous example where data is inserted into partitions based on a random surrogate key. Here, the DBA has little choice but to schedule regular image copies of all partitions. Contrast this with a month-based partitioning scheme, where it may be possible to schedule frequent image copies of current data only (perhaps the most recent partition), while other partitions are copied less frequently. Of course, this assumes that old data is changed far less frequently (or not at all) compared to recent data.
This also applies to a disaster recovery environment. With a month-based partitioning scheme, the DBA can recover recent partitions first, making critical data available as fast as possible. This must be accompanied by the proper indexing scheme, perhaps using DPSIs and including index image copies.
Some final notes on partitioning:
- Partitioning schemes such as partition rotation sometimes require coordinating application and database administration efforts to ensure that image copy, reorg, and purge processes access the correct physical and logical partitions.
- Different partitioning methods support data load, old data purging, and infrastructure processes (backup, reorg, etc.) to different degrees.
- Application data access patterns, proposed data distribution, and data volatility may determine a particular partitioning scheme.
Making a change to a partitioning scheme may not be a viable option, or at least may be deemed extremely costly. Such a change usually means a complete data unload/load, rebuilding indexes, changing infrastructure support processes, and more. So, once chosen, the DBA would prefer that the partitioning scheme remain in place.
A corollary of this is that if multiple partitioning methods are viable for a particular table you should choose the one deemed most flexible. This requires the DBA to be aware of current system performance constraints to determine what manner of partitioning will yield best performance.
Finally, there are some disadvantages to partitioning. Partitioning increases the number of physical data sets that must be created, including any indexes that are also partitioned. This increase means adjusting corresponding DB2 configuration parameters to account for managing, opening, and closing a larger number of data sets. There are other factors to take into account and you should refer to the appropriate IBM DB2 manual for these.
The following is a list of best practices for database design when considering overall system performance:
- Many of the benefits of partitioning aren’t available for segmented tablespaces in DB2 V8. Universal table spaces, available in DB2 9 for z/OS, combine some of the features of both segmented and partitioned table spaces. For those on DB2 V8, consider partitioning for intermediate and large-size tables. Implementing partitioning isn’t required, but considering it is a good idea if only to ensure that data purge processes are discussed at design time. If you have migrated to DB2 9, consider universal table spaces.
- Partition based on major I/O activity (purge, mass insert, avoiding hot spots), but also consider current system resource constraints and how they may be affected. For example, in a DB2 subsystem where many batch jobs execute in a limited batch window, batch application elapsed time (i.e., throughput) is a constraint. So, you might consider partitioning to relieve potential throughput issues.
- Partitioning to minimize I/Os requires a detailed knowledge of data access patterns. In particular, purge processes are highly I/O-intensive; they usually involve lots of data access during a short period, deletion of rows and index entries, and logging of changes. Partitioning to take the purge process into account may be the most important factor in database design.
- Maximizing data availability is tightly linked to locking. While row-level locking is a potential fix for this it has several disadvantages, including increased CPU usage and greatly increasing the number of locked objects within the scope of a transaction. Reducing data availability constraints is usually accomplished by improving transaction design by either shortening transaction lengths, decreasing the amount of data locked per transaction, avoiding locks (i.e., using uncommitted read), and so forth. It may be useful to consider partitioning that increases availability (avoiding hot spots, minimizing locking) by spreading data throughout multiple partitions. However, this has the disadvantage of increasing I/Os and possibly affecting throughput.
- Throughput constraints can be relieved with good table design once the data access patterns causing the issue are understood. Delays due to mass insert processing can be alleviated with partitioning by ascending key. Delays due to purge processing may be handled with rotating partitions. Of course, each of these methods (and others) may lead to constraints on other resources.
There are partitioning methods we haven’t covered, as well as many additional considerations that are beyond the scope of this article. Nevertheless, we hope this article has given you some ideas to consider, especially in the area of good database design. Considering potential resource constraints and designing databases to mitigate or avoid them is a best practice.