The DBA must coordinate key clustering techniques with index choice, because physical clustering of a table depends on designating one of its indexes as the clustering index. Adding a surrogate key to cluster a table necessarily means creating a clustering index on that key, which may increase the total number of indexes on the table; this can lengthen recovery time or degrade SQL insert performance.
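As a sketch of this trade-off (table, column, and index names are hypothetical), the DDL below adds a surrogate key column and designates its index as the clustering index:

```sql
-- Hypothetical table with a surrogate key added solely for clustering.
CREATE TABLE ORDER_HIST
  (ORDER_ID     INTEGER      NOT NULL,
   CLUSTER_KEY  SMALLINT     NOT NULL,  -- surrogate key for random clustering
   ORDER_DATA   VARCHAR(200) NOT NULL);

-- Designating this index as CLUSTER determines the table's physical order;
-- it is also one more index to back up, recover, and maintain on insert.
CREATE INDEX XORDH01
  ON ORDER_HIST (CLUSTER_KEY, ORDER_ID)
  CLUSTER;
```

Because only one index per table can carry the CLUSTER attribute, choosing this index forgoes clustering on any natural business key.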
The last issue confronting the DBA in the area of data availability is coordinating SQL access and transaction commits with utility execution, especially the REORG utility. REORG with the SHRLEVEL CHANGE option allows concurrent execution with SQL insert, update, and delete statements. However, at the end of execution the utility enters the switch phase, in which it attempts to drain all table partitions involved in the reorg. (A drain is the act of acquiring a locked resource by quiescing access to that object.) Because concurrent SQL statements acquire locks on tables, the REORG utility needs those locks released to complete the switch phase. If the utility can't acquire the drain within a certain amount of time, it fails.
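For example, a utility control statement along these lines (the database and table space names are hypothetical) bounds how long the switch phase waits for the drain and how often it retries before failing:

```
-- Hypothetical DB2 for z/OS utility control statement.
-- DRAIN_WAIT limits how long the switch phase waits for SQL claims
-- to be released; RETRY and RETRY_DELAY govern additional attempts
-- before the utility gives up.
REORG TABLESPACE DBPROD.TSORDERS
  SHRLEVEL CHANGE
  DRAIN_WAIT 30
  RETRY 6
  RETRY_DELAY 10
```

Tuning these values trades utility elapsed time against the likelihood that a long-running transaction causes the reorg to fail.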
The result is that long-running transactions or transactions that delay commits can negatively impact reorg utility execution. The DBA may need to take this into account during database design. Apart from ensuring that long-running transactions don’t occur, the most common method of mitigating this issue is implementing a sufficient number of horizontal partitions so each partition that’s reorged has only a small footprint. Another alternative to avoid this type of contention is implementing third-party tools or utilities that offer more efficient switching options.
Excessive horizontal partitioning may affect recovery processes and also increases the number of open data sets. This may increase the amount of logical and physical data set closing and opening DB2 performs to avoid hitting the maximum number of concurrently open data sets (specified by the DSMAX subsystem parameter).
Design for Security
Security concerns are divided into three levels: prevention, detection, and correction. We usually prefer to prevent security breaches; if this can’t be done thoroughly, then we require methods of detecting and eventually correcting them.
DB2's native security mechanisms, such as authorization IDs and the ability to grant read and write authorities at the table or column level, are usually sufficient to limit application access to authorized users. Prevention and correction of unauthorized access beyond that point are best handled by external security packages or a security administrator, rather than by database design.
Database designs involving security considerations typically address the second level—detection—by implementing a change history accumulation process. The DBA may include table columns for audit trail purposes such as LastUpdateDate and UpdateByUser. Another common design might include an audit history table, which would contain rows of before and after images of changed data.
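As an illustration of the second design (table, column, and trigger names are hypothetical), an update trigger can capture before and after images into an audit history table:

```sql
-- Hypothetical audit history table holding before/after images.
CREATE TABLE ACCT_AUDIT
  (ACCT_ID      INTEGER       NOT NULL,
   OLD_BALANCE  DECIMAL(11,2) NOT NULL,
   NEW_BALANCE  DECIMAL(11,2) NOT NULL,
   UPDATE_TS    TIMESTAMP     NOT NULL,
   UPDATE_USER  CHAR(8)       NOT NULL);

-- Trigger writes one audit row per update, recording who changed what.
CREATE TRIGGER ACCT_AUD
  AFTER UPDATE OF BALANCE ON ACCOUNT
  REFERENCING OLD AS O NEW AS N
  FOR EACH ROW MODE DB2SQL
  INSERT INTO ACCT_AUDIT
    VALUES (N.ACCT_ID, O.BALANCE, N.BALANCE,
            CURRENT TIMESTAMP, USER);
```

The trigger approach keeps the audit trail inside the database, but it adds insert activity (and locking) to every audited update.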
More elaborate schemes of gathering change history include:
- Message queues with either transactional data, changes, or both
- Special mirror tables defined similarly to base tables that would contain before images of data
- Third-party vendor software with the ability to scan DB2 transaction logs and gather who-changed-what information
Vendor software provides another advantage, as these products are more difficult to circumvent than most database design strategies. If your environment has strict compliance requirements or is subject to a certification or governing body, then standard vendor solutions may provide the best answer.
Design for Performance
Many database design alternatives that enhance application performance are well-known and have appeared in previous articles. The most common arise when dealing with resource constraints; the DBA documents current constraints, analyzes possible trade-offs, and makes design decisions that take the constraints into account while ensuring compliance with the first three laws (recoverability, data availability, and security).
For example, assume that I/O throughput and application elapsed time are constrained. A possible design change to alleviate the constraints is to add indexes that support search predicates and join predicates, perhaps even providing index-only access. The DBA would first ensure that adding new indexes wouldn’t compromise data recoverability (additional indexes add to recovery time) or negatively affect data availability (the new indexes may create hot spots).
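A sketch under these assumptions (the query, table, and index names are hypothetical): an index on the predicate columns that also carries the selected column can enable index-only access:

```sql
-- Hypothetical query the design must support:
--   SELECT CUST_ID, ORDER_TOTAL FROM ORDERS
--   WHERE ORDER_DATE = ? AND CUST_ID = ?
--
-- An index covering both predicates and the selected column lets DB2
-- satisfy the query from the index alone (index-only access),
-- reducing I/O at the cost of more index pages to maintain and recover.
CREATE INDEX XORD02
  ON ORDERS (ORDER_DATE, CUST_ID, ORDER_TOTAL);
```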
Summary and Best Practices
The following is a list of best practices for database design when considering application requirements:
- Since recoverability is the first and highest priority, you should include recovery review during database design. Don't begin with backup processes; the recovery requirements will dictate these. Instead, concentrate on the disaster recovery requirements of the application and data, potential recovery times, and integration with the enterprise disaster recovery plan. Most database design choices (partitioning, indexes, keying) involve changes to data volumes, data set sizes, number of data sets, and recovery methods.
- Partitioning is usually based on major I/O activity (purge, mass insert, avoiding hot spots). For example, in a DB2 subsystem where many batch jobs execute in a limited batch window, batch application elapsed time (i.e., throughput) is a constraint, and the DBA will consider partitioning to relieve potential throughput issues. Meanwhile, the DBA must balance recoverability and data availability.
- Key clustering techniques may be new to you. If so, ensure that any technique you develop or use is thoroughly tested, documented, reviewed with all database designers and administrators, and approved. Develop multiple ways your technique can be automatically included in object documentation. For example, if you define a surrogate key whose sole purpose is for random clustering, add a note in either the table or column remarks (using the SQL COMMENT statement). This note should supplement documentation of the key’s use in your standard data dictionary or other metadata repository.
- Maximizing data availability is tightly linked to locking. Reducing data availability constraints usually means improving transaction design: shortening transaction lengths, decreasing the amount of data locked per transaction, avoiding locks (e.g., using uncommitted read), and so forth. It may also be useful to consider partitioning that increases availability (avoiding hot spots, minimizing locking) by spreading data across multiple partitions. This has the disadvantage of increasing I/O and possibly affecting throughput, however.
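To sketch the partitioning practice above (table names and limit keys are hypothetical), a range-partitioned table spreads purge and mass-insert activity across partitions that can be reorganized and recovered independently:

```sql
-- Hypothetical range-partitioned table: each month's data lands in its
-- own partition, so purges, reorgs, and recoveries can be run
-- partition by partition.
CREATE TABLE SALES_HIST
  (SALE_DATE  DATE          NOT NULL,
   STORE_ID   INTEGER       NOT NULL,
   AMOUNT     DECIMAL(11,2) NOT NULL)
  PARTITION BY (SALE_DATE)
    (PARTITION 1 ENDING AT ('2004-01-31'),
     PARTITION 2 ENDING AT ('2004-02-28'),
     PARTITION 3 ENDING AT ('2004-03-31'));
```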
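For the documentation practice above, the SQL COMMENT statement records a surrogate key's purpose directly in the DB2 catalog (the table and column names here are hypothetical):

```sql
-- Record the surrogate key's purpose in the catalog so it surfaces
-- in generated object documentation.
COMMENT ON COLUMN ORDER_HIST.CLUSTER_KEY IS
  'Surrogate key used solely for random clustering; not business data';
```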
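As one small example of the lock-avoidance technique mentioned above (the query and names are hypothetical), a read-only report can run with uncommitted read so it takes no read locks:

```sql
-- Uncommitted read (isolation UR): the query takes no read locks and is
-- never blocked by writers, at the cost of possibly seeing uncommitted
-- data. Appropriate only for reports that can tolerate dirty reads.
SELECT STORE_ID, SUM(AMOUNT)
  FROM SALES_HIST
 GROUP BY STORE_ID
  WITH UR;
```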
DBAs and data modelers traditionally considered performance issues when developing database designs. They were influenced by several factors, including limited access path choices, few high-performance hardware or software options, and unsophisticated SQL optimizers. Today, hardware, software, and database management systems have matured; databases can potentially grow to multi-terabyte size, and enterprises are implementing critical applications whose recovery and data availability now assume much more importance. Database designers must consider these factors. If they don’t, they risk implementing databases that are unrecoverable or that require expensive and time-consuming redesign.