There are many situations in which an application must deal with what we call “blind updates,” which occur when an application receives a set of data to be updated in the database, but doesn’t know which of the elements included in the set of data are actually true updates. To be specific, a record can be received as input to the application, but most of the data in the record is already recorded in the database. In some situations, the difference between fresh and stale elements can be extreme (such as a status code change from “order entered” to “order filled”). While DB2 does a good job of detecting when columns are actually changing, it may still be prudent to reduce the number of update statements sent to the database, especially for high-volume applications. It takes CPU and I/O to run these statements, and if we can avoid them for performance reasons, we should.
Being able to detect change in these situations can result in significant CPU and elapsed time savings, especially when the ratio between columns and rows that actually change is low compared to the columns and rows that don’t change. In one particular example, the application was to receive tens of millions of blind updates daily. The data was analyzed to determine which data always changed and which data rarely changed. It was determined that 80 percent of the time, only a handful of columns actually regularly changed values. The application was designed based on this concept to always blindly apply the column updates that were expected to change, and to use a change detection process to apply the changes to the other columns only when a change was detected.
In this case, we employed a Cyclical Redundancy Check (CRC) algorithm to quickly determine when any of the rarely updated columns were changed. The rarely changing columns were hashed via a CRC algorithm and compared to a stored CRC value on the database. This allowed the application to avoid updates to hundreds of columns that rarely changed unless a CRC detected at least one mismatch. This was significant for the millions of blind updates received daily, since applying only the regular changes cost 1.8ms elapsed and 500ms CPU, while updating all the columns cost 400ms elapsed and 25ms CPU. Figure 5 demonstrates this process.
In another situation, we used triggers to both replicate changes to data and to avoid blind updates for that replication. Here, the application requirement was to immediately audit changes to data. Audit tables were established for each base table in the application to record the before images of changed data when a change occurred. This process was accomplished using after update triggers that simply inserted the before image when an update occurred. This worked well except that the application performed blind updates. This resulted in many update statements that updated all columns, including the update timestamp column, when nothing for the given row updated actually changed.
Triggers will still replicate before images to the audit tables when no change occurs. These unwanted changes had to be avoided. This was accomplished with the addition of before update triggers. These triggers compared all the columns of the new row to the columns of the old row, with the exception of the key and update timestamp. If there were no actual changes to the data, an SQLSTATE was signaled back to the application and the update rejected. Figure 6 shows the before and after update triggers.
Inserts and Clustering
There’s another less talked about, but significantly important, advantage to minimizing change. If we can minimize changes to our tables, then we can minimize the table maintenance required. If we have tables that handle high volumes of inserts, we can minimize the impact of those inserts with an “insert at the end” strategy. If we construct our table with an ever ascending key, or by using the MEMBER CLUSTER option with FREEPAGE 0 and PCTFREE 0, we can guarantee new inserts will go to the end of our tablespace data sets. By focusing the inserts at the end of the tablespace, or tablespace partition, we can minimize the data that needs to be REORGed or backed up. Remember that REORGs and copies consume system resources. These costs are sometimes not estimated as application cost, but should be.
Figure 7 shows a tablespace design that accepts inserts at the end. This partitioned tablespace grows from the end, so only the most recent partition needs tablespace maintenance (REORGs and copies). This minimization of changes results in dramatically fewer REORGS and copies than the equivalent design clustered by a non-ascending key or by not using the MEMBER CLUSTER option.
Everything operating in DB2 costs CPU resources. By design, we often don’t have a clean way to minimize the amount of changes we make and the amount of maintenance we’ll have to perform because of those changes. In today’s environment, we must take the amount of change into account when we’re designing our tables because this is where it has to begin. It’s also impossible to achieve 24x7 availability if we don’t minimize changes and maintenance from the beginning. With some understanding of what’s going to be changing and how to minimize the impact of those changes, we can achieve higher throughput and availability.