DB2 & IMS

Top-10 Ways to Waste CPU Using DB2


Too many shops use a single set of BIND parameters for all plans and packages because it’s easy. Easy never equates to good performance. There are many bind parameters to consider for best performance in certain situations. Binding generically, or incorrectly, can hamper performance and can burn a lot of CPU for the wrong reasons.

Consider one of the most misunderstood bind parameters: RELEASE. The default is RELEASE(COMMIT), and it is often recommended for concurrency. If you specify COMMIT, you inform DB2 that you're not returning after commit for future work and are more concerned with availability. DB2 then discards several structures that are expensive to rebuild. Using the DEALLOCATE option tells DB2 you're returning for future work after commit and wish to keep the IPROCs, UPROCs, SPROCs, the cache for sequential detection, and the cache for index lookaside. All of these exist to improve the performance of repeated processes and need information retained across commits. For example, without lookaside the index is repeatedly reprobed, resulting in more GETPAGEs and a significant impact on performance. Regarding concurrency, remember that X and S locks (the incompatible locks) are released at commit even with DEALLOCATE; only the intent locks are held past the commit.
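As a sketch only (the collection, member, and other options are placeholder values, not a recommendation for any particular plan), a package serving a batch job that commits frequently but keeps coming back for more work might be bound like this:

   BIND PACKAGE(BATCHCOL) MEMBER(PGMUPD1) -
        ACTION(REPLACE)                   -
        ISOLATION(CS)                     -
        RELEASE(DEALLOCATE)

The same program bound with RELEASE(COMMIT) would give up its table space intent locks, its xPROCs, and its detection caches at every commit, only to rebuild them on the next unit of work.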

So, you have a lot of CPU to burn. Great. Use DEGREE(ANY) everywhere! If you don't have long-running queries that can truly benefit from CPU parallelism, or the resources to support it, this isn't a wise idea. CPU parallelism does have benefits, but you have to spend CPU to get the elapsed-time reduction for these queries, and you need to be sure there's an adequate number of threads and enough virtual storage. DEGREE(ANY) creates two access paths for every statement, both of which must be stored in the EDM (Environmental Descriptor Manager) pool cache. If you do use DEGREE(ANY), set a reasonable maximum degree in the subsystem so it's used wisely and there's not much degradation.
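One way to keep parallelism where it pays off, sketched here with invented collection and member names: bind only the long-running reporting package with DEGREE(ANY), let dynamic work opt in per statement through the CURRENT DEGREE special register, and cap the degree of parallelism with the MAX DEGREE subsystem parameter (PARAMDEG).

   BIND PACKAGE(RPTCOL) MEMBER(MONTHRPT) DEGREE(ANY)

   SET CURRENT DEGREE = 'ANY';   -- dynamic SQL opts in for this thread
   SET CURRENT DEGREE = '1';     -- and opts back out when parallelism isn't wanted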

REOPT(VARS) re-determines the access path at run time. Are you using the right parameter? Does every query in the package need this? Re-optimization looks at the host variable values and performs an incremental bind (aka mini-bind). This incremental bind isn't cheap, and every statement in the package is subject to re-optimization on every execution. Often, considerable time is spent unnecessarily in re-PREPAREs. The best practice is to put the statement(s) that benefit from re-optimization into their own package, then choose REOPT(VARS) for that package, or make that statement dynamic.
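For example (collection and member names invented), the one statement with skewed host variable values can be isolated in its own small package bound with REOPT(VARS), while the rest of the program keeps the cheaper default:

   BIND PACKAGE(APPCOL) MEMBER(SKEWQRY) REOPT(VARS)
   BIND PACKAGE(APPCOL) MEMBER(MAINPGM) NOREOPT(VARS)

Only the statements in the small package pay the incremental-bind cost at execution time.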

4 Blind updates

Do you know exactly what changed during an update? It costs CPU to find out. Often, the updates are to just a few columns or, quite often, a single column. Maybe you should consider splitting the table based on the columns that are updated. The high-update table would contain the frequently updated columns, with the data clustered in update sequence; the low-update table would hold the columns that are rarely updated. The result is much faster updates. This requires intimate knowledge of both the data and the process, but it's worth it. Sometimes no columns have changed at all, but we still blindly perform updates. While DB2 doesn't actually update a row if nothing changed, it still does the work to find that out. There are also methods outside DB2 for detecting changes, such as creating and comparing Cyclic Redundancy Checks (CRCs) or hash values. If you cannot control change detection at the source, then detecting the change before you issue the update is a way to save significant resources. Figure 5 shows a change detection process and a subsequent update to separate tables that saves significant CPU.
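A minimal sketch of the split, using invented account tables and columns: the frequently updated values live in their own table, and the UPDATE is issued only after the program has compared the new values (or a CRC/hash of them) against what it already read and found a difference.

   CREATE TABLE ACCT_PROFILE              -- rarely updated columns
     (ACCT_ID    INTEGER NOT NULL PRIMARY KEY,
      ACCT_NAME  VARCHAR(40) NOT NULL,
      OPEN_DATE  DATE NOT NULL);

   CREATE TABLE ACCT_ACTIVITY             -- heavily updated columns
     (ACCT_ID    INTEGER NOT NULL PRIMARY KEY,
      BALANCE    DECIMAL(15,2) NOT NULL,
      LAST_TXN   TIMESTAMP NOT NULL);

   UPDATE ACCT_ACTIVITY                   -- issued only when a change was detected
      SET BALANCE  = :NEW-BALANCE,
          LAST_TXN = CURRENT TIMESTAMP
    WHERE ACCT_ID  = :ACCT-ID;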

3 Random distribution and access

If randomly distributed data doesn't match your process, there'll be excessive overhead. Random I/O is expensive, and randomly inserting and storing data can cost much more than sequential inserts: more random reads are needed to find the insert location, and row relocation becomes more likely. This can lead to unbalanced partitions and the need for more frequent REORGs. Random distribution of data also makes sizing free space more difficult.

Random keys and random distribution often exist for less-than-good reasons. Frequently, this is attributable to the unnecessary use of surrogate (non-natural) keys, which may come from an Object-Oriented (OO) design or, in legacy migrations, from the notion of "parallelizing"; in other words, spreading out the work to get all the disks spinning at once. With the advent of high-performance DASD subsystems, large DASD cache, parallel access volumes, and large buffer pools, this idea is seriously outdated. If your application input is naturally parallel, then parallelizing your data may not be a bad idea. However, accepting a batch of input data and artificially parallelizing it to get better DB2 performance will, these days, generate just the opposite.
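As an illustration only (the table, index, and column names are invented, and a table can have only one clustering index), clustering on a key that matches the arrival sequence of the work keeps inserts and batch scans largely sequential, whereas clustering on a random surrogate such as a generated GUID scatters the same work into random I/O:

   -- clustering key follows the natural processing sequence
   CREATE INDEX XORDER1
       ON ORDER_HIST (ORDER_DATE, ORDER_NO) CLUSTER;

   -- the alternative, clustering on a random surrogate key,
   -- would scatter the same inserts across the table space:
   -- CREATE INDEX XORDER2 ON ORDER_HIST (ORDER_GUID) CLUSTER;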
