Nov 1 ’06
Top-10 Ways to Waste CPU Using DB2
During the past several years, we’ve worked with many companies to design high-performance databases and applications, and have done extensive testing and research on ways to minimize I/O and CPU. Many of those applications and databases are faster than we ever could imagine; however, those are the exception. Too often, shops have critically CPU-constrained environments. Most have one thing in common: They’re wasting CPU. There are some common ways to do this and we’ll examine how you can safely and effectively reduce CPU usage in a DB2 z/OS environment.
Of course, every environment is different and processing needs will vary. However, there are ways to reduce CPU usage through some practical performance tuning efforts and better design efforts early in development. Here are the top-10 ways, ordered in terms of CPU consumption impact, that customers waste enormous amounts of CPU unnecessarily (and unknowingly):
- Overuse of DB2 compression
- Excessive locking
- Processing of data in an SQL statement
- Overuse of DISTINCT
- Inefficient predicates
- Inappropriate BIND parameters
- Blind updates
- Random distribution/access
- Too many indexes
- Blackbox I/O modules.
We’ll look at each issue and discuss in detail how it wastes CPU, and present ways you can fix the problems.
10 Overuse of DB2 compression
DB2 compression often is used everywhere without any verification there’s a benefit and not a detriment. Often, shops have a standard to compress everything. Benefits of DB2 compression include:
- More rows per page
- More data in memory, less I/O
- Less time needed for image copies
- Less DASD for storage
- Benefits sequential and dynamic prefetch.
However, we also must consider the negatives. With more rows per page, we may cause more contention because a single page lock will now be locking more rows at once. Also, some data (short rows, binary data, encrypted data, and few repeating strings) doesn’t compress well. Long rows may not be the best candidates, either. If you have a 4K page size and a row around 2,500 bytes, you’re getting only one row per page to begin with. If you compress, you still may have only one row on the page. Consider moving to a large page size and then compressing.
When a row is compressed, it’s considered varying length. This can lead to row relocations and will cause increases in the NEARINDREF and FARINDREF numbers. This means rows are being relocated to a different page during the updates, resulting in more I/O to get the same data, and more frequent REORGs. The varying length row also may result in more logging for updates, as DB2 will log from the first byte changed to the end of the row for a varying length row that changes in length as a result of the update.
The “pros” are definitely worth the effort to use DB2 compression and can have significant performance benefits and space savings. We must ensure we’re getting those benefits by monitoring the PAGESAVE and PCTROWCOMP columns in the catalog and not introducing contention or relocated rows. Here are some additional considerations and suggestions for DB2 compression:
- Version 8 (V8) indexes aren’t compressed and that must be considered in capacity planning efforts.
- Avoid multi-table tablespaces because the compression dictionary patterns are different and the overall compression may not be optimal.
- A REORG will build a better dictionary than LOAD.
- Data compression supported on older non-IBM hardware may not provide the most optimal benefits and can be expensive in terms of CPU. T he most important consideration is to not compress small rows. This doesn’t provide any benefit; it costs CP U and introduces additional contention problems.
9 Excessive locking
Locking isn’t free. The CPU cost can add up, not to mention hurt availability. As of V8, a lock is approximately 400 CP U instructions and 540 bytes of memory. Do your daily reports contain statistics such as those in Figure 1?
Common problems include excessive LOCK TABLEs during online processing, too many suspensions, too many timeouts, too many deadlocks, etc. All this is the result of taking too many locks or having a poor application design. It’s a CPU, availability and throughput issue.
Are you using the default package in WebSphere that’s bound Read Stability (RS)? This may be causing share lock escalations. Using row-level locking inappropriately? This costs a lot of CP U and counts toward your numbers for lock escalation. If you see drain requests during online processing when no utilities are running, check whether you have partition key updates occurring. (This applies to V7 only; in V8, it’s not a problem.)
There are several ways to reduce excessive locking. Avoidance is key. In V7, you need to be bound with CURRENTDATA(NO) and ISOLATION(CS) and perform frequent commits. In V8, you can be bound with either CURRENTDATA(NO) or (YES) with ISOLATION(CS), and take frequent commits.
Lock avoidance is a process that puts a latch on the data, not a lock; it’s held in the buffer manager and costs only a few CP U instructions. It’s really just controlling the sequence of users of the data. A better option is to use uncommitted read. You avoid the overhead of a lock as well as lock avoidance.
Are you worried about seeing uncommitted data? How do you account for it now? Unless you’re using Repeatable Read (RR), if you leave the page, you release the lock and things could change before you read it again.
When using uncommitted read, you can take advantage of a technique called optimistic locking to avoid anomalous deletes and updates. Using this technique involves placing an update timestamp column on every table, then using that update column as part of the condition of every update and delete statement. In this way, if any other application process has modified data, or rolled back uncommitted updates, then the update or delete fails. This ensures data integrity and high availability, along with lower CPU costs. Figures 2 and 3 show a SELECT and UPDATE in an application that uses optimistic locking, which will become integrated into DB2 for z/OS Version 9.
We can’t stress enough that you must be careful and selective with row-level locking. It’s the default on other DBMSes (including DB2 LUW), but it also works differently on those platforms. On DB2 z/OS, it carries the same overhead as the other locks and the number of row locks counts toward lock escalation and can cause escalation to occur faster. Row locks don’t escalate to page, they escalate to tablespace. They also don’t resolve deadlocks; they can actually cause more.
8 Too much processing in SQL statements
Having logic in an SQL statement is a good thing. However, there are times when the processing is excessive. The ability to write an SQL statement in one place, and then run it from anywhere that has a connection to the database is powerful. Complex SQL can be an extreme performance advantage if:
- The program processes written into the SQL statement aggregate or filter data
- The complex SQL statement logic is data-intensive
- The application program can issue one large statement that returns a result rather than many smaller statements that return data. However, if a complex SQL statement processes data, rather than filters it, then it can be a performance detriment. SQL statements that use lots of UDFs (especially external UDFs), CASE expressions, and data conversions in the SELECT clause may impact performance. Nested table expressions that contain expressions in a SELECT clause and then have references to those nested expressions reused in outer expressions also may significantly impact performance. It’s almost always cheaper to put this type of processing in an application program vs. the SQL statement.
So, complex SQL statements can be a performance advantage or disadvantage. Data-intensive logic is almost always a performance advantage; data-processing logic is almost always a disadvantage. The trade-off is being able to balance the performance with the reusability, flexibility, and portability of complex SQL.
7 Overuse of DISTINCT
Is use of the DISTINCT clause justified in your query? Why is it there several times? Many tools that generate SQL automatically put a DISTINCT immediately after a SELECT by default. Also, many users and programmers will use DISTINCT “just to make sure.” DB2 needs a unique index on the columns specified in the SELECT clause to avoid a sort. Also, a DISTINCT used in a nested table expression will force materialization. The question to ask is, “Are duplicates possible?” If not, then remove the DISTINCT from the query.
If DISTINCT is required, then it should be used in a meaningful way. The performance of DISTINCT needs to be considered and the duplicates in a complex query should be eliminated as early as possible in the processing of that query. Also, try to use DISTINCT only once in a query. You can have multiple DISTINCTs in a query as of V8. You also can consider using a GROUP BY on all columns of the query in place of a DISTINCT. DB2 will consider both unique and non-unique indexes to avoid a sort in support of the GROUP BY.
6 Coding inefficient predicates Some questions that need to be addressed when it comes to DB2 SQL performance include:
- Have you coded efficient stage 1 predicates?
- Do you have stage 3 predicates (filtering logic in code)?
- Are all indexes known at the time of coding and were they exploited (stage 1 indexable predicates or index screening predicates)?
- Are rows being filtered as early as possible?
There are critical questions that should be evaluated by someone reviewing the code. Often, these types of reviews don’t happen and enhancement opportunities are missed.
Consider stage 3 predicates. This is where filtering occurs outside DB2. This type of code isn’t efficiently using the DB2 engine and, without allowing DB2 to do the work on the data while it has it, you have no tuning opportunities in DB2.
The most important thing you can do to improve query performance is to run appropriate statistics. Every column used in every WHERE clause should have statistics on it. You also need to be aware of when your predicates are being applied in the database. Transforming as many predicates as possible to stage 1 indexable will reduce CPU, improve table join sequence selection, and improve application performance. Column expressions are common on indexed columns in the WHE RE clause. Coding a column expression automatically makes the predicate stage 2, and not eligible for index access. Figure 4 shows how to convert two-column expressions to equivalent stage 1 predicates.
5 Using incorrect BIND parameters
Too many shops use a single set of BIND parameters for all plans and packages because it’s easy. Easy never equates to good performance. There are many bind parameters to consider for best performance in certain situations. Binding generically, or incorrectly, can hamper performance and can burn a lot of CPU for the wrong reasons.
Consider one of the most misunderstood bind parameters: RELEASE . The default is RELEASE (CO MMIT) and is often recommended for concurrency. If you specify CO MMIT, you inform DB2 that you’re not returning after commit for future work and are more concerned about availability. DB2 destroys some important things when this occurs. Using the option of DEALLOC ATE tells DB2 you’re returning for future work after commit and wish to save the IPROC s, UPROC s, SP ROC s, the cache for sequential detection and the cache for index lookaside. All these exist to improve performance of repeat processes and need information retained across commits. For example, without lookaside, the index is repeatedly reprobed, resulting in more GETP AGEs and significantly impacting performance. Regarding concurrency, remember that X and S locks (the incompatible locks ) are released at commit even with DEALLOC ATE. Only the intent locks are held past the commit.
So, you have a lot of CPU to burn. Great. Use DEGREE(ANY) everywhere! If you don’t have long-running queries that can truly benefit from CPU parallelism or the resources to support it, this isn’t a wise idea. CPU parallelism does have benefits; you have to use CPU to get the benefits of elapsed time reduction for these queries and you need to be sure there’s an adequate number of threads and virtual storage. DEGREE (ANY) creates two access paths for every statement that must be stored in the EDM (Environmental Descriptor Manager) pool cache. If using DEGREE(ANY), set reasonable max degrees in the subsystem so it’s used wisely and there’s not much degradation.
REOPT(VARS) re-determines access path at run-time. Are you using the right parameter? Does every query in the package need this? Re-optimization will look at host variable values and will perform an incremental bind (aka mini-bind). This incremental binding isn’t cheap and all statements are subject to re-optimization for every execution. Often, considerable time is spent unnecessarily in re-PREP ARES . The best practice is to put statement(s) that benefit from a re-optimization in its own package, then choose REOPT(VARS) for that package or make that statement dynamic.
4 Blind updates
Do you know exactly what changed during an update? It costs CPU to find out. Often, the updates are to just a few columns, or quite often, a single column. Maybe you should consider splitting the table based on columns that are updated. A high update table could consist of columns being updated and that data clustered in update sequence. Then the low update table would have the columns rarely updated. The result is much faster updates. This requires intimate knowledge of both data and process, but it’s worth it. Sometimes, no columns have changed at all, but we still blindly perform updates. While DB2 doesn’t actually update data a row if nothing changed, there’s still the work to find this out. There also are other methods outside DB2 for detecting changes, such as creating and comparing Cyclical Redundancy Checks (CRCs) or hash values. If you cannot control change detection, then perhaps detecting the change before you issue the update is a way to save significant resources. Figure 5 shows a change detection process and a subsequent update to separate tables that saves significant CPU.
3 Random distribution and access
If randomly distributed data doesn’t match your process, there’ll be excessive overhead. Also, random I/O gets expensive and the process of randomly inserting and storing data can be much more expensive than sequential inserts. There are more random reads necessary for insert location, and row relocation is more of a possibility. This can lead to unbalanced partitions and the need for more frequent REORGs. Random distribution of data also makes sizing free space more difficult.
Often, random keys/distribution occurs for less-than-good reasons. Often, this is attributable to the unnecessary use of surrogate keys (non-natural), which may result from an Object-Oriented (OO) design or, with legacy migrations, the notion of “parallelizing.” In other words, spread out the work to get all disks spinning at once. With the advent of high-performance DASD subsystems, large DASD cache, parallel access volumes, and large bufferpools, this idea is seriously outdated. If your application input is naturally parallelized, then parallelizing your data may not be a bad idea. However, accepting a batch of input data, and parallelizing to get better DB2 performance these days will generate just the opposite.
Some people also use random keys to avoid hot spots in indexes, but this really isn’t necessary since the advent of type 2 indexes. If you’re only randomly accessing data, then it’s fine to randomly store the data. Be sure you aren’t going to be running a high-speed sequential process against this data. You may luck out and get a list prefetch access path, but you may not. And you can’t take advantage of index lookaside or sequential or dynamic prefetch (because the last five out of eight pages accessed have to be sequentially available). Randomly storing data can cause death by random I/O in sequential processes. Your joins can never be clustered in the same sequence, even if joining on a clustering key.
This is a serious issue that starts during the design phase. It’s best at that point to look at the potential access patterns of the data, and store the data to match those patterns. Build a test database with test data and try various access patterns to see how DB2 will perform. Measuring the cost of performance before the database is built will save a lot of guesswork later. Remember that partitioning is good for availability and scale, but may not always be the performance choice.
2 Too many indexes
We used to say, “Index until it hurts.” In today’s high-volume, high-performance environments, we’re finding that it’s starting to hurt and must be evaluated. Indexes are expensive in terms of CPU for inserts, deletes, and some updates, not to mention the availability and maintenance issues. Some interesting problems commonly seen are: indexes duplicated by a modeling tool (primary key and foreign key) to support Referential Integrity (RI), but primary key is a subset of the foreign key and the index is duplicated maybe with just one new column. There are also many cases where indexes are added, but never actually used by the SQL. Partitioning indexes, just for partitioning, can now possibly be eliminated in V8 since the partitioning index isn’t needed. Surrogate, system-generated primary keys on every table almost always guarantee that there’ll be more than one index on every single table. This can instantly double the I/O and CPU consumed by a system that heavily modifies data. Are you read-only? Then index all you want. If you’re changing data, then you’ll want to keep the number of indexes to a bare minimum.
The “Holy Grail” of good database design is one index per table. This one index would have to support the primary key, foreign key, clustering, and partitioning (needs to match the table partitioning) requirements. This takes careful analysis of access paths and patterns during physical design. It also may require some testing to determine the cost/benefit of adding or omitting an index. If an alternate access path is required, can that access path be via another table? Can we possibly build the access path on the fly with a static look-up table? Figure 6 shows an access path built on the fly using a static date table, allowing full index access by ACCT _ID when no index exists to support it.
1 Use of blackbox I/O modules
We love it when development is easy and appears to have a future in reusability. However, generic design equals generic performance and ultimately a lot of wasted CPU. Flexibility can come at the price of performance and CPU consumption and there’s hardly anything you can do to help this problem once the application is implemented that way. With this approach, modules simply SELECT everything. There’s CPU cost associated with every column fetched, every host variable populated, and it’s worse in Java because, during the get Xxxx() method, each column selected is returned, requiring an object to be created. Typically, one program module or method is constructed to support every database object individually. This results in many single database calls to do single operations, eliminating the possibility of SQL joins, or any type of complex SQL at all. This type of design also can result in heavy overindexing and blind updates. The performance issues are exacerbated if the design introduces stored procedures to perform all the individual functions.
A generic design isn’t the end of the world. It serves a valuable business purpose and offers great leverage for choosing platform and language, etc. However, you can code generically and then save time to fix the performance problems with more specific code. Better yet, determine the areas of concern during development, and focus on coding specifically for those functions, and let the rest of the application be generic. Often, the extra effort will pay off in a big way!
There are more than 10 ways that CPU is wasted. While there are many ways to offload CPU, and machines are getting bigger and faster, there are always going to be obstacles. When the CPU is pegged at 100 percent usage, what do you do? You can reduce CPU consumption with a little care in design and thought as to how that data and application really work. New System z coprocessors can save money, but what if you need to reduce elapsed time, too? Time spent upfront to take the correct approach to design and development is ultimately worth it and helps meet the growing demands on the business. Z