Some people also use random keys to avoid hot spots in indexes, but this really hasn’t been necessary since the advent of type 2 indexes. If you only access the data randomly, then it’s fine to store it randomly. Just be sure you won’t be running a high-speed sequential process against that data. You may luck out and get a list prefetch access path, but you may not, and you can’t take advantage of index lookaside, sequential prefetch, or dynamic prefetch (because five of the last eight pages accessed must be sequentially available). Randomly stored data can cause death by random I/O in sequential processes, and your joined tables can never be clustered in the same sequence, even when joining on a clustering key.
This is a serious issue, and it starts during the design phase. That’s the best point to examine the potential access patterns of the data and to store the data to match those patterns. Build a test database with test data and try various access patterns to see how DB2 will perform. Measuring the cost of performance before the database is built saves a lot of guesswork later. Remember that partitioning is good for availability and scalability, but it isn’t always the best choice for performance.
2 Too many indexes
We used to say, “Index until it hurts.” In today’s high-volume, high-performance environments, we’re finding that it does hurt, and every index must be evaluated. Indexes are expensive in terms of CPU for inserts, deletes, and some updates, not to mention the availability and maintenance issues. Some problems commonly seen: a modeling tool generates both a primary key index and a foreign key index to support Referential Integrity (RI), even though the primary key is a subset of the foreign key, so one index duplicates the other with perhaps just one additional column. There are also many cases where indexes are added but never actually used by the SQL. In V8, indexes that exist only to define partitioning can possibly be eliminated, since a partitioning index is no longer required. Surrogate, system-generated primary keys on every table almost always guarantee more than one index on every single table, which can instantly double the I/O and CPU consumed by a system that heavily modifies data. Are you read-only? Then index all you want. If you’re changing data, keep the number of indexes to a bare minimum.
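As a sketch of the duplicated-index problem (table and index names here are hypothetical), a modeling tool may generate a separate index for a foreign key even though an existing index already covers those columns as a leading subset:

```sql
-- Hypothetical child table ORDER_LINE: primary key (ORDER_ID, LINE_NO),
-- foreign key (ORDER_ID) referencing the ORDER table.
CREATE UNIQUE INDEX XOL_PK ON ORDER_LINE (ORDER_ID, LINE_NO);  -- supports the PK
CREATE INDEX XOL_FK ON ORDER_LINE (ORDER_ID);                  -- generated for the FK

-- XOL_FK is redundant: DB2 can use the leading ORDER_ID column of XOL_PK
-- for RI enforcement and for any access by ORDER_ID. Dropping XOL_FK
-- removes one full index's worth of insert/delete maintenance cost.
```

The two keys differ by a single column, so the second index buys no access path the first cannot already provide.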
The “Holy Grail” of good database design is one index per table. That one index would have to support the primary key, foreign key, clustering, and partitioning (it needs to match the table partitioning) requirements. Getting there takes careful analysis of access paths and patterns during physical design, and it may also require some testing to determine the cost/benefit of adding or omitting an index. If an alternate access path is required, can that access path be via another table? Can we build the access path on the fly with a static look-up table? Figure 6 shows an access path built on the fly using a static date table, allowing full index access by ACCT_ID when no index exists to support it.
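The technique in Figure 6 looks roughly like the following sketch (table, column, and host-variable names here are hypothetical, not the ones in the figure). Assume the transaction table’s only index leads with the date, and DATE_LKUP is a small static table holding one row per valid date:

```sql
-- Assumption: the only index on ACCT_TRAN is (TRAN_DATE, ACCT_ID), so a
-- predicate on ACCT_ID alone gets no matching index access. Joining to a
-- small static date table supplies the leading index column, letting DB2
-- probe the index once per date for the requested account.
SELECT T.ACCT_ID, T.TRAN_DATE, T.TRAN_AMT
  FROM DATE_LKUP D
     , ACCT_TRAN T
 WHERE T.TRAN_DATE = D.CAL_DATE
   AND T.ACCT_ID   = :acct-id;
```

The date table is built once and rarely changes, so the cost of the extra join is small compared to carrying a second index on a heavily modified table.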
1 Use of blackbox I/O modules
We love it when development is easy and appears to have a future in reusability. However, generic design equals generic performance, and ultimately a lot of wasted CPU. Flexibility can come at the price of performance and CPU consumption, and there’s hardly anything you can do about the problem once the application is implemented that way. With this approach, modules simply SELECT everything. There’s CPU cost associated with every column fetched and every host variable populated, and it’s worse in Java because each column selected is returned via a getXxxx() method, requiring an object to be created. Typically, one program module or method is constructed to support each database object individually. This results in many single database calls doing single operations, eliminating the possibility of SQL joins, or of any complex SQL at all. This type of design can also result in heavy overindexing and blind updates. The performance issues are exacerbated if the design introduces stored procedures to perform all the individual functions.
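A hedged sketch of the pattern and its alternative (table, column, and host-variable names are hypothetical):

```sql
-- Blackbox style: one generic module per table, each selecting every
-- column, so retrieving a customer and its accounts takes two calls.
SELECT * FROM CUST WHERE CUST_ID = :cust-id;
SELECT * FROM ACCT WHERE CUST_ID = :cust-id;

-- Specific alternative: one call, one SQL join, and only the columns
-- the caller actually needs, which also cuts host-variable population.
SELECT C.CUST_NAME, A.ACCT_ID, A.ACCT_BAL
  FROM CUST C
     , ACCT A
 WHERE A.CUST_ID = C.CUST_ID
   AND C.CUST_ID = :cust-id;
```

The second form gives DB2 the whole question at once, letting the optimizer choose a join method instead of the application issuing a string of single-row calls.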
A generic design isn’t the end of the world. It serves a valuable business purpose and offers great leverage in choosing platform, language, and so on. You can code generically first, then budget time to replace the performance problems with more specific code. Better yet, identify the areas of concern during development, code specifically for those functions, and let the rest of the application stay generic. Often, the extra effort pays off in a big way!
There are, of course, more than 10 ways that CPU is wasted. While there are many ways to offload CPU, and machines keep getting bigger and faster, there will always be obstacles. When the CPU is pegged at 100 percent, what do you do? You can reduce CPU consumption with a little care in design and some thought about how the data and application really work. New System z coprocessors can save money, but what if you need to reduce elapsed time, too? Time spent upfront taking the correct approach to design and development is ultimately worth it, and it helps meet the growing demands on the business. Z