IT Management

Choices for Caching Data in the Enterprise

4 Pages

To make the best use of DB2 buffer pools, we must understand how the applications use the data, because that usage should drive how the pools are configured. DB2 classifies pages in a buffer pool into two categories: random (pages read one at a time) and sequential (pages read via prefetch). These pages are queued separately, on a random Least Recently Used (LRU) queue or a Sequential Least Recently Used (SLRU) queue; prefetch steals pages only from the SLRU queue. The split between the queues is controlled by the VPSEQT (sequential steal threshold) buffer pool parameter, which can be adjusted with the ALTER command. If your workload does heavy sequential processing and sequential prefetch, you want the buffer pool to have a large sequential area. Other processes are more random and don't benefit from a large sequential queue (e.g., online random index access). If DB2 doesn't have pages available on the sequential queue, sequential prefetch can be turned off.
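The two-queue idea can be sketched in a few lines. The class below is a toy model, not DB2's actual implementation: the names (VPSIZE, VPSEQT) follow the article's parameters, but the eviction logic is a simplification for illustration only.

```python
from collections import OrderedDict

class BufferPoolSketch:
    """Toy model of a buffer pool split into a random LRU queue and a
    sequential (SLRU) queue. VPSEQT caps the share of the pool that
    sequentially prefetched pages may occupy; prefetch steals only
    from the SLRU queue. Illustrative only -- real DB2 internals are
    far more involved."""

    def __init__(self, vpsize, vpseqt_pct):
        self.vpsize = vpsize
        self.seq_limit = vpsize * vpseqt_pct // 100
        self.random_q = OrderedDict()   # LRU: most recent at the end
        self.seq_q = OrderedDict()      # SLRU: prefetch steals only here

    def get_page(self, page, sequential=False):
        # A hit on either queue just refreshes the page's LRU position.
        for q in (self.random_q, self.seq_q):
            if page in q:
                q.move_to_end(page)
                return "hit"
        # VPSEQT=0 means no sequential area: treat every read as random.
        use_seq = sequential and self.seq_limit > 0
        if use_seq and len(self.seq_q) >= self.seq_limit:
            self.seq_q.popitem(last=False)   # prefetch steals from SLRU only
        elif len(self.random_q) + len(self.seq_q) >= self.vpsize:
            victim = self.random_q or self.seq_q
            victim.popitem(last=False)       # full pool: steal oldest page
        (self.seq_q if use_seq else self.random_q)[page] = True
        return "miss"
```

Running a burst of sequential reads against a small pool shows the point of the threshold: the sequential pages churn among themselves within their capped area, while randomly read pages stay resident.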

You also can use First-In, First-Out (FIFO) processing instead of the default LRU. With this method, the oldest pages are moved out regardless of how recently they were referenced. This alternative decreases the cost of a GETPAGE operation and reduces internal latch contention under high concurrency. Use this option only where little or no I/O occurs and the table space or index is resident in the buffer pool (i.e., highly used code tables and indexes that are memory resident). By pinning highly used objects in memory in their own buffer pool, you can set that pool to be 100 percent random (VPSEQT=0) so no sequential queue is managed, and then turn on FIFO queuing. This can save CPU and GETPAGE cost for these heavily used, memory-resident objects.
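Why FIFO is safe for memory-resident objects can be demonstrated with a small trace replay. The function below is a hedged sketch: it counts hits under either policy, and shows that when the object fits entirely in the pool, the two policies hit equally often, so FIFO's cheaper per-GETPAGE bookkeeping (no reordering on a hit) is pure savings.

```python
from collections import OrderedDict, deque

def run_trace(trace, capacity, policy="lru"):
    """Replay a page-reference trace and count hits under LRU or FIFO.
    Illustrative only; real GETPAGE processing and latching are
    internal to DB2."""
    hits = 0
    lru = OrderedDict()          # LRU bookkeeping: reorder on every hit
    pages, order = set(), deque()  # FIFO bookkeeping: membership + arrival order
    for page in trace:
        if policy == "lru":
            if page in lru:
                lru.move_to_end(page)    # extra work on every hit
                hits += 1
            else:
                if len(lru) >= capacity:
                    lru.popitem(last=False)
                lru[page] = True
        else:  # FIFO: a hit touches nothing, so each GETPAGE is cheaper
            if page in pages:
                hits += 1
            else:
                if len(pages) >= capacity:
                    pages.discard(order.popleft())
                order.append(page)
                pages.add(page)
    return hits
```

For a code table that fits in its pool, every reference after the initial load is a hit under either policy, which is exactly the situation where the article recommends switching to FIFO.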

Assign objects to separate buffer pools based on how the applications use them (for example, pools for objects that are randomly accessed vs. those that are sequentially accessed). Each pool gets its own settings, and the type of processing may even differ between the batch cycle and the online day. Figure 3 provides a detailed example of buffer pool object breakouts. These are generic breakouts for illustration only; actual definitions would be much finer-tuned.

Sizing DB2 Buffer Pools

Will providing a large cache always provide better performance? Not necessarily. While DB2 can manage a large cache well, it may not be beneficial, given the nature of the data access. The biggest abuse we see today is large buffer pools assigned to objects that are accessed in a 100 percent random fashion with little or no re-reference of the data. In that case, a large area of memory won't help, because pages are read once and never touched again. This is often the result of a poor database design (i.e., use of random keys).

Buffer pool size is set with the VPSIZE parameter, which specifies the number of pages in the virtual pool. To size buffer pools correctly, it helps to know the residency rate of the pages for the object(s) in the buffer pool. Determining this can be difficult and requires understanding the applications and their use of the data.
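As a rough aid, the hit ratio and a residency-based size estimate can be computed with simple arithmetic. The helpers below are assumptions for illustration: a commonly used hit-ratio formula (GETPAGEs satisfied without I/O) and a ceiling-based sizing estimate that ignores sharing with other objects. Real counters come from DB2 statistics, not from code like this.

```python
def hit_ratio(getpages, sync_reads, prefetch_reads):
    """Share of GETPAGEs satisfied without I/O -- a common tuning
    formula. The counter names here are illustrative; actual values
    come from DB2 statistics reports."""
    if getpages == 0:
        return 0.0
    pages_read = sync_reads + prefetch_reads
    return max(0.0, (getpages - pages_read) / getpages)

def vpsize_for_residency(object_pages, residency_pct):
    """Rough VPSIZE needed to keep residency_pct of an object's pages
    resident, assuming the object has the pool to itself (a
    simplifying assumption)."""
    return -(-object_pages * residency_pct // 100)   # ceiling division
```

For example, 1,000 GETPAGEs with 200 pages read from disk gives a hit ratio of 0.8; keeping 75 percent of a 10,000-page object resident would call for roughly 7,500 pages of VPSIZE under these assumptions.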

Application Server-Level Caching

Modern servers can be configured with a considerable amount of memory, and that memory can be leveraged for a considerable performance boost. The potential gain rests on a simple principle: the least expensive call to the database is the call never made. With DB2 for z/OS, every call to the database is a cross-memory call across address spaces. That's expensive, especially when the application is fetching commonly accessed data that rarely changes, such as data stored in code tables or commonly accessed reference tables.
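The pattern is easy to sketch at the data-access layer. In the snippet below, fetch_code_value is a hypothetical stand-in for a real DB2 lookup; wrapping it in an in-process cache means repeated reads of the same code never reach the database at all.

```python
from functools import lru_cache

def fetch_code_value(code):
    """Hypothetical stand-in for a cross-memory DB2 lookup of a rarely
    changing code table. The call counter just makes the savings
    visible in this sketch."""
    fetch_code_value.calls += 1
    return {"US": "United States", "CA": "Canada"}.get(code)

fetch_code_value.calls = 0

@lru_cache(maxsize=1024)
def cached_code_value(code):
    # Only a miss in the application-server cache pays the cost of a
    # database call; every repeat lookup is served from local memory.
    return fetch_code_value(code)
```

The same idea scales from a single process to a shared caching tier; the trade-off in every case is memory and staleness against eliminated database calls.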

Imagine a large, national retailer that manages its inventory in 100 warehouses spread across the country. Its inventory management application must move thousands of products through the network of warehouses, and the product and distribution requirements include critical warehouse information that's stored in a set of database tables. This processing results in those tables being read hundreds of millions of times daily, yet the warehouse information rarely changes. If all of this data were cached at the application server, hundreds of millions of data server calls per day could be eliminated. Imagine the network and data server savings!
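Because the warehouse data does change occasionally, a cache like this usually carries a time-to-live so updates eventually propagate. The class below is a minimal sketch under that assumption; a production system would more likely use an established caching library or a distributed cache.

```python
import time

class TTLCache:
    """Minimal time-to-live cache for rarely changing reference data,
    such as warehouse tables. Entries expire after ttl_seconds so
    occasional changes eventually reach the application. Hypothetical
    sketch, not a production implementation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable clock, handy for testing
        self._store = {}

    def get(self, key, loader):
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]       # fresh entry: no database call made
        value = loader(key)       # missing or expired: reload from the source
        self._store[key] = (value, now)
        return value
```

With a TTL measured in minutes or hours, a table read hundreds of millions of times a day would be reloaded only a handful of times, which is the savings the example above describes.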
