As with any enhancement, DPSIs bring a few new caveats for the application and database designer. DPSIs split up the data along the partitioning scheme, which means there is a larger number of DB2 data sets to manage, backup, and recover. This is especially vital to remember, since DB2 Version 8 offers up to 4,096 partitions, and having just three DPSI indexes could alone result in 12,288 index data sets. Reviewing the DSMAX settings and the associated DBD size within your EDM Pool definition will be important, as the number of DPSIs and their associated data sets redefine your old NPI indexes. Also, since DPSIs allow parallelism to be leveraged across the maintenance utilities, the utilization of Sorting, and the possible use of the new SORTDEVT and SORTNUM, parameters may need to be reviewed.
Probably the most important caveat about DPSIs is that they can’t be defined as unique. Because of this, DPSIs may only be useful if the queries include a partitioning key and there aren’t many possible partitions to search. For applications to leverage the performance benefits of DPSIs, they will need to have good statistics because of their structural difference from old NPIs and their potential for parallelism. It is vital that RUNSTATS capture good DPSI column distribution statistics because of the difference between clustering of the data and how it relates to the various partitions of the DPSI structure. As enterprises evaluate the new partitioning, clustering, and DPSI design alternatives, sites should validate and retest their index designs. DPSIs offer so much flexibility that you may be able to eliminate extra processing I/Os and maintenance by combining indexes previously needed by applications.
DPSIs combined with the other index improvements offer better performance support options. Version 8 improves indexes in many new ways, including providing better support for variable length (VARCHAR) column usage. Previously, the VARCHAR column was padded to its largest possible length, potentially making the index extremely large compared to the actual data. This was a problem because the column length limit on indexes was 255 bytes, but now has been expanded in Version 8 to 2,000 bytes. Also, depending on the padding decision, the VARCHAR index information may now be available to processes with “index-only access,” providing a performance improvement for some applications.
In addition, Version 8 index and optimizer improvements provide better handling for applications. First, indexes can now support a backward scan, which may help eliminate indexes that were previously necessary for performance. Also, indexes can now be utilized when the host variable and the target column have different definitions. Decimal and Float variable types are now comparable, which is important for languages that do not have a decimal data type. This feature is very valuable, as DB2 z/OS becomes the ultimate server for any computer language such as Java, C++, C#, and any code base such as EBCDIC, ASCII, or Unicode. With these new features and the ability to handle SQL predicates up to 32KB, the DB2 z/OS implementation further extends its list of DB2 Family-compatible features.
COLLECTION OF NON-INDEXED COLUMN STATISTICS
Along with the index improvements are improvements to the RUNSTATS utility to collect statistics on non-indexed columns. These RUNSTATS statistics will help the DB2 optimizer choose better access paths and especially improve Join opportunities for all queries using non-indexed columns, which are common in data warehousing, data mining, and ad hoc queries. DB2 Version 5 previously supported gathering non-indexed columns statistics into the SYSCOLDIST DB2 Catalog table through the DSTATS tool. This tool used SQL to gather the statistics and provided limited flexibility. Version 8 includes the functionality in RUNSTATS and adds numerous parameters, allowing more efficient statistics gathering and sort handling.
For this new functionality, RUNSTATS has added the following new parameters to control the collection of these non-indexed column statistics: COLGROUP, FREQVAL, LEAST, MOST, BOTH, SORTDEVT, and SORTNUM. These parameters provide flexibility to control the grouping of columns, the number of values captured, their highest and/or lowest cardinality values, and provide the ability to customize RUNSTATS SORT parameters.
For example, capturing these statistics in the SYSCOLDISTSTATS and SYSCOLDIST DB2 Catalog tables can help the DB2 optimizer understand that there is only a limited number of values for a particular column in a table. This knowledge helps the optimizer revise its default assumption of uniform column data values and recalculate the filter factors and possibly change the access path, thereby improving query performance.
Another VLDB and availability enhancement in Version 8 is better granularity and handling of I/O type errors that require DB2 to take data set pages offline and add them to the Logical Page List (LPL) for potential recovery. These rare LPL error situations typically are encountered in extraordinary situations, such as DASD I/O failures, Group Buffer Pool failures, or Log Apply failures. DB2 in Version 7 restricted access by adding the entire partition or page set to the LPL list. Taking these complete page sets offline could result in a major outage caused by only one or two LPL problem pages.
In DB2 Version 8, recovery from LPL situations has been further automated, as DB2 first attempts to heal itself by automatically trying to recover from the situation. Additional messages and error codes are also being developed to further detail the cause of the LPL errors. The most important improvement to Version 8 LPL handling is that DB2 now only puts the individual pages in LPL offline restricted status when it encounters an error. This allows other database activity to continue while keeping the integrity of the problem pages guarded for recovery. By restricting only the directly LPL affected DB2 pages, potential outages and recoveries are minimized.