Aug 17 ’10

DB2 for z/OS Parallelism

by Willie Favero in z/Journal

If you want to get something done fast, get a friend to help. Usually, two can get a lot more done than one, especially when you’re trying to reduce how long something will take to complete. The same applies to DB2’s queries. If you want a query to complete faster, have a bunch of DB2’s friends (processors) help out. The more processor friends you get involved, the faster that query will complete. Of course, other factors can influence the impact of letting a bunch of processors attack the same query.

DB2’s query parallelism targets I/O-intensive queries (i.e., table space scans and large index scans) and Central Processor- (CP-) intensive queries (i.e., joins, sorts, and complex expressions). Its objective is to reduce the overall elapsed time of a query by taking advantage of available I/O bandwidth and processor power. This seemingly easy fix for long query elapsed times is one reason parallelism is considered such a perfect companion to data warehousing. Splitting a query across multiple processors doing multiple concurrent I/Os is among the most straightforward ways of reducing the elapsed time of a long-running query. You can have a query run across two or more general purpose engines and redirect some portion of the work to an IBM System z Integrated Information Processor (zIIP) specialty engine, if available, with no application code changes.  

Without parallelism, a query processes data sequentially. If the data is in a partitioned table space, that means processing occurs one partition at a time even though the data is split across partitions, or multiple independent physical data sets. The beauty of partitioning is that it gives the application the ability to process each partition independently. With parallelism turned on, DB2 can process each partition simultaneously, in parallel, using multiple CPs to reduce the overall elapsed time of the query while minimizing the additional CPU overhead for parallelism.
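
To picture that, here is a minimal, purely illustrative sketch of a range-partitioned table (the table, column, and limit-key values are hypothetical, and real DDL would also name a database, table space attributes, and so on); each of its four partitions lands in its own physical data set that DB2 can drive with its own parallel task:

  CREATE TABLE SALES_HIST
        (SALE_DATE  DATE           NOT NULL,
         STORE_ID   INTEGER        NOT NULL,
         AMOUNT     DECIMAL(11,2)  NOT NULL)
    PARTITION BY RANGE (SALE_DATE)
        (PARTITION 1 ENDING AT ('2010-03-31'),
         PARTITION 2 ENDING AT ('2010-06-30'),
         PARTITION 3 ENDING AT ('2010-09-30'),
         PARTITION 4 ENDING AT ('2010-12-31'));

With parallelism enabled, a table space scan of SALES_HIST can read those four partitions concurrently instead of one after another.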

Parallelism comes in three different types: query I/O parallelism, which overlaps multiple I/O requests under a single task to reduce I/O wait time; query CP parallelism, which splits a single query into multiple tasks running concurrently on multiple processors; and Sysplex query parallelism, which extends CP parallelism by spreading those tasks across the members of a DB2 data sharing group.

DB2 parallelism isn’t a given; it isn’t “just available” in DB2. You must perform several actions before the optimizer will consider parallelism as an access path. First, all three forms of parallelism require that DB2 know parallelism should be considered for a package or SQL statement. In a dynamic SQL environment (the SQL type most likely used in a data warehousing environment), the special register CURRENT DEGREE is used to enable or disable parallelism. If CURRENT DEGREE is set to “1” (it’s a character value, so a trailing blank doesn’t matter), parallelism is disabled; it’s not available as an access path choice. If CURRENT DEGREE is set to “ANY,” parallelism is enabled. The default value for CURRENT DEGREE is set on installation panel DSNTIP8 or by the DSNZPARM keyword CDSSRDEF on the DSN6SPRM macro. Whatever value is set as the DB2 subsystem default, the CURRENT DEGREE special register can be overridden via the SET CURRENT DEGREE SQL statement. For static SQL, the DEGREE keyword on a BIND or REBIND of the package provides the equivalent control for that package.
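
As a quick illustration (the collection and package names below are placeholders, and the long-running query is only indicated by a comment), a dynamic SQL application can enable parallelism just for the statements that need it, while a static package gets the equivalent treatment at bind time:

  -- Dynamic SQL: let the optimizer consider parallelism for what follows
  SET CURRENT DEGREE = 'ANY';
  -- ... run the long-running warehouse query here ...
  -- Then turn parallelism back off for subsequent statements
  SET CURRENT DEGREE = '1';

  -- Static SQL: the equivalent bind-time control, issued as a DSN subcommand
  -- REBIND PACKAGE(DWHCOLL.DWHPKG) DEGREE(ANY)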

The default for CURRENT DEGREE should normally be set to “1,” disabling parallelism. Enable parallelism on a per-task basis to ensure valuable CPU resources aren’t wasted.

The next parallelism control to consider is the number of CPs DB2 will be allowed to use. MAX DEGREE on the installation panel DSNTIP8, or the DSNZPARM keyword PARAMDEG on the DSN6SPRM macro, sets the maximum number of CPs DB2 can use for CPU query parallelism. The default for this value is zero, which lets DB2 choose the degree of parallelism itself. Although accepting 0 can simplify things, you should take the time to determine the best value for MAX DEGREE. A guideline for a starting point is to choose a value roughly midway between the number of CPs available and the maximum number of partitions that will be processed. If you believe the queries will tend to be CPU-intensive, make the value closer to the number of CPs you have available; if the queries will be more I/O-intensive, make it closer to the number of partitions. Monitor and adjust accordingly. If the default of 0 is accepted, be aware that it’s possible to get a degree of parallelism up to the maximum of 254. If multiple concurrent queries all get degrees that high, it could open up a whole different set of problems, not the least of which is available storage.
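
To make that guideline concrete with purely hypothetical numbers: on an LPAR with eight CPs running queries against table spaces of roughly 24 partitions, a reasonable starting point for MAX DEGREE would be around 16, roughly midway between the two. Nudge it down toward eight if the workload proves CPU-intensive, or up toward 24 if it’s dominated by partition scans, then monitor and adjust.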

Once parallelism is enabled, VPPSEQT (the parallel sequential threshold for buffer pools) must also be set to a value greater than zero before the optimizer will actually consider a parallel access path for a query. VPPSEQT is a percentage, 50 percent by default, and it makes a portion of the sequential steal threshold (VPSEQT) available for parallel processing. VPSEQT, in turn, is a percentage of VPSIZE (the virtual pool size) and defaults to 80 percent. If VPSIZE is set to a hypothetical value of 100 pages and VPSEQT is 80 percent, then 80 pages of that pool are available for sequential processing. If VPPSEQT is set to 50 percent, then 40 of those 80 sequential pages are available for parallel processing.
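
For example (the buffer pool chosen and its size are illustrative only, at a more realistic scale than the 100-page example above), the thresholds are set with the ALTER BUFFERPOOL command:

  -ALTER BUFFERPOOL(BP2) VPSIZE(50000) VPSEQT(80) VPPSEQT(50)

With these values, 80 percent of the 50,000-page pool (40,000 pages) can be used for sequential processing, and 50 percent of those sequential pages (20,000 pages) are available to parallel operations.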

Prior to DB2 9, the optimizer picked the lowest-cost sequential plan and then determined if anything in that plan could be run in parallel. As of DB2 9, that lowest cost figure is determined after parallelism has been considered; this is a significant change from previous versions of DB2.

The biggest contributor to the degree of parallelism that DB2 will pick is the number of partitions.  Nothing influences DB2’s parallelism like partitions.

Turning parallelism on takes some deliberate action, then, but even if everything is set correctly to make parallelism available in a DB2 subsystem, it’s not a foregone conclusion that parallelism will be used. Several factors could still prevent DB2 from selecting parallelism even after it’s enabled:

There are a few other DSNZPARMs that might be of interest. These ZPARMs are referred to as opaque or hidden, which means they can’t be set or modified via the DB2 installation panels. The first is PTASKROL on the DSN6SYSP macro, which controls whether the accounting trace records for parallel tasks are rolled up into a single record. Its possible values are YES and NO, with YES the default. With YES, all the parallel task records are rolled up into one record, less System Management Facility (SMF) data is collected, and processing costs are reduced; however, some detail is lost. Generally, you should take the default for performance reasons. If you’re attempting to diagnose a balancing issue, consider NO to obtain the more detailed individual records; with NO, each parallel child task produces its own accounting trace record.

Next is the hidden DSNZPARM SPRMPTH on the DSN6SPRC macro. This ZPARM can be extremely helpful and is well-documented in numerous DB2 presentations. It sets a threshold, by default 120 milliseconds (ms), that a query must reach before actually using parallelism. Because of the initial setup cost of parallelism, it shouldn’t be used for short, quick-running queries, and this threshold prevents that from happening: nothing that runs in less than 120 ms will use parallelism. In some instances, 120 ms is still too low to keep what might reasonably be considered “fast-running” queries from being considered for parallelism. For those situations, consider increasing SPRMPTH. While there are situations where you might make this threshold higher than the default, there is no reason to make it any smaller.

PARAPAR1 and OPTOPSE are ZPARMs that no longer exist in DB2 9; OPTOPSE had already been removed in DB2 V8. The enhancements they delivered are no longer selectable options: the fix OPTOPSE delivered is now always on, and the more aggressive parallel IN-list processing delivered by PARAPAR1 is now simply part of DB2 9.

There are many sources to help you determine what parallelism is doing, including information in the accounting records that is available through almost all monitors. Additional details are available in the parallelism-related IFCIDs:

There are also columns in EXPLAIN’s PLAN_TABLE and DSN_STATEMENT_CACHE_TABLE with parallelism details, plus two EXPLAIN tables used specifically for parallelism, DSN_PGROUP_TABLE and DSN_PTASK_TABLE.
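
A quick way to see whether a statement actually got a parallel access path is to EXPLAIN it and look at the parallelism columns in PLAN_TABLE. In this sketch the query number is arbitrary and the table is the hypothetical SALES_HIST used earlier:

  EXPLAIN PLAN SET QUERYNO = 101 FOR
    SELECT STORE_ID, SUM(AMOUNT)
      FROM SALES_HIST
     GROUP BY STORE_ID;

  SELECT QUERYNO, QBLOCKNO, PLANNO, ACCESS_DEGREE,
         ACCESS_PGROUP_ID, PARALLELISM_MODE
    FROM PLAN_TABLE
   WHERE QUERYNO = 101
   ORDER BY QBLOCKNO, PLANNO;

ACCESS_DEGREE shows the degree of parallelism the optimizer planned, and PARALLELISM_MODE indicates whether it’s I/O (‘I’), CP (‘C’), or Sysplex (‘X’) parallelism; keep in mind the degree can still be reduced at run-time.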

Another reason parallelism can be especially significant is its potential to reduce the cost of doing business on the System z platform. Parallelism breaks a query into multiple parts, each part running under its own Service Request Block (SRB) and performing its own I/O. Although there’s additional CPU cost for setup when DB2 first decides to take advantage of query parallelism, there’s still a close correlation between the degree of parallelism achieved and the query’s elapsed time reduction. The use of SRBs is significant because that is what makes the work eligible for redirection: when DB2 uses parallelism, the parallel child tasks can be redirected to a zIIP. Software charges are unaffected by the additional CPU capacity that zIIP processors add to a System z.

One of the easiest ways to increase the amount of zIIP redirect is to enable parallelism. There’s also batch work: taking advantage of DB2’s parallelism in your batch jobs could increase the amount of redirect to a zIIP while using a resource during a time of day when the usual Distributed Relational Database Architecture (DRDA) redirect is low.

Parallelism can be quite valuable, as it can significantly reduce elapsed times for some batch jobs. It can be a game-changer for warehousing, an area that often sees long-running queries that can benefit from the elapsed time improvements sometimes available with parallelism. Parallelism gets a little boost in performance, reporting, and stability with every new version of DB2, and DB2 10 will continue that tradition.