Sep 13 ’10

Building Better Performance for Your DB2/CICS Programs With Threadsafe

by Russ Evans, Nate Murphy in z/Journal

When IBM released CICS TS 2.2 in December 2002, which allowed Task-Related User Exits (TRUEs), most notably the CICS-DB2 attachment, to run in the Open Transaction Environment (OTE), a primary selling point was potentially significant CPU savings for CICS/DB2 applications defined as threadsafe. To be threadsafe, a program must be Language Environment (LE)-conforming, and knowledgeable CICS programmers must ensure the application logic adheres to threadsafe coding standards. (For more information, see “DB2 and CICS Are Moving On: Avoiding Potholes on the Yellow Brick Road to an LE Migration,” z/Journal, April/May 2007.) This may require knowledge of Assembler to trace the many branches of application logic and verify that the application and its related programs are threadsafe. If you define a program as threadsafe but its application logic isn’t threadsafe, unpredictable results can occur that compromise your data integrity. This article provides some background on what threadsafe means at the program level, how to identify and correct non-threadsafe coding, and how to ensure your programs are maximizing their potential CPU savings.

Background

CICS was initially designed to run all work under a single Task Control Block (TCB). Once the CICS dispatcher had given control to a user program, that program had complete control of the entire region until it requested a CICS service. If the program issued a command that included an operating system wait, the entire region would wait with it. As a result, CICS programming guides included a list of operating system and COBOL commands that CICS programs couldn’t use. The flip side of these limitations was that CICS programs didn’t have to be re-entrant between CICS commands.

Because all activity in the CICS region was single-threaded, it was also restricted to the capacity of one CPU. The introduction of multi-processor mainframes raised new issues for CICS systems staff: purchasing a faster (and more expensive) mainframe could actually slow CICS down if the individual processors on the new machine were slower than the single processor it replaced. IBM responded by offloading some of the CICS workload to additional CICS-controlled MVS TCBs that could run concurrently on a multi-processing machine. For convenience, IBM labeled the main CICS TCB the Quasi-Reentrant, or QR, TCB.

The most significant implementation of this type of offloading came with the introduction of the DB2 Database Management System (DBMS). Rather than establishing one TCB for all DB2 activity, CICS would create a separate TCB for each concurrent DB2 request and switch the task to that TCB while DB2 system code ran. While the application programs for every task in the region still ran single-threaded on the QR TCB, each task’s DB2 workload could run simultaneously, limited only by the total capacity of the machine’s processors. In practice, however, the DB2 workload seldom approached the CICS workload, so CICS users were still constrained by the processing speed of a single processor. Also, although the overhead of an individual TCB swap (roughly 2,000 instructions) is slight, the two swaps required for each DB2 request (from the QR TCB to the DB2 TCB and back) can account for as much as 30 percent of total application CPU.

Open Transaction Environment

In a classic “aha!” moment, someone at IBM realized this TCB swapping overhead could be eliminated simply by not swapping the transaction back from the DB2 TCB and allowing application code to continue running there. To support running CICS application code outside of the QR TCB, IBM developed the OTE. Put simply, OTE allows an individual CICS transaction to run under its own MVS TCB instead of sharing the QR TCB. Many transactions, each under its own TCB, can run simultaneously in the same CICS region. If a transaction running in the OTE issues an operating system wait, none of the other transactions in the region are affected.

The drawback of OTE is that more than one occurrence of the same program can run simultaneously, requiring CICS programs to be re-entrant between CICS calls. A simple example of the type of problem created is the common practice of maintaining a record counter in the Common Work Area (CWA) that’s used to create a unique key. Under “classic” CICS, as long as the record counter was updated before the next CICS command was issued, the integrity of the counter was assured. With OTE, it’s possible for two or more transactions to use the counter simultaneously, resulting in duplicate keys.
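The duplicate-key hazard comes from a read-increment-write sequence that is no longer protected by single-threading. The Python sketch below is only a model, not CICS code: `SharedArea` is a hypothetical stand-in for a CWA field, and the two functions replay a serial schedule (classic QR behavior) and one specific interleaved schedule (possible under OTE) so the outcome is deterministic.

```python
"""Deterministic model of the CWA record-counter hazard (illustrative only)."""


class SharedArea:
    """Hypothetical stand-in for a CWA field holding the last key issued."""

    def __init__(self) -> None:
        self.counter = 0


def serial_run(area: SharedArea, tasks: int) -> list[int]:
    """Classic CICS: each task's read-increment-write completes before another task runs."""
    keys = []
    for _ in range(tasks):
        value = area.counter      # read
        area.counter = value + 1  # write, with no other task able to intervene
        keys.append(value + 1)
    return keys


def interleaved_run(area: SharedArea) -> list[int]:
    """OTE: task A and task B both read the counter before either writes it back."""
    a_value = area.counter      # task A reads
    b_value = area.counter      # task B reads the same value
    area.counter = a_value + 1  # task A writes
    area.counter = b_value + 1  # task B writes the same value again
    return [a_value + 1, b_value + 1]


if __name__ == "__main__":
    print(serial_run(SharedArea(), 2))    # [1, 2]
    print(interleaved_run(SharedArea()))  # [1, 1], a duplicate key
```

Real interleavings are timing-dependent; the fixed schedule simply makes the failure visible every time.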

Programs that are fully re-entrant, and that don’t assume their access to data in shared storage areas will automatically be serialized, are defined as “threadsafe.” It’s crucial to remember that threadsafe isn’t a determination CICS makes, but a promise the programmer makes. By marking a program as threadsafe, the programmer is stating that the program won’t cause any damage if it’s allowed to run in the OTE.

Preparing CICS Regions for Threadsafe Activity

There are two ways to control the use of threadsafe programs in a CICS region. The first is a new parameter on the program definition, CONCURRENCY. CONCURRENCY=QUASIRENT indicates the program must run on the QR TCB; CONCURRENCY=THREADSAFE marks the program as threadsafe, allowing it to run on an open TCB. Be aware that marking a program as threadsafe doesn’t make it threadsafe; programmers should use this parameter only for programs that have been proven threadsafe.

The second control is at the region level. Specifying FORCEQR=YES in the SIT will override the CONCURRENCY parameter on the program definitions to force all programs to run on the QR TCB.
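How the two controls combine can be written as a one-line rule. This Python sketch models eligibility only; the enum and function names are hypothetical, and it deliberately ignores everything else (such as non-threadsafe commands) that can still move a task back to the QR TCB.

```python
from enum import Enum


class Concurrency(Enum):
    """Values of the CONCURRENCY program attribute."""
    QUASIRENT = "QUASIRENT"
    THREADSAFE = "THREADSAFE"


def may_run_on_open_tcb(concurrency: Concurrency, forceqr: bool) -> bool:
    """Return True if a program is eligible to run on an open TCB.

    forceqr=True models FORCEQR=YES in the SIT, which overrides every
    program definition and keeps all application code on the QR TCB.
    """
    if forceqr:
        return False
    return concurrency is Concurrency.THREADSAFE
```

Because the region-level setting wins, FORCEQR=YES acts as a single switch that returns the whole region to QR-only behavior without editing any program definitions.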

Before marking any program as threadsafe, review all Task-Related and Global User Exits (TRUEs and GLUEs) active in the region to ensure they’re threadsafe-compliant and defined as threadsafe. Activating threadsafe programs in a region with non-threadsafe exits can significantly increase CPU utilization, because each invocation of a non-threadsafe exit forces the task back onto the QR TCB.

Ensuring Threadsafe Compliance

All threadsafe programs must be re-entrant. LE programs can be guaranteed re-entrant by compiling with the RENT option; Assembler programs can be easily tested for re-entrancy by linking with the RENT option and then running in a CICS region with RENTPGM=PROTECT. Non-re-entrant programs will abend with an S0C4 when they attempt to modify themselves. It’s strongly recommended that all CICS regions running threadsafe programs use RENTPGM=PROTECT.

Unfortunately, there’s no automated way to identify non-threadsafe program code. IBM does supply a utility, DFHEISUP, that can be useful in identifying potentially non-threadsafe programs. It works by scanning application load modules for occurrences of the commands listed in member DFHEIDT. (Details appear in the CICS Operations and Utilities Guide.) DFHEISUP will report, for example, that a program issues an ADDRESS CWA command. Since the CWA is often used to maintain counters or address chains, a program addressing the CWA could be using it in a non-threadsafe manner. On the other hand, the program could be using the CWA only to check operational flags, look up file Data Definition (DD) names, or for other purposes that don’t raise threadsafe issues. More worrisome, DFHEISUP could report no hits on an application program, leading you to believe the program was threadsafe, when it was in fact maintaining counters in a shared storage location whose address was passed in the incoming commarea.
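Both weaknesses of a command scan (a hit that is harmless, and a dangerous program with no hits) fall out of how such a scan works. The toy Python model below is not DFHEISUP: the inventory format, the `scan` function, and the sample filter list are all hypothetical, though the command names are real EXEC CICS commands that return addresses of shared storage.

```python
# Toy model of a DFHEISUP-style report. The inventory format and the
# filter list are hypothetical; the real utility scans load modules
# against the commands listed in member DFHEIDT.


def scan(programs: dict[str, set[str]],
         filter_table: set[str]) -> dict[str, list[str]]:
    """Return, for each program with at least one hit, the matching commands."""
    report = {}
    for name, commands in programs.items():
        hits = sorted(commands & filter_table)
        if hits:
            report[name] = hits
    return report


# Hypothetical inputs. PROGB updates a counter through a pointer passed
# in its commarea, which no command scan can see.
EXAMPLE_FILTER = {"ADDRESS CWA", "EXTRACT EXIT", "GETMAIN SHARED"}
EXAMPLE_PROGRAMS = {
    "PROGA": {"ADDRESS CWA", "READ", "RETURN"},  # may only read a flag
    "PROGB": {"LINK", "RETURN"},                 # unsafe, yet reports no hits
}

if __name__ == "__main__":
    print(scan(EXAMPLE_PROGRAMS, EXAMPLE_FILTER))  # {'PROGA': ['ADDRESS CWA']}
```

A hit on PROGA is only a prompt for review, and the absence of PROGB from the report proves nothing.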

While DFHEISUP is helpful in the process of identifying threadsafe applications, the only way to ensure an application is threadsafe is to have a competent programmer review it in its entirety.

Making Programs Threadsafe

It’s possible for a program to access shared storage areas such as the CWA while remaining threadsafe-compliant. Each shared storage access must be reviewed independently to determine its status. Accesses that require update integrity or repeatable-read integrity (counters, in-core tables, pointers, etc.) aren’t threadsafe-compliant and must be serialized before the program can be defined as threadsafe.

Various serialization options are available to CICS programmers:

As an example, a program that used a CWA field as a record counter could: 

Regardless of which method or methods are used to serialize access, it’s critical that all programs that access the storage be modified before any of them are marked as threadsafe. Adding an ENQ in PROGA to serialize access to the CWA won’t prevent PROGB from updating simultaneously.
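The requirement that every program use the same serialization can be seen in a small Python analogue of EXEC CICS ENQ/DEQ on a named resource. This is a sketch under stated assumptions: `Enqueue`, `next_key`, and the resource name `CWA.RECORD.COUNTER` are hypothetical, and a `threading.Lock` per name stands in for the CICS enqueue.

```python
import threading
from collections import defaultdict


class Enqueue:
    """Toy analogue of EXEC CICS ENQ/DEQ: one lock per resource name."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, resource: str) -> threading.Lock:
        with self._guard:  # protect creation of new per-name locks
            return self._locks[resource]


ENQ = Enqueue()
CWA = {"counter": 0}             # stand-in for the shared CWA record counter
RESOURCE = "CWA.RECORD.COUNTER"  # hypothetical ENQ resource name


def next_key() -> int:
    """Serialized read-increment-write; every program must go through this."""
    with ENQ.lock_for(RESOURCE):  # ENQ ... DEQ around the update
        CWA["counter"] += 1
        return CWA["counter"]


def run_tasks(n_threads: int, keys_per_thread: int) -> list[int]:
    """Run several concurrent 'tasks', each obtaining keys via next_key."""
    keys = []
    keys_guard = threading.Lock()

    def task() -> None:
        for _ in range(keys_per_thread):
            key = next_key()
            with keys_guard:
                keys.append(key)

    threads = [threading.Thread(target=task) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return keys
```

The lock protects the counter only for callers that take it. A program that updated `CWA["counter"]` directly, like PROGB in the text, would bypass it entirely.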

Maximizing CPU Savings and Performance

Because the CPU savings in threadsafe CICS/DB2 programs result from avoiding TCB swaps, your savings are maximized if your program remains on its L8 TCB (the open TCB used for DB2 requests) from the time it issues its first DB2 command until it terminates. This isn’t always possible, because not all EXEC CICS commands are threadsafe. (Consult the Application Systems Programming Guide for a list of commands that are threadsafe in your release.)

If your program issues a non-threadsafe command while running on the L8 TCB, CICS automatically swaps your task to the QR TCB, where it remains until its next DB2 command swaps it back to L8. Each such round trip reduces the potential CPU savings. Conversely, keeping TCB swaps to a minimum can also improve response times.
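The effect of a stray non-threadsafe command can be counted with a simplified model. The Python sketch below assumes the task starts on the QR TCB and ignores switches at syncpoint, task termination, and for exits; the command codes `DB2`, `CICS_TS`, and `CICS_NTS` are hypothetical labels, not CICS names.

```python
def count_tcb_switches(commands: list[str], threadsafe: bool) -> int:
    """Count TCB switches for a simplified sequence of requests.

    commands: "DB2" (an SQL call), "CICS_TS" (a threadsafe EXEC CICS
    command) or "CICS_NTS" (a non-threadsafe EXEC CICS command).
    Assumptions: the task starts on QR; switches at syncpoint, task
    termination and for user exits are not modeled.
    """
    tcb = "QR"
    switches = 0
    for cmd in commands:
        if cmd == "DB2":
            if tcb == "QR":
                tcb = "L8"      # switch to the open TCB for the DB2 call
                switches += 1
            if not threadsafe:
                tcb = "QR"      # quasi-reentrant code must go back to QR
                switches += 1
        elif cmd == "CICS_NTS":
            if tcb == "L8":
                tcb = "QR"      # non-threadsafe command forces QR
                switches += 1
        elif cmd == "CICS_TS":
            pass                # threadsafe command runs on the current TCB
        else:
            raise ValueError(f"unknown command: {cmd}")
    return switches
```

For five DB2 calls with one non-threadsafe command after the third, the model gives 10 switches for a quasi-reentrant program, 3 for a threadsafe one, and 1 if that command is removed or replaced with a threadsafe equivalent.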

CPU Savings

In the article “CICS Open Transaction Environment And Other TCB Performance Considerations” (www.cmg.org/proceedings/2006/6130.pdf), Steven R. Hackenberg, an IBM Certified IT Specialist, gave this example of the potential CPU savings: “To put this in perspective, consider a CICS region that processes 1,000 transactions per second with each doing one DB2 request. That amounts to 4 MIPS just for task TCB switches.”
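The quoted 4 MIPS figure is consistent with simple arithmetic, assuming two TCB switches per DB2 request and roughly 2,000 instructions per switch (the approximation given in the Background section): 1,000 × 1 × 2 × 2,000 = 4,000,000 instructions per second. The function below is an illustrative helper, not part of any IBM tool.

```python
def switch_overhead_mips(tx_per_sec: float, db2_calls_per_tx: float,
                         instructions_per_switch: float = 2_000,
                         switches_per_call: int = 2) -> float:
    """Estimate millions of instructions per second spent only on TCB switches.

    Defaults are assumptions: ~2,000 instructions per switch and two
    switches (QR to DB2 TCB and back) per DB2 call for non-threadsafe code.
    """
    instructions_per_sec = (tx_per_sec * db2_calls_per_tx
                            * switches_per_call * instructions_per_switch)
    return instructions_per_sec / 1_000_000


print(switch_overhead_mips(1000, 1))  # 4.0
```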

Another example can be found in the article “Running OMEGAMON XE for CICS as Threadsafe to Reduce Overhead While Monitoring CICS Transaction Server V2,” from CCR2 (http://www-01.ibm.com/software/tivoli/features/ccr2/ccr2-2004-06/features-cics.html), written by Richard Burford, an IBM R&D developer. The CPU savings can also directly reduce your mainframe software costs.

Improved Performance

If you’re experiencing poor response times, can threadsafe help resolve your problem? Here are the questions you must answer:

If you answered yes to most of the questions, then defining programs as threadsafe and processing as many tasks as possible on an open TCB will remove this constraint on the QR TCB and reduce the response times of both threadsafe and non-threadsafe transactions.

Steven R. Hackenberg also gave this recommendation: “As the QR TCB exceeded 50 percent of a single CP, or the CPU consumed by QR and higher priority workloads exceeded 50 percent of the LPAR [Logical Partition], experience has shown that ready transactions would begin to queue while trying to gain access to the QR TCB. This would be evidenced by the rapid increase in wait-on-first-dispatch and wait-on-redispatch times found in the CMF 110 performance records, and rapidly eroding total response times.”
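Hackenberg’s rule of thumb can be expressed as a simple screening check. In this Python sketch the function name and inputs are hypothetical; the percentages would come from your own monitoring data, and a True result is only a signal to confirm queueing by looking for rising wait-on-first-dispatch and wait-on-redispatch times in the CMF 110 records, not a hard limit.

```python
def qr_likely_constrained(qr_pct_of_one_cp: float,
                          qr_and_higher_pct_of_lpar: float,
                          threshold: float = 50.0) -> bool:
    """Apply the 50 percent rule of thumb for QR TCB queueing.

    qr_pct_of_one_cp: QR TCB CPU as a percentage of a single CP.
    qr_and_higher_pct_of_lpar: CPU used by QR plus higher-priority
    work, as a percentage of the LPAR.
    """
    return (qr_pct_of_one_cp > threshold
            or qr_and_higher_pct_of_lpar > threshold)
```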

Selecting the Pilot Threadsafe Application

Because the CPU reduction in threadsafe programs results from eliminating the TCB switch overhead from every DB2 call, you’ll receive the greatest benefit by converting heavily used programs that issue large numbers of DB2 requests. Use your CMF statistics to determine how many DB2 calls each application’s transactions issue on average, then multiply that number by the application’s transaction rate per second. Sorting your applications by the result, DB2 calls (i.e., pairs of TCB switches) per second, ranks them by potential savings: the higher the number, the greater the potential CPU savings.
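The ranking step is a multiply-and-sort. In this Python sketch the `AppStats` record and `rank_candidates` function are hypothetical, and the sample figures in the usage note are invented for illustration; real values would come from your CMF data.

```python
from dataclasses import dataclass


@dataclass
class AppStats:
    name: str
    tx_per_sec: float        # transaction rate, from CMF data
    db2_calls_per_tx: float  # average DB2 requests per transaction

    @property
    def db2_calls_per_sec(self) -> float:
        return self.tx_per_sec * self.db2_calls_per_tx


def rank_candidates(apps: list[AppStats]) -> list[tuple[str, float]]:
    """Sort applications by DB2 calls per second, highest (most savings) first."""
    ranked = sorted(apps, key=lambda a: a.db2_calls_per_sec, reverse=True)
    return [(a.name, a.db2_calls_per_sec) for a in ranked]
```

For example, hypothetical applications ORDR (120 tx/s × 8 calls), INQY (400 tx/s × 1 call), and BTCH (5 tx/s × 300 calls) rank as BTCH (1,500), ORDR (960), INQY (400). This measures potential savings only; scope and complexity still have to be weighed when choosing a pilot.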

For a pilot project, you must also consider the scope and complexity of the conversion. The ideal candidate will have a combination of:

Additionally, applications running in a QR-constrained CICS environment, where transactions spend a long time waiting for QR dispatch, will show a further reduction in response time as application code processing is diverted to the L8 TCBs.

Summary

Running DB2 programs as threadsafe on the CICS Open Transaction Environment is a true win-win scenario. It provides us the opportunity to significantly reduce CPU requirements in our production regions while simultaneously increasing throughput by exploiting the z/OS multi-processor environment.