What About QR “Blocking”?
Unlike the z/OS dispatcher, which can interrupt a running unit of work, the CICS dispatcher receives control only when a program issues an EXEC CICS command. From the time a CICS task is given control until it issues a CICS command, that task owns the TCB it’s running on. When that TCB is the QR, this can cause problems. If a program is highly CPU-intensive between CICS commands, it will occupy the QR TCB for an extended time, and while it runs, CICS can’t dispatch any other tasks or handle any incoming transaction requests. The resulting spike in transaction response times can exceed Service Level Agreement (SLA) limits. The spike begins when the task takes over the QR and continues past the point where the task returns control, while CICS works through the backlog of outstanding work.
Exploiting the OTE is an obvious solution to QR blocking. If the offending program runs on an L* TCB, the QR TCB is unaffected. Overall CPU utilization in the Logical Partition (LPAR) will still spike, but the QR TCB remains available to service other transactions as usual, so their response times don’t increase. An unanticipated benefit of moving a QR-blocking program to the OTE is that the offending transaction’s own response time also improves, because the region slowdown that follows a QR block no longer occurs.
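To see why a single CPU-heavy task hurts everyone else, here is a minimal Python sketch of cooperative dispatching on one QR TCB. It is a toy model, not CICS internals: each task is assumed to need a single burst of CPU before its next EXEC CICS command, an open TCB is assumed to always have an engine available, and the transaction names (HOG, TXNA, TXNB) and timings are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Task:
    name: str
    arrival_ms: float   # when the transaction request arrives
    cpu_ms: float       # CPU used before the task's next EXEC CICS command
    on_qr: bool = True  # False = runs on its own L8 TCB (assumed free engine)

def response_times(tasks):
    """Non-preemptive, first-come-first-served dispatch on a single QR TCB."""
    qr_free_at = 0.0
    times = {}
    for task in sorted(tasks, key=lambda t: t.arrival_ms):
        if task.on_qr:
            # Cooperative dispatch: the task keeps the QR until its burst ends.
            start = max(task.arrival_ms, qr_free_at)
            qr_free_at = start + task.cpu_ms
            times[task.name] = qr_free_at - task.arrival_ms
        else:
            # Open TCB on another engine: no waiting for the QR.
            times[task.name] = task.cpu_ms
    return times

workload = [Task("HOG", 0, 500), Task("TXNA", 10, 2), Task("TXNB", 20, 2)]
qr_only = response_times(workload)
offloaded = response_times(
    [replace(t, on_qr=False) if t.name == "HOG" else t for t in workload]
)
print(qr_only)
print(offloaded)
```

With everything on the QR, the two short transactions wait behind HOG; with HOG moved to an open TCB, their response times fall back to their own CPU cost.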
OTE exploitation reduces hardware costs because multiprocessor exploitation removes CICS’ reliance on a single fast processor. Hardware upgrades become necessary when the capacity of the whole box is reached, not the capacity of a single processor. Multiprocessor exploitation also eliminates the bottlenecks caused by QR blocking, providing more consistent transaction response times, even during peak workload. The QR constraint relief that OTE exploitation provides can eliminate or delay the need for additional CICS Application-Owning Regions (AORs), or allow an existing CICS complex to be collapsed into fewer regions, which saves additional CPU and further delays a hardware upgrade.
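The capacity argument can be expressed as a simple bottleneck bound: the QR can never use more than one engine, so the share of each transaction’s CPU that must run there caps the region’s throughput, no matter how many engines the LPAR has. This Python sketch assumes perfect scaling on the open TCBs and ignores TCB-switch overhead; the 10 ms transaction and 4-engine figures are illustrative only.

```python
def max_throughput_tps(cpu_ms_per_txn: float, qr_fraction: float, engines: int) -> float:
    """Upper bound on transactions per second for one region (simplified model).

    qr_fraction: share of each transaction's CPU that must run on the QR TCB.
    engines: processors available to the region as a whole.
    """
    if cpu_ms_per_txn <= 0 or engines < 1:
        raise ValueError("cpu_ms_per_txn must be positive and engines >= 1")
    if not 0.0 <= qr_fraction <= 1.0:
        raise ValueError("qr_fraction must be between 0 and 1")
    # All of the region's work together cannot exceed the capacity of the box.
    box_limit = engines * 1000.0 / cpu_ms_per_txn
    if qr_fraction == 0:
        return box_limit
    # QR work is serialized on one TCB, so it can use at most one engine.
    qr_limit = 1000.0 / (qr_fraction * cpu_ms_per_txn)
    return min(box_limit, qr_limit)

print(max_throughput_tps(10, 1.0, 4))   # everything on the QR: 100.0
print(max_throughput_tps(10, 0.25, 4))  # 25% on the QR: 400.0
```

Adding engines raises box_limit but leaves qr_limit unchanged, which is why a QR-bound region is limited by the speed of one processor rather than the size of the box.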
There is, indeed, no such thing as a free lunch. The benefits of the OTE are real and readily achieved, but care must be taken to ensure that only threadsafe-compliant code runs on an L* TCB, and changes are required to move programs from the QR to an L* TCB. The potential pitfalls of the OTE are few and can be avoided with proper planning and a firm understanding of the principles involved.
It isn’t possible to test for threadsafe-compliant code; threadsafety is a matter of intent and can be determined only through a code review by a programmer familiar with the application. A program must meet three minimum standards before it can be considered threadsafe-compliant:
• If the program is COBOL or PL/1, it must be Language Environment (LE)-enabled.
• The program code must be reentrant.
• The load module must have been linked with the RENT (reentrant) option.
Achieving reentrancy in COBOL or PL/1 is a matter of specifying the RENT option on the compile step. Assembler programs must be reviewed manually to ensure reentrancy. Once programs have been recompiled or reviewed and then linked with RENT, use the CICS system initialization parameter RENTPGM=PROTECT to test for reentrancy. When RENTPGM=PROTECT is specified, CICS loads all programs linked as RENT into read-only storage. If a program isn’t truly reentrant, its attempt to modify itself causes CICS to issue message DFHSR0622, and the task abends with an ASRA. Programs that can’t run successfully with CICS reentrant program protection active shouldn’t be considered threadsafe-compliant.
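The effect of RENTPGM=PROTECT can be illustrated with a Python analogy, in which a read-only memoryview plays the role of read-only program storage. It is only an analogy: SimulatedASRA, load_program and run_task are hypothetical names, and real protection is enforced by the hardware, not by Python.

```python
class SimulatedASRA(Exception):
    """Stands in for the ASRA abend (and message DFHSR0622) in this analogy."""

def load_program(image: bytes, rentpgm_protect: bool) -> memoryview:
    # One copy of the program is shared by every task that runs it.
    # PROTECT: expose it read-only, like a RENT program in read-only storage.
    # Without protection: expose a writable copy, so stray stores go unnoticed.
    return memoryview(image) if rentpgm_protect else memoryview(bytearray(image))

def run_task(program: memoryview, reentrant: bool) -> bytearray:
    task_storage = bytearray(len(program))  # storage private to this task
    if reentrant:
        task_storage[0] = 0xFF   # a reentrant program updates only its own storage
    else:
        try:
            program[0] = 0xFF    # a non-reentrant program stores into itself
        except TypeError as exc: # raised when writing to a read-only memoryview
            raise SimulatedASRA("attempt to overwrite read-only program storage") from exc
    return task_storage

protected = load_program(bytes(8), rentpgm_protect=True)
run_task(protected, reentrant=True)       # runs normally
try:
    run_task(protected, reentrant=False)  # caught by protection
except SimulatedASRA as abend:
    print("abend:", abend)

unprotected = load_program(bytes(8), rentpgm_protect=False)
run_task(unprotected, reentrant=False)    # succeeds silently...
print(unprotected[0])                     # ...but the shared copy now holds 255
```

Note the last case: without protection, the non-reentrant store succeeds silently and changes the copy every task shares, which is the kind of failure that surfaces only under concurrent load.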
Identifying Non-Threadsafe Code
IBM defines a threadsafe program as “one that doesn’t modify any area of storage that can be modified by any other program at the same time, and doesn’t depend on any area of shared storage remaining consistent between machine instructions.”
If your program doesn’t access any “shared” storage areas in CICS, it’s already threadsafe. If it does access shared storage, someone must manually review the program to determine whether the access is threadsafe and, if it isn’t, how to serialize the access to make it threadsafe. We will discuss the details of how to perform a threadsafe review and techniques to convert non-threadsafe programs in a follow-up article.
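IBM’s definition can be made concrete with a lost-update example. This Python sketch uses generators to interleave two hypothetical tasks deterministically; each `yield` marks a point where the other task could be dispatched between machine instructions. The `lock` dictionary stands in for a serialization mechanism such as EXEC CICS ENQ/DEQ. It is safe here only because the model never switches tasks between testing and setting it; real code needs an atomic mechanism.

```python
def unsafe_increment(shared):
    value = shared["counter"]       # load from shared storage
    yield                           # another task may be dispatched here
    shared["counter"] = value + 1   # store, based on a possibly stale load

def serialized_increment(shared, lock):
    while lock["held"]:             # like ENQ: wait until the resource is free
        yield
    lock["held"] = True
    value = shared["counter"]
    yield                           # the other task can run, but can't get the lock
    shared["counter"] = value + 1
    lock["held"] = False            # like DEQ: release the resource

def run_interleaved(tasks):
    """Advance each task one step at a time, round-robin, until all finish."""
    pending = list(tasks)
    while pending:
        for task in list(pending):
            try:
                next(task)
            except StopIteration:
                pending.remove(task)

def demo():
    lost = {"counter": 0}
    run_interleaved([unsafe_increment(lost), unsafe_increment(lost)])
    safe, lock = {"counter": 0}, {"held": False}
    run_interleaved([serialized_increment(safe, lock), serialized_increment(safe, lock)])
    return lost["counter"], safe["counter"]

print(demo())  # (1, 2): the unserialized update is lost
```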
Forcing Programs Into the OTE
Programs are directed to execute on an L* TCB either implicitly, by calling a CICS Task-Related User Exit (TRUE) that has been enabled with the OPENAPI option, or explicitly, based on parameters in the program’s CICS definition.
When a program issues a call to an OPENAPI TRUE, CICS moves the task to an L8 TCB before giving control to the TRUE program. If the program is defined to CICS as THREADSAFE, it is allowed to remain on that L8 TCB until it issues a non-threadsafe CICS command. When the OTE was first introduced, DB2 was the only OPENAPI TRUE supported by IBM; since then, IBM has added OPENAPI support for several other CICS interfaces, including native sockets, MQ Series and CICS/DLI. This is implicit OTE access, because the program has no control over how the TRUE was enabled and can’t determine when it’s running on an L* TCB.
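The implicit path can be traced with a small model of where a task runs after each event. This Python sketch follows the rules above for programs defined with API(CICSAPI), plus one assumption: a QUASIRENT program is returned to the QR when the TRUE call completes. The event codes ("TRUE" for an OPENAPI TRUE call such as a DB2 request, "TS" and "NTS" for threadsafe and non-threadsafe commands) are hypothetical.

```python
def trace_tcbs(concurrency, events):
    """Return (TCB after each event, number of TCB switches) for an API(CICSAPI) program."""
    if concurrency not in ("QUASIRENT", "THREADSAFE"):
        raise ValueError(f"not modelled: {concurrency}")
    tcb = "QR"                 # API(CICSAPI) programs are given control on the QR
    switches = 0
    history = []
    for event in events:
        if event == "TRUE":    # call to an OPENAPI TRUE, e.g. a DB2 request
            if tcb != "L8":
                tcb, switches = "L8", switches + 1
            if concurrency == "QUASIRENT":   # assumed: back to the QR after the call
                tcb, switches = "QR", switches + 1
        elif event == "NTS":   # non-threadsafe command: must run on the QR
            if tcb != "QR":
                tcb, switches = "QR", switches + 1
        elif event != "TS":    # threadsafe command: runs on the current TCB
            raise ValueError(f"unknown event: {event}")
        history.append(tcb)
    return history, switches

events = ["TRUE", "TS", "TRUE", "TRUE", "NTS", "TRUE"]
print(trace_tcbs("QUASIRENT", events))
print(trace_tcbs("THREADSAFE", events))
```

For this sequence the THREADSAFE definition needs three TCB switches against eight for QUASIRENT, because it stays on the L8 TCB between DB2 calls until a non-threadsafe command sends it back to the QR.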
The CONCURRENCY and Application Program Interface (API) parameters of the PROGRAM definition are used to force a program explicitly to run in the OTE. If a program is defined as CONCURRENCY(THREADSAFE) and API(OPENAPI), all its code will run on an L* TCB. If a program is defined as CONCURRENCY(REQUIRED), all its code will run on an L8 TCB. Note that in the first case, the program could run under either an L8 or an L9 TCB: a program runs on an L9 TCB only when the CICS region has storage protection active, the program’s EXECKEY is USER and its API is OPENAPI. Before attempting to exploit the L9 TCB environment, be aware that significant performance problems may arise when programs on an L9 TCB either call OPENAPI TRUEs or issue many non-threadsafe CICS commands, because each such request forces the task to switch TCBs. To help customers who use storage protection and the OTE avoid these problems, IBM added the CONCURRENCY(REQUIRED) option. If a program that would have run on an L9 TCB is defined as REQUIRED, it will instead be initialized on an L8 TCB.
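The explicit rules can be summarized as a lookup of the TCB a program is given control on. This Python sketch covers only the combinations discussed here; it treats REQUIRED as REQUIRED with API(CICSAPI) (an assumption about the case described above) and raises ValueError for anything else rather than guessing.

```python
def initial_tcb(concurrency, api, execkey, storage_protection):
    """TCB a program is given control on, for the definitions discussed in the text."""
    if execkey not in ("USER", "CICS"):
        raise ValueError(f"unknown EXECKEY: {execkey}")
    if api == "CICSAPI" and concurrency in ("QUASIRENT", "THREADSAFE"):
        return "QR"    # THREADSAFE moves to L8 only when it calls an OPENAPI TRUE
    if api == "OPENAPI" and concurrency == "THREADSAFE":
        # L9 only when storage protection is active and the program runs in user key
        return "L9" if storage_protection and execkey == "USER" else "L8"
    if api == "CICSAPI" and concurrency == "REQUIRED":
        return "L8"    # assumed reading of CONCURRENCY(REQUIRED) in the text
    raise ValueError(f"combination not modelled: {concurrency}/{api}")

print(initial_tcb("THREADSAFE", "OPENAPI", "USER", True))   # L9
print(initial_tcb("REQUIRED", "CICSAPI", "USER", True))     # L8
```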
Problem Determination With Non-Threadsafe Programs
Because it isn’t possible to test for threadsafe compliance, even the most thorough threadsafe conversion can let a non-threadsafe program run on an L* TCB. Unfortunately, when a program that isn’t coded to threadsafe standards runs in the OTE, the results are truly unpredictable. Problems show up at random times (usually at peak processing), because they’re caused by multiple, independent CICS tasks running programs that access the exact same storage area simultaneously. In some cases, the symptom of non-threadsafe activity is invalid data rather than an abend.
If non-threadsafe errors are suspected following a threadsafe conversion, use these guidelines to help identify the root cause:
• Ensure that all the programs involved were compiled and linked as reentrant and that RENTPGM=PROTECT has been specified for the region; use CEMT INQ SYS to check the reentrant program protection setting. Program reentrancy is the only threadsafe prerequisite that can be proved: with RENTPGM=PROTECT, any program that isn’t reentrant will abend when it tries to modify itself, and the abend identifies the exact location of the offending code. Even so, non-reentrant programs running in the OTE remain the number one cause of threadsafe errors.
• Treat “impossible” results as likely threadsafe errors. Following a threadsafe conversion, if a user or programmer reports results that seem impossible, they are probably caused by a threadsafe error.
• The CICS trace table is invaluable in resolving suspected threadsafe errors; ensure your trace table is large enough to hold the entries for an average task from task initialization to task termination.
• If a program abend is suspected to be the result of a threadsafe error, it’s likely the transaction dump alone won’t have sufficient information to resolve the problem. If so, set CICS to take a full system dump and attempt to re-create the problem.
• If you suspect that incorrect data is the result of a non-abending threadsafe error, use CICS auxiliary trace to help narrow down the cause. You may find your current allocations for DFHAUXT and DFHBUXT aren’t large enough to hold the trace you need; if so, increase their allocations and run the trace again.
Not every CICS environment will benefit from exploiting the OTE. In some cases, the cost of a threadsafe conversion might exceed the anticipated performance gains; in others, a combination of non-threadsafe commands and non-OPENAPI TRUEs might increase CPU usage when programs are defined to run in the OTE. Most shops, however, will see positive results from a threadsafe conversion in the areas described here.
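The CPU increase mentioned above is typically a product of TCB switching, which a rough model makes visible. In this hypothetical Python sketch, an API(OPENAPI) program must hop to the QR and back for every non-threadsafe command or QR-bound TRUE call, while a THREADSAFE program defined with API(CICSAPI) simply stays on the QR after such a request. The event codes are invented for illustration, and the model counts switches only; it says nothing about the actual CPU cost of each switch.

```python
QR_BOUND = {"NTS", "QTRUE"}  # non-threadsafe command, or a TRUE assumed to need the QR

def count_switches(api, events):
    """Count TCB switches for a THREADSAFE, CICS-key program (simplified model)."""
    if api not in ("OPENAPI", "CICSAPI"):
        raise ValueError(f"unknown API: {api}")
    returns_to_l8 = api == "OPENAPI"    # OPENAPI code always runs on the open TCB
    tcb = "L8" if returns_to_l8 else "QR"
    switches = 0
    for event in events:
        if event in QR_BOUND:
            if tcb != "QR":
                tcb, switches = "QR", switches + 1
            if returns_to_l8:           # switch back before the program resumes
                tcb, switches = "L8", switches + 1
        elif event == "OTRUE":          # call to an OPENAPI TRUE runs on L8
            if tcb != "L8":
                tcb, switches = "L8", switches + 1
        elif event != "TS":             # threadsafe commands run on the current TCB
            raise ValueError(f"unknown event: {event}")
    return switches

events = ["NTS", "TS", "NTS", "OTRUE", "NTS", "NTS"]
print(count_switches("OPENAPI", events))  # 8
print(count_switches("CICSAPI", events))  # 2
```

For this sequence, dominated by non-threadsafe commands, the OPENAPI definition needs four times as many switches as the CICSAPI one.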