Jun 28 ’13
CICS and the Open Transaction Environment: A Retrospective
This series of articles covers many facets of the CICS Open Transaction Environment (OTE), including its functions and benefits. We will also delve into some of the drawbacks that OTE adopters have found and how to avoid them. This article provides a high-level overview of the history and basic terminology of OTE; future articles will be more in-depth and technical in nature.
It has been just over 10 years since IBM first introduced OTE. There’s some minor disagreement as to which release of CICS first provided actual OTE capabilities. CICS Transaction Server 1.3 added OTE support, but for Java only; CICS TS 2.2 first gave us the L8 Task Control Blocks (TCBs) that CICS programmers and systems programmers have come to know and love.
Every subsequent release of CICS has enhanced OTE, usually by making additional EXEC CICS commands threadsafe, but also through significant additions such as:
• Direct access to the OTE via the OPENAPI parm on a program definition, eliminating the previously required Task Related User Exit (TRUE) call
• Support for user-key OTE programs via the L9 TCB type
• Elimination of the stiff performance penalty for programs that run in the OTE in user key while also using DB2 (or any other OPENAPI TRUE)
• Conversion of many IBM products to use OPENAPI TRUEs, including WebSphere MQ, CICS Sockets and CICS/DLI.
The OTE has now been present for seven releases of CICS. Hundreds of millions of CICS transactions exploit OTE daily. Between the CPU savings of threadsafe DB2 transactions and the multiprocessor exploitation OTE facilitates, use of OTE has directly reduced hardware costs at many sites. Nevertheless, there’s still an air of mystery surrounding the technology of OTE and confusion over the technical terminology used to describe OTE and its many options.
The Early Years
A discussion of OTE must begin with a description of the z/OS TCB. The TCB is one of the basic z/OS control blocks. Think of it as representing your job to the z/OS dispatcher. When z/OS gives your job control of a processor, it does so via the TCB. There’s a one-to-one relationship between a TCB and the CPU because a given TCB can run on only one processor at a time. In a multiprocessing machine, z/OS dispatches TCBs to the different processors as needed, but at any time, each processor has only one active TCB assigned to it.
When CICS debuted, it ran as a single operating system task under one TCB. Because CICS had only one TCB, everything within it ran single-threaded, meaning that once the CICS dispatcher had given control to a user program, that program had complete control of the entire region until it requested a CICS service. For example, if the program called a service that included an OS wait, the entire region would wait with it. As a result, the CICS documentation included a list of OS and COBOL services that CICS programs couldn’t use. The flip side was that programs didn’t have to be reentrant between CICS commands. IBM coined the term “quasi-reentrant” to describe this functionality and every new programmer was taught how to write quasi-reentrant code.
There’s another drawback to having only one TCB: You’re restricted to the capacity of one processor. This limitation has no impact when running on a uniprocessor, but the introduction of multiprocessor mainframes created new issues for the CICS systems staff. If your region has outgrown the full capacity of a 100-MIPS uniprocessor, upgrading to a 200-MIPS multiprocessor with two 100-MIPS cores won’t improve the situation, because CICS is still limited to a single (100-MIPS) processor. More than one red-faced capacity planner has tried to explain why CICS performance degraded after upgrading from a 100-MIPS uniprocessor to a 400-MIPS multiprocessor: the multiprocessor had eight 50-MIPS cores, leaving the single CICS TCB with half the processor speed it had before. IBM recognized this single-processor limit and responded by attempting to offload some of the CICS workload to additional MVS TCBs that would be able to run concurrently on a multiprocessing machine. For convenience, IBM labeled the main CICS TCB as the QR TCB.
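The capacity arithmetic above can be sketched in a few lines. This is a simplified illustration, not a capacity-planning tool: the function names are hypothetical, and the model assumes that work confined to one TCB can use at most one core at a time, as the text describes.

```python
def single_tcb_ceiling(core_mips):
    """Capacity available to work confined to a single TCB.

    A TCB runs on only one processor at a time, so the ceiling is the
    speed of one core (here, the fastest), not the machine total.
    """
    return max(core_mips)


def machine_total(core_mips):
    """Total capacity across all cores; only multiple TCBs can use it."""
    return sum(core_mips)


# Configurations taken from the examples in the text (MIPS per core).
uniprocessor = [100]
two_way = [100, 100]
eight_way = [50] * 8

for name, cores in [("uniprocessor", uniprocessor),
                    ("two-way", two_way),
                    ("eight-way", eight_way)]:
    print(name, machine_total(cores), single_tcb_ceiling(cores))
```

The eight-way machine has four times the total capacity of the uniprocessor, yet the ceiling for a single TCB drops from 100 to 50 MIPS.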
The most significant implementation of this type of offloading came with the introduction of the DB2 Database Management System (DBMS). Rather than establishing one TCB for all DB2 activity, CICS would create a separate TCB for each concurrent DB2 request and switch (or “spin”) the task to that TCB while DB2 system code ran. While all the application programs for each task in the region still ran single-threaded, each task’s DB2 workload could be running simultaneously, limited only by the total capacity of a multiprocessor. On a practical level, the DB2 workload seldom approached the CICS workload, meaning that CICS users were still constrained by the processing speed of a single processor. Also, while the overhead of an individual TCB swap (roughly 1,900 instructions) is slight, some tasks issue thousands of simple DB2 calls, meaning that these two TCB swaps for each DB2 request can add up to as much as 30 percent of total application CPU for those tasks.
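To see how a slight per-swap cost adds up, the following sketch works through the arithmetic. The 1,900-instruction figure and the two swaps per DB2 request come from the text; the amount of useful work per call (9,000 instructions) is purely an assumed value chosen for illustration, and the model ignores any CPU the task spends outside its DB2 calls.

```python
SWAP_INSTRUCTIONS = 1_900   # rough per-swap cost quoted in the text
SWAPS_PER_DB2_CALL = 2      # one swap to the DB2 TCB, one back to the QR TCB


def swap_overhead(db2_calls):
    """Instructions spent purely on TCB swaps for a task."""
    return db2_calls * SWAPS_PER_DB2_CALL * SWAP_INSTRUCTIONS


def overhead_share(db2_calls, work_per_call):
    """Fraction of the task's CPU that goes to TCB swaps.

    work_per_call is a hypothetical figure: the instructions of useful
    application and DB2 work attributed to each call.
    """
    overhead = swap_overhead(db2_calls)
    useful = db2_calls * work_per_call
    return overhead / (overhead + useful)


calls = 5_000                      # a task issuing thousands of simple calls
print(swap_overhead(calls))        # instructions spent on swaps
print(overhead_share(calls, 9_000))
```

Under these assumptions the share does not depend on the number of calls, only on how little work each call does relative to the 3,800 instructions of swapping around it, which is why tasks issuing many simple DB2 calls are hit hardest.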
Creation of the OTE
Awareness of this level of overhead led IBM to consider the CPU savings possible in a CICS/DB2 transaction if the task wasn’t returned to the main CICS TCB after DB2 had processed a request. If a task could remain on its DB2 TCB following a DB2 call (eliminating one TCB swap), it would still be on the DB2 TCB at the time of the next call, eliminating a second TCB swap. OTE was developed to provide support for running CICS application code outside the QR TCB. A member of the IBM CICS development team, introducing OTE many years ago during a presentation at SHARE, the user group for IBM mainframe customers, revealed its significance: Use of the OTE changes one of the two basic rules every CICS systems and applications programmer knew, namely that CICS programs must be quasi-reentrant. Instead, when running in the OTE, a CICS program must be “fully” reentrant.
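The swap-elimination reasoning can be illustrated with a toy model that counts TCB swaps for a task under both approaches. It is a deliberate simplification: the function name and the "app"/"db2" operation labels are hypothetical, the application code between DB2 calls is assumed to do nothing that would force the task back to the QR TCB, and any swap at task termination is ignored.

```python
def count_swaps(operations, stay_on_open_tcb):
    """Count TCB swaps for a sequence of "app" and "db2" operations."""
    current = "QR"          # every task starts on the QR TCB
    swaps = 0
    for op in operations:
        if op == "db2":
            if current != "OPEN":
                swaps += 1  # move to the DB2 (open) TCB for the call
                current = "OPEN"
            if not stay_on_open_tcb:
                swaps += 1  # classic behavior: return to QR after the call
                current = "QR"
        # "app" work runs on whichever TCB the task currently occupies
    return swaps


ops = ["app", "db2", "app", "db2", "app", "db2", "app"]
classic_swaps = count_swaps(ops, stay_on_open_tcb=False)
ote_swaps = count_swaps(ops, stay_on_open_tcb=True)
print(classic_swaps, ote_swaps)
```

In this model the classic approach pays two swaps for every DB2 call, while a task that stays on its open TCB pays a single swap for the first call and none afterward.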
The differences between “quasi” and “fully” reentrant and how to change a program to make it fully reentrant have caused confusion, but put simply, OTE allows an individual CICS transaction to run under its own z/OS TCB instead of sharing the QR TCB. Many transactions, each under its own TCB, can run simultaneously in the same CICS region. The environment is “open” in that it supports activity that the QR TCB prohibits, so that if a transaction running in the OTE issues an OS WAIT, no other CICS transactions are affected.
Strength Equals Weakness
The drawback of OTE is that more than one occurrence of the same program can run simultaneously, which is why quasi-reentrancy is no longer sufficient. This, of course, raises the question of why programs in the OTE can’t be quasi-reentrant. The problem arises in the areas of storage that can be shared between tasks, such as CICS shared storage, the CICS Common Work Area (CWA) and even program load modules.
A simple example of the type of problem created is the common practice of maintaining a counter that can be used to create a unique record key. Many CICS applications used the CICS CWA to hold the current value of the counter, and programs would increment the counter as they used its value. Under “classic” CICS, as long as the record counter was updated before the next CICS command was issued, the integrity of the counter was assured. With OTE, it’s possible for two or more transactions to use the current value of the counter simultaneously, resulting in duplicate keys.
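The duplicate-key problem can be illustrated outside CICS with ordinary threads. The sketch below is an analogy, not CICS code: `SharedArea` is a hypothetical stand-in for a counter kept in the CWA, the first function replays one unlucky interleaving step by step so the outcome is repeatable, and the lock stands in for whatever serialization technique an application chooses.

```python
import threading


class SharedArea:
    """Hypothetical stand-in for a counter held in shared storage."""

    def __init__(self):
        self.counter = 0


def interleaved_unsafe(area):
    """Replay one interleaving in which both tasks read before either writes."""
    seen_by_a = area.counter        # task A reads the current value
    seen_by_b = area.counter        # task B reads the same value on another TCB
    area.counter = seen_by_a + 1    # task A stores its update
    key_a = area.counter
    area.counter = seen_by_b + 1    # task B overwrites it with the same value
    key_b = area.counter
    return key_a, key_b             # both tasks now hold the same key


def next_key(area, lock):
    """Serialize the read-modify-write so each caller gets a unique key."""
    with lock:
        area.counter += 1
        return area.counter


def run_tasks(task_count, keys_per_task):
    """Run several concurrent tasks that each take keys via next_key."""
    area = SharedArea()
    lock = threading.Lock()
    results = [[] for _ in range(task_count)]

    def task(slot):
        for _ in range(keys_per_task):
            results[slot].append(next_key(area, lock))

    threads = [threading.Thread(target=task, args=(i,))
               for i in range(task_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [key for keys in results for key in keys]


print(interleaved_unsafe(SharedArea()))
keys = run_tasks(task_count=4, keys_per_task=250)
print(len(keys), len(set(keys)))
```

The first function yields the same key twice, which is exactly the duplicate the text warns about. With the update serialized, every key handed out is distinct no matter how the tasks interleave.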
It’s ironic that the greatest benefit of OTE—the ability to run many tasks simultaneously—is also its greatest drawback.