CICS / WebSphere

Storage violations in CICS are unpredictable, notoriously difficult to identify and resolve, and can cause data corruption and region failure. For this reason, it’s important to learn the common causes of storage violations and what enhancements IBM has made to CICS that reduce the potential for storage violations.

The term “storage violation” has a precise definition in CICS and relates to control blocks that existed prior to the new architecture introduced in CICS/ESA. In early releases of CICS, the control blocks that tracked task-related storage lived within the storage area they described. These Storage Accounting Areas (SAAs) contained information including the length of the storage area and the address of the next SAA on the chain. If this information was overwritten, CICS couldn’t track the next area of storage on the chain, meaning the storage couldn’t be freed when the task terminated. “Storage violation” described the specific condition where an 8-byte SAA had been altered. Because the SAA lived within task storage, it was common for a programmer to lose track of storage boundaries, and storage violations occurred frequently.

In addition to this strictly limited technical definition of a storage violation, the presence of an overlaid SAA almost always indicated that some additional storage had been corrupted. As CICS ran in a single address space and storage key, this corruption could occur anywhere in the region, including CICS and user load modules, control blocks and file buffers. The message that “A Storage Violation Has Occurred” was frequently followed by the CICS region failing in spectacular and inexplicable ways. It’s important to note the reverse isn’t always true; a CICS region can suffer huge storage overlays without generating a storage violation simply because no 8-byte SAA had been affected.

IBM recognized the need to reduce the frequency and severity of storage violations while simplifying the process of problem determination, and has directed significant effort toward this goal. One of the significant enhancements IBM introduced was to relocate CICS’ critical control blocks, including the data held in the SAA, outside the bounds of task-related storage. Rather than embedding SAAs in task storage, this critical data was moved into a new series of control blocks that live in 31-bit storage areas normally not addressable by a CICS task. The SAA was replaced by the storage check zone, a control block that’s almost universally referred to as the crumple zone. The crumple zone is an 8-byte area containing only a literal formed by concatenating the task number with a 1-byte field indicating the type of storage. Whenever CICS acquires task-related storage, it increases the size of the getmain by 16 bytes and places one crumple zone at the start of the storage and another at the end. The address returned to the requesting program is always the first byte following the leading crumple zone, so the program never has addressability to it. When CICS frees the storage, it checks the values of the leading and trailing crumple zones. A storage violation is now defined as the corruption of any crumple zone.

Intercepting Storage Violations

While storage violations are almost always the result of an application coding error, the actual cause of the overlay can be difficult or impossible to determine. In many cases, the task in control when the violation was detected isn’t the task that caused the violation; frequently, the offending task has completed and already been cleaned up by the time CICS notices the violation. Prior to the introduction of CICS/ESA, there were only two avenues available for reducing the number and severity of storage violations. The first—long, difficult and prone to disappointment—was reviewing CICS storage violation dumps to determine the cause of the violation; the second—proactive, long-term and with difficult-to-prove results—came through enforcement of proper coding practices. Applications programmers felt that systems programmers didn’t make dump reading a priority but instead pushed for source code changes “just for the sake of standards,” while systems programmers felt applications programmers ignored standards and assumed systems programmers would always be available to read dumps. Both sides, meanwhile, complained to IBM.

IBM responded and has worked diligently to give CICS the capability both to prevent storage corruption attempts at run-time and to improve the documentation provided when CICS produces a storage violation dump. The result is four options that will abend a task attempting certain types of storage violations.

CICS storage protection was the first enhancement released by IBM that actually prevented storage violations. Prior to storage protection, the entire CICS address space ran in the same z/OS storage key, meaning that any user program could overwrite any user or system storage at any time. Storage protection changed this by allowing the CICS address space to be divided into two different storage keys. Key 8 (CICS key) is the primary storage key and key 9 (user key) is the subordinate key. Tasks running in CICS key (which includes most of the CICS system code) can read and write both CICS key and user key storage; tasks running in user key can read and write user key storage and can read CICS key storage, but will abend with a S0C4 if they attempt to overwrite CICS key storage. With the primary CICS control blocks and system code loaded into CICS key storage, these critical areas of CICS were protected from storage overlays. Further, if a user key program attempted to overwrite CICS key storage, the S0C4 abend would generate a transaction dump showing exactly which line of code caused the error, significantly reducing the effort required for problem determination. New program and transaction definition parameters were added to allow the customer to define non-IBM resources as CICS key where desired.

Storage protection is activated at the region level by setting the STGPROT option in the DFHSIT to ‘YES’ (the default is ‘NO’). Once activated, task-related storage is allocated from CICS or user key based on parameters in the TRANSACTION and PROGRAM definitions.
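As a sketch, the activation might look like the following. The transaction, program, and group names (TRN1, PROG1, MYGRP) are hypothetical; TASKDATAKEY on the TRANSACTION definition and EXECKEY on the PROGRAM definition are the parameters that select the key.

```
* SIT override: activate storage protection for the region
STGPROT=YES

* CSD definitions (via CEDA): choose the key per transaction and program
CEDA DEFINE TRANSACTION(TRN1) GROUP(MYGRP) PROGRAM(PROG1) TASKDATAKEY(USER)
CEDA DEFINE PROGRAM(PROG1) GROUP(MYGRP) LANGUAGE(COBOL) EXECKEY(USER)
```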

Storage protection didn’t solve the storage violation issue. By leaving all user programs to run in the same key (key 9), it had no impact if one user program violated another user program’s storage. Worse, some older CICS applications couldn’t run in user key, leaving those applications unprotected and capable of overwriting any storage in the region. Still, storage protection was a major step toward actually controlling the storage violation problem.

Transaction isolation, introduced in CICS/ESA 4.1, provided a significant improvement over the storage protection feature. Transaction isolation uses the subspace group facility of MVS and allows CICS to function as if every user task had its own z/OS storage key. Tasks that run with transaction isolation active have read and write access to their own task storage and certain shared storage areas, read-only access to CICS key storage, and no access to the task-related storage of other user key tasks. This last feature—making other tasks’ storage fetch-protected—went far beyond storage protection by abending programs that would never have generated a storage violation. By casting a wider net, CICS could now identify programs with addressability issues that might have led to storage violations in the future.
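As a sketch (TRN1, MYGRP and PROG1 are hypothetical names), transaction isolation is activated region-wide with the TRANISO SIT parameter, which requires storage protection to be active, and controlled per transaction with the ISOLATE attribute:

```
* SIT overrides: transaction isolation requires storage protection
STGPROT=YES
TRANISO=YES

* Transaction definition: ISOLATE(YES) runs the task in its own subspace
CEDA DEFINE TRANSACTION(TRN1) GROUP(MYGRP) PROGRAM(PROG1) ISOLATE(YES)
```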
