The COBOL compile option SSRANGE, in conjunction with the LE run-time option CHECK, will intercept attempts by a COBOL program to access an out-of-range entry in a table. When SSRANGE is used, the compiler adds code to every table reference in the program to ensure the current index value isn’t greater than the OCCURS value. The CHECK run-time option is used to activate this checking. The combination of SSRANGE and CHECK can add significant CPU overhead to your processing; in production, it should be used only when other attempts to identify the code in error have failed. In development the overhead matters less: it’s never a bad idea to compile all programs in the development environment with SSRANGE and have CHECK on in all development CICS regions.
With SSRANGE and CHECK activated, an attempted access to an out-of-range table entry will generate an LE U1006 abend, and a message will be written to CEEMSG that describes the error in detail.
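The kind of test that SSRANGE adds around each table reference can be pictured with a small Python sketch. This is only a model of the idea, not the code the compiler generates: the names checked_entry and SubscriptRangeError are invented for illustration, the exception stands in for the U1006 abend, and the lower-bound test is an added assumption (the text above mentions only the upper bound).

```python
class SubscriptRangeError(Exception):
    """Hypothetical stand-in for the U1006 abend described in the text."""


def checked_entry(table, subscript):
    """Return entry `subscript` (1-based, as in COBOL) or raise.

    len(table) plays the role of the OCCURS value. Checking for
    subscripts below 1 is an assumption added for this sketch.
    """
    occurs = len(table)
    if subscript < 1 or subscript > occurs:
        raise SubscriptRangeError(
            f"subscript {subscript} is outside 1..{occurs}"
        )
    return table[subscript - 1]
```

A reference to entry 4 of a three-entry table raises the error at the point of access, before any neighboring storage is touched, which is what makes this kind of check so useful in development.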
There’s no automated method to identify storage overlays that are the result of DFHCOMMAREA length mismatches. When a program issues an EXEC CICS LINK, XCTL or RETURN with the COMMAREA option, it passes the desired length of the area to CICS. The called program, when activated, will be passed the address of the DFHCOMMAREA, and the length specified by the caller will be in EIBCALEN. The called program assigns the passed address to an 01 level in its LINKAGE area, and can then either use the data description in linkage or copy the area to working storage. There’s no problem if the called program’s data layout describes an area shorter than the actual EIBCALEN; that simply leaves some unused storage at the end of the area. The problem occurs if the data layout describes an area longer than the EIBCALEN. For example, if the actual commarea length is 100 bytes, but the 01 level in linkage includes a field that starts at offset 101, any attempt to update that field will cause a storage violation.
Commarea size mismatches are the leading cause of storage violations, yet they’re the simplest to prevent. Simply add code to the called program to compare the value in EIBCALEN to the length the program expects for the commarea; if the values differ, the transaction should be abended. There are no acceptable circumstances under which any program should reference a commarea without verifying the length first.
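To make the mismatch concrete, here is a toy Python model of the 100-byte example together with the length check just described. It is a sketch under stated assumptions, not CICS code: the storage pool is an ordinary bytearray, the area following the commarea is assumed to belong to someone else, and validate_commarea, write_field and CommareaLengthError are hypothetical names. Raising the exception stands in for abending the transaction.

```python
class CommareaLengthError(Exception):
    """Hypothetical stand-in for abending the transaction."""


def validate_commarea(eibcalen, expected_len):
    """Refuse to continue unless the passed length matches the layout."""
    if eibcalen != expected_len:
        raise CommareaLengthError(
            f"EIBCALEN is {eibcalen}, program expects {expected_len}"
        )


def write_field(pool, base, offset, data):
    """Unchecked update of a field at base+offset, as a LINKAGE layout allows."""
    start = base + offset
    pool[start:start + len(data)] = data


# Toy storage pool: a 100-byte commarea followed by another owner's area.
pool = bytearray(b"C" * 100 + b"N" * 100)
neighbor_before = bytes(pool[100:])

# Without the check, a field at offset 101 lands in the neighbor's storage.
write_field(pool, base=0, offset=101, data=b"XXXX")
overlaid = bytes(pool[100:]) != neighbor_before
```

In this model overlaid ends up true: the update went through silently and changed bytes the program never owned. Calling validate_commarea(100, 120) first would have raised the error before any field was touched.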
Debugging Storage Violations
Reading CICS storage violation dumps is more of an art form than a science. To begin with, you must have a full system dump of the state of the region at the time the storage violation was detected. CICS produces a system dump by default unless you’ve used dump suppression to prevent it, but your internal procedures must ensure the dump data set is preserved. It’s almost impossible to resolve a storage violation without the internal CICS trace table. IBM recognizes, however, that many shops can’t support the CPU overhead required for the trace, so CICS will create an “exception” trace record when a storage violation is detected. These exception entries are collected even when tracing is disabled and contain information that will greatly assist in evaluating the dump.
As a rule of thumb, start the problem determination process by finding the damaged crumple zone that’s located at the smallest address value. Because CICS only checks the crumple zones when an area is freemained, most storage violations are found at task termination time (i.e., when CICS frees all the remaining user storage areas associated with the task). It’s possible for a large number of crumple zones to be damaged before CICS recognizes that a storage violation has occurred, and the crumple zone that’s noticed first may not be at the beginning of the storage corruption.
If the first damaged crumple zone is a leading one, then the culprit is most likely not the task that owns the storage area in question. If the first damaged crumple zone is a trailing one, then the task owning the storage is likely the culprit. This is due to the nature of crumple zones. To take the second case first: As described earlier, programs are never given addressability to their leading crumple zones, but they can always reach their trailing crumple zones by overrunning the end of a storage area. While there may be a damaged leading crumple zone immediately following the damaged trailing one, that can still be the result of overrunning the end of a storage area. However, if a leading crumple zone is the first damaged, it’s less likely that the owning task caused the problem. Again, that task has never had addressability to the leading crumple zone, and if the corruption had started in that task’s user storage, it most likely would have hit a trailing zone before it reached a leading zone.
On the other hand, storage violations are by their nature unpredictable. It’s quite possible for a program to loop through a routine that corrupts a few bytes of storage and then skips over a few kilobytes before its next overlay. In this case, the crumple zone that eventually is corrupted would be chosen completely at random and would be of no help in identifying the failing code.
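The rule of thumb above can be written down as a few lines of Python. This is a sketch of the reasoning only: DamagedZone and its fields are hypothetical and don’t correspond to any dump format, and the list of damaged zones is assumed to have been extracted from the dump by hand. As the preceding paragraph warns, the result is a starting point, not a verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DamagedZone:
    """One damaged crumple zone noted while reading the dump (hypothetical)."""
    address: int
    kind: str        # "leading" or "trailing"
    owner_task: str  # task that owns the storage area


def triage(zones):
    """Pick the damaged zone at the lowest address and name a first suspect.

    Returns (zone, suspect), where suspect is "owning task" if the first
    damaged zone is a trailing one and "another task" if it is a leading one.
    """
    first = min(zones, key=lambda z: z.address)
    suspect = "owning task" if first.kind == "trailing" else "another task"
    return first, suspect
```

For example, if the lowest damaged zone is the trailing zone of an area owned by task 00123, triage returns that zone with the suspect "owning task"; if it’s a leading zone, the owner is more likely the victim and the search moves to whoever owns the storage just below it.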
The CICS trace table is invaluable when attempting to identify the cause of a storage violation. Once the damaged crumple zone has been identified, search the trace table backward to find the trace record that was written when the area was acquired. The trace will show who asked for the area to be acquired, and why. For example, a user program might make an explicit GETMAIN request, or the user program might have issued an EXEC CICS request that required storage to process, or LE may have needed the storage for some internal use. Since the damage must have occurred between the time the storage was acquired and the time the storage violation was noticed, use the trace entries between these two events to identify which other tasks were active during this period.
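The backward search and the resulting window of suspects can be modeled in Python as follows. Treat this as a sketch: TraceEntry and its fields are invented for illustration and are far simpler than real CICS trace records, the trace is assumed to be an ordinary list in time order (a real trace table wraps), and the damaged area is assumed to be identifiable by the address returned on the acquiring request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceEntry:
    """Simplified, hypothetical trace record."""
    seq: int
    task: str
    event: str                    # e.g. "GETMAIN", "FREEMAIN", "OTHER"
    address: Optional[int] = None


def find_acquisition(trace, damaged_address, detected_index):
    """Walk backward from the detection point to the acquiring GETMAIN."""
    for i in range(detected_index, -1, -1):
        entry = trace[i]
        if entry.event == "GETMAIN" and entry.address == damaged_address:
            return i
    return None  # the entry has already wrapped out of the table


def tasks_in_window(trace, acquired_index, detected_index):
    """List, in first-seen order, every task with an entry in the window."""
    seen = []
    for entry in trace[acquired_index:detected_index + 1]:
        if entry.task not in seen:
            seen.append(entry.task)
    return seen
```

If find_acquisition returns None, the acquiring entry is no longer in the table, which is the practical reason for making the trace table large enough to span the life of the storage area in question.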
As a result of the increased robustness of CICS application programs and the efforts IBM has made to reduce the ability of programs to corrupt storage, the number and severity of storage violations have dropped significantly. Despite this, it’s important to recognize the implications that lie behind the occurrence of even a single storage violation: that some unknown program has corrupted an unknown amount of storage. The only thing we know for sure about a CICS region that’s suffered a storage violation is that we know absolutely nothing about the state of the region at that time. Data buffers may have been overlaid while being written to DASD, other tasks’ working storage may have been altered in ways that change their logic paths, and literally any portion of the CICS environment may have been compromised. By their nature, storage violations are random and unpredictable and should always be considered a high-severity problem.
Even if your shop never experiences storage violations, it’s still good practice to activate the low-overhead protection options, such as storage protection and reentrant program protection, as a preventive measure against future problems. If your shop sees storage violations on a regular basis, and the various protection features described here aren’t already in place, they should be reviewed and activated. If your shop is CPU-constrained, consider activating high-overhead options such as SSRANGE and command protection in the development environment only.
Many shops decide to run without trace to avoid the measurable overhead that trace requires. This is a reasonable decision. In fact, it may be a requirement if the processor is full or the CICS region is CPU-constrained, but it does have the negative side effect of drastically reducing the chance of resolving a storage violation. If your shop suffers from regular storage violations, it may be worth incurring the additional overhead of trace in production; activating full trace in test or development, by contrast, would add minimal overhead. When activating trace, be careful to set the size of the trace table large enough to actually capture the relevant data following an abend; remember that increasing the size of the table uses more virtual storage but doesn’t add to the CPU overhead.