- Writes the Sequence Set Index CI with a split-in-progress indicator set
- Formats a new CA at the end of the data set (based on the High Used RBA value in the catalog)
- Moves about half of the CIs to the new CA. This requires a read and a write for each CI being moved and can amount to hundreds of I/O operations with small CI sizes (4,096 bytes and smaller)
- Creates a new Sequence Set (low-level) Index record for the new CA
- Removes the moved CIs from the old CA
- Updates the Sequence Set (low-level) Index record to reflect removal of the CIs
- Updates the higher-level index records as needed.
As you can see, this process requires many I/O operations to complete: hundreds when each CA holds many small Data CIs. Larger Data CIs (and therefore fewer Data CIs per CA) reduce the cost of CA Splits.
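To put rough numbers on the steps listed above, here is a back-of-the-envelope estimate of the I/O cost of one CA Split as a function of CI size. This is a hypothetical model, not how VSAM actually accounts for I/O. It assumes that each bookkeeping step (the Sequence Set writes, formatting the new CA, and the higher-level index update) costs exactly one I/O. It also assumes that removing the moved CIs is covered by the Sequence Set update, that each moved CI costs one read plus one write, and that a CA holds an illustrative 737,280 usable data bytes regardless of CI size.

```python
"""Back-of-the-envelope I/O estimate for one VSAM CA Split (hypothetical model)."""

# Assumption: writing the Sequence Set record with the split indicator,
# formatting the new CA, creating the new Sequence Set record, updating the
# old one, and updating the higher-level index cost one I/O each. Removing the
# moved CIs from the old CA is assumed to be covered by the Sequence Set update.
BOOKKEEPING_IOS = 5

# Assumption: an illustrative CA size in usable data bytes. Real CIs-per-CA
# figures depend on device geometry and physical blocking.
CA_DATA_BYTES = 737_280


def cis_per_ca(ci_size: int, ca_data_bytes: int = CA_DATA_BYTES) -> int:
    """Data CIs per CA under the simple 'bytes divided by CI size' model."""
    return ca_data_bytes // ci_size


def ca_split_ios(n_cis: int) -> int:
    """Estimated I/Os for splitting a CA that holds n_cis data CIs."""
    moved = n_cis // 2                    # about half of the CIs move
    return BOOKKEEPING_IOS + 2 * moved    # one read plus one write per moved CI


if __name__ == "__main__":
    for ci_size in (2_048, 4_096, 8_192, 16_384):
        n = cis_per_ca(ci_size)
        print(f"CI size {ci_size:>6}: {n:>3} CIs per CA, ~{ca_split_ios(n)} I/Os per CA Split")
```

Under these assumptions, 2,048-byte CIs cost about 365 I/Os per CA Split and 16,384-byte CIs cost about 49. That matches the direction the text describes. The absolute numbers depend entirely on the assumed CA size and unit costs.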
A completed split, in contrast, has only a small impact on subsequent processing. A significant amount of data is still grouped into each of the (now two) CAs. Direct processing, the primary processing method for online systems, is not affected, and sequential processing is only slightly impacted. Most important, however, is the positive effect on additional insert activity in the same key value vicinity. There are now two CAs, each approximately half full of free space. Future inserts in this vicinity will require at most CI Splits, and many of them will fit into the newly freed CIs before another CA Split is required (see Figure 3).
REORGANIZATION MAY NOT BE RIGHT FOR YOUR FILE
I wrote this article to encourage you to examine your file reorganization strategy. If splits can actually help subsequent insert processing, you should not use the number of CI or CA Splits that have occurred as a reorganization trigger, whether you reorganize manually or with a vendor VSAM enhancement product. VSAM file reorganization (or reloading the file) will:
- Squeeze out any additional free space that was created in the CIs and CAs of the file by split activity
- Move records that CA Splits relocated to CAs at the end of the file (for example, CA 45 as illustrated in Figure 3) back into physical sequence
- Repopulate the file with the initial free space defined in the DEFINE CLUSTER command.
If your file has any clustering of insert activity, reorganization may do more harm than good, because future clustered inserts may cause splits again in the same places. Frequent reorganization may avoid some of these additional splits, but frequent reorganization strains the batch processing window, which is the problem we started with.
HOW CAN I TELL IF I HAVE CLUSTERED INSERTS?
I can think of two ways:
- Ask the application owner, designer, or programmer about their key values and insert activity. This is somewhat less than perfect, as real systems often work differently than their designers assumed.
- Check LISTCAT or other statistics that show the number of CI and CA Splits that have occurred. LISTCAT statistics are cumulative, so you need to subtract successive values to see how many new splits of each type occur each day. For files with heavily clustered inserts, the daily number of new splits starts at a low level (depending on the amount of distributed free space), rises to a higher level, and then declines over time.
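Because the counters are cumulative, the daily view comes from differencing successive snapshots. The sketch below assumes you have already extracted the SPLITS-CI and SPLITS-CA values from LISTCAT output by some other means; parsing that output is not shown. The names `daily_new_splits` and `rises_then_declines`, the reset-handling rule, and the sample week are all hypothetical.

```python
"""Turn cumulative split counters into daily new-split counts (hypothetical helper)."""
from typing import List, Tuple

# (day label, cumulative SPLITS-CI, cumulative SPLITS-CA), assumed to be
# already extracted from LISTCAT output.
Snapshot = Tuple[str, int, int]


def daily_new_splits(snaps: List[Snapshot]) -> List[Tuple[str, int, int]]:
    """Difference successive snapshots to get new CI and CA Splits per day.

    Assumption: if a counter goes down, the statistics were reset (for example,
    when the cluster was redefined and reloaded), so the new value is used as-is.
    """
    result = []
    for (_, prev_ci, prev_ca), (day, ci, ca) in zip(snaps, snaps[1:]):
        new_ci = ci - prev_ci if ci >= prev_ci else ci
        new_ca = ca - prev_ca if ca >= prev_ca else ca
        result.append((day, new_ci, new_ca))
    return result


def rises_then_declines(counts: List[int]) -> bool:
    """True if the series peaks after its first value and ends below the peak,
    the shape expected for heavily clustered inserts."""
    if len(counts) < 3:
        return False
    peak = counts.index(max(counts))
    return 0 < peak < len(counts) - 1 and counts[-1] < counts[peak]


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    week = [("Mon", 1_000, 20), ("Tue", 1_004, 20), ("Wed", 1_020, 21),
            ("Thu", 1_060, 24), ("Fri", 1_090, 25), ("Sat", 1_100, 25)]
    deltas = daily_new_splits(week)
    print(deltas)
    print("CI Split pattern looks clustered:", rises_then_declines([d[1] for d in deltas]))
```

Looking at the daily deltas, rather than the running totals, is what makes the rise-then-decline shape visible.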
If you track these statistics over a period of two or more reorganization cycles, you may see that the total number of splits performed increases after reorganization, compared with the number the file would have experienced if it had not been reorganized.
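Before collecting statistics over several reorganization cycles, you can get a feel for the effect with a toy simulation. This is a deliberately simplified model, not VSAM's actual split logic. Every name and parameter in it (`CI_CAP`, `CIS_PER_CA`, the load factors, the key ranges) is hypothetical. CIs always split at the midpoint, a CA Split moves the upper half of the CIs to a new logically adjacent CA, and "reorganization" simply reloads all keys with the initial free space. The same clustered inserts (random keys inside one narrow key range) are applied with and without a reload between cycles, and the splits are counted.

```python
"""Toy model of clustered inserts into a KSDS, with and without reorganization.

Hypothetical simplifications: keys only, midpoint CI Splits, CA Splits that move
the upper half of a CA's CIs, and a reload that restores the initial free space.
"""
import bisect
import random

CI_CAP = 10      # records per CI (hypothetical)
CIS_PER_CA = 8   # CIs per CA (hypothetical)
CI_LOAD = 8      # records per CI at load time, i.e. 20% CI free space
CA_LOAD = 7      # CIs per CA at load time, i.e. 1 free CI per CA


def load(keys):
    """(Re)load sorted keys into a list of CAs, each a list of sorted CIs."""
    cis = [keys[i:i + CI_LOAD] for i in range(0, len(keys), CI_LOAD)]
    return [cis[i:i + CA_LOAD] for i in range(0, len(cis), CA_LOAD)]


def insert(cas, key, stats):
    """Insert one key, performing CI or CA Splits as the simplified model requires."""
    a = max(bisect.bisect_right([ca[0][0] for ca in cas], key) - 1, 0)
    ca = cas[a]
    c = max(bisect.bisect_right([ci[0] for ci in ca], key) - 1, 0)
    ci = ca[c]
    if len(ci) < CI_CAP:
        bisect.insort(ci, key)
        return
    if len(ca) < CIS_PER_CA:
        # CI Split: the upper half of the full CI moves into a free CI.
        half = len(ci) // 2
        ca.insert(c + 1, ci[half:])
        del ci[half:]
        stats["ci"] += 1
    else:
        # CA Split: the upper half of the CIs moves to a new, logically next CA.
        half = len(ca) // 2
        cas.insert(a + 1, ca[half:])
        del ca[half:]
        stats["ca"] += 1
    insert(cas, key, stats)  # retry now that free space exists


def run(cycles, inserts_per_cycle, reorganize, seed=1):
    """Apply clustered inserts for several cycles; optionally reload between cycles."""
    rng = random.Random(seed)
    cas = load([k * 1_000_000 for k in range(2_000)])
    # All new keys fall between two adjacent loaded keys: a clustered insert pattern.
    hot_lo, hot_hi = 1_000 * 1_000_000, 1_001 * 1_000_000
    new_keys = rng.sample(range(hot_lo + 1, hot_hi), cycles * inserts_per_cycle)
    stats = {"ci": 0, "ca": 0}
    for cycle in range(cycles):
        batch = new_keys[cycle * inserts_per_cycle:(cycle + 1) * inserts_per_cycle]
        for key in batch:
            insert(cas, key, stats)
        if reorganize and cycle < cycles - 1:
            cas = load(sorted(k for ca in cas for ci in ca for k in ci))
    return stats, cas


if __name__ == "__main__":
    for reorganize in (False, True):
        stats, _ = run(cycles=4, inserts_per_cycle=300, reorganize=reorganize)
        print("with reorg" if reorganize else "no reorg  ", stats)
```

Varying the parameters is a cheap way to reason about a reorganization interval. The model ignores special handling of sequential inserts, index levels, and much else that a real KSDS does, so treat its counts as illustrative only.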
Overly frequent reorganization may appear to reduce the number of splits required, but you need to weigh those savings against:
- The batch cycle and online unavailability costs during reorganization
- Excess disk space used by unusable distributed free space throughout a file with clustered insert activity
- CPU, I/O, and other system resources expended in the reorganization.
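Two of these costs, reorganization I/O and idle free space, lend themselves to simple arithmetic. The sketch below covers only those two items and leaves out outage time and CPU. Every unit cost in it is an assumption: that unload reads each data CI once and reload writes it once, and that a CI Split or CA Split costs a fixed number of I/Os supplied by the caller. The function names and sample inputs are hypothetical.

```python
"""Rough reorganization trade-off arithmetic (all unit costs are hypothetical)."""


def reorg_ios(data_cis: int) -> int:
    """Assumption: unload reads every data CI once and reload writes it once."""
    return 2 * data_cis


def split_ios_saved(ci_splits: int, ca_splits: int,
                    ios_per_ci_split: int, ios_per_ca_split: int) -> int:
    """I/Os avoided if reorganization really prevents this many splits."""
    return ci_splits * ios_per_ci_split + ca_splits * ios_per_ca_split


def unusable_free_bytes(allocated_bytes: int, free_pct: int, cold_pct: int) -> int:
    """Distributed free space sitting in key ranges that never receive inserts.

    free_pct: share of the allocation left free at load time (percent).
    cold_pct: share of the file's key range with no insert activity (percent).
    """
    return allocated_bytes * free_pct * cold_pct // 10_000


if __name__ == "__main__":
    # Illustrative inputs only.
    cost = reorg_ios(100_000)
    saved = split_ios_saved(500, 10, ios_per_ci_split=3, ios_per_ca_split=185)
    print(f"reorg: {cost} I/Os, splits avoided: {saved} I/Os")
    print(f"idle free space: {unusable_free_bytes(10_000_000_000, 20, 95):,} bytes")
```

With these made-up inputs, one reorganization costs 200,000 I/Os to avoid 3,350 I/Os of split processing, while about 1.9 GB of distributed free space sits where no inserts ever arrive. Your own numbers will differ.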
In many cases, you will find you are reorganizing files that must be backed up but do not need to be reorganized. If the file has clustered insert activity, retaining the free space you paid for through CI and CA Split processing is a better plan. Eliminating unproductive reorganization processing can shorten the batch window and result in higher online availability. Eliminating unusable distributed free space throughout a file when insert activity is clustered can also save significant amounts of disk space.