
VSAM KSDS Data Set Reorganization


A question often asked regarding VSAM is: when should you reorganize a VSAM data set? Many recommendations related to VSAM reorganization can be misleading and confusing. Unnecessarily reorganizing VSAM data sets wastes resources, but failing to reorganize them can result in longer processing times that also represent lost resources. This article explores the reasons for reorganizing a VSAM data set, specifically a KSDS (Key-Sequenced Data Set), and provides guidelines to apply to this important decision. Note that the reorganization guidelines presented here vary depending on whether the data set is processed online (e.g., under CICS TS) or exclusively in batch. A distinction should be made between taking a CA (Control Area) split in an online environment, where other transactions actively share the same address space or partition and response times are important, and taking one in a batch program, where there's only one "transaction" (the batch program) executing in the address space.

Many installations reorganize VSAM data sets on a scheduled basis, such as once a week. The decision to perform the scheduled reorganization was made many years ago, and the process is usually inherited. Sometimes, data sets that have received no insertions and have no CI (Control Interval) or CA (Control Area) splits are routinely backed up, deleted, redefined, and reloaded. Unless some definition change is needed (e.g., changing the data or index CISZ), data sets that have received no insertions or record expansions that could cause CI/CA splits should only be backed up, not reorganized. In addition, a data set shouldn't be reorganized simply because there are CI/CA splits; other factors should be considered.
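
A scheduled reorganization typically follows a back-up, delete, redefine, and reload cycle under IDCAMS. The sketch below is illustrative only; the cluster name, the backup data set (assumed to be already allocated with compatible attributes), and all attribute values are hypothetical:

   /* Back up, delete, redefine, and reload (names illustrative) */
   REPRO INDATASET(PROD.CUST.KSDS) OUTDATASET(PROD.CUST.BACKUP)
   DELETE PROD.CUST.KSDS CLUSTER
   /* RECORDSIZE(average maximum), CYLINDERS(primary secondary), */
   /* FREESPACE(CI-percent CA-percent)                           */
   DEFINE CLUSTER (NAME(PROD.CUST.KSDS) -
          INDEXED -
          KEYS(16 0) -
          RECORDSIZE(400 800) -
          CYLINDERS(100 10) -
          FREESPACE(10 10))
   REPRO INDATASET(PROD.CUST.BACKUP) OUTDATASET(PROD.CUST.KSDS)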

Primary Reasons for Reorganization

A primary reason for reorganizing a VSAM data set is to correct some type of definition specification, such as an unnecessary overallocation (to recover wasted disk space) or an underallocation (to reduce or consolidate extents). Recovery of wasted space is especially important if the data set has no growth potential in the near future. Overallocation can be measured by comparing the High Used RBA (HURBA) to the High Allocated RBA (HARBA) from a LISTCAT. If ((HURBA/HARBA)*100) is less than 70 percent, the data set can be considered overallocated unless it's highly volatile (one that receives many inserts). Such an evaluation should be made for data sets of 300 or more cylinders; spending time to recover space from small data sets is a waste of time and resources. A useful target savings guideline for considering reorganization due to overallocation is recovery of around 100 or more cylinders.
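
As a worked illustration (cluster name and numbers hypothetical), the two values appear as HI-U-RBA and HI-A-RBA in the ALLOCATION section of the data component in LISTCAT output:

   /* Utilization = (HI-U-RBA / HI-A-RBA) * 100.                 */
   /* Example: 350,000,000 / 700,000,000 * 100 = 50 percent,     */
   /* which is below the 70 percent guideline, so the data set   */
   /* is a candidate for downsizing at its next reorganization.  */
   LISTCAT ENTRIES(PROD.CUST.KSDS) ALLOCATION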

Underallocated data sets also can be problematic. The target is to load a data set into one extent using the assigned primary allocation. However, there's no guarantee the primary allocation will result in one extent, as the disk volume(s) may be fragmented, making it difficult to allocate the requested space contiguously. Also, some installations use smaller primary allocations than the data set requires, either to avoid a no-space-found condition or to avoid a large primary allocation on a secondary volume when the original volume has insufficient room for a secondary allocation. With SMS-managed data sets, the user can request that the secondary allocation amount be used when switching to a new volume, avoiding another primary-sized allocation. The cost of an extent occurs when it's obtained (allocated); the subsequent overhead of processing a data set with multiple extents is small and doesn't justify reorganization unless the number of extents is approaching the limit, which depends on whether the data set is SMS-managed (123 extents for non-SMS-managed data sets; 255 for SMS-managed data sets). The number of allowable extents varies and has changed over time with new releases of SMS.
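
One way to pursue the one-extent target is to size the primary allocation from the data itself. A minimal sizing sketch, assuming 3390 geometry, 4K data CIs, no free space, and hypothetical record counts:

   /* ~1,000,000 records x ~400 bytes average = ~400 MB of data. */
   /* A 3390 track holds 12 4K CIs (49,152 bytes), so a cylinder */
   /* holds 737,280 bytes; 400 MB / 737,280 = ~543 cylinders.    */
   /* Round the primary up and keep a modest secondary.          */
   DEFINE CLUSTER (NAME(PROD.CUST.KSDS) -
          INDEXED -
          KEYS(16 0) -
          RECORDSIZE(400 800) -
          CYLINDERS(550 55))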

You may want to correct a primary allocation that was incorrectly calculated, to eliminate unnecessary extents when the data set is reloaded in the future. One major cause of underallocation is the use of RECORDS to define the data set's space requirements. The formula VSAM uses considers the average record length specified in the cluster definition to determine the space required for the data set. In practice, it may be difficult to determine an accurate average record length to specify in the definition along with the maximum length. Worse, VSAM doesn't update or recompute this figure as the data set is loaded or processed, so the original specification remains with the data set. Also, when defining variable-length data sets in the COBOL program FD (File Description), many programmers are accustomed to specifying the minimum and maximum record lengths, and they often carry those FD values into the space requirements in the DEFINE CLUSTER. If the average figure is too low, VSAM will undercompute the space requirements for both the data and index components, resulting in extra extents when the data set loads. So, overallocation or underallocation can be a reason for reorganizing a particular data set; this is usually done only once, to correct the specified allocation, and usually when the data set is next scheduled for reorganization, unless the maximum number of extents has been reached or is close to being reached.
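
To make the pitfall concrete, consider this hypothetical definition that allocates by RECORDS (all numbers illustrative):

   /* VSAM sizes the data component from the average length in   */
   /* RECORDSIZE: 1,000,000 x 100 bytes = ~100 MB. If the true   */
   /* average record length is 400 bytes, the load actually      */
   /* needs ~400 MB, and the shortfall is taken as extra         */
   /* secondary extents.                                         */
   DEFINE CLUSTER (NAME(PROD.CUST.KSDS) -
          INDEXED -
          KEYS(16 0) -
          RECORDSIZE(100 800) -
          RECORDS(1000000))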

Other Reasons for Reorganization

Another reason for reorganizing a VSAM data set is to correct or improve a definition that was incorrectly specified, or to improve the data set's performance. These changes are usually not possible with the IDCAMS ALTER function. They include changing the CISZ of the data or index component and eliminating obsolete definition parameters such as IMBED, REPLICATE, KEYRANGES, WRITECHECK, ERASE, or RECOVERY on the data component. Although CI/CA free space can be ALTERed, changes to free space are usually made during a reorganization. Except for free space changes, reorganizations resulting from definition changes are usually applied once to implement the particular change; reorganizations driven by free space tuning may be repeated until the proper or best free space percentages are found.
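
For reference, a free space change on its own can be made in place with ALTER (data component name hypothetical); the new percentages apply to CIs and CAs formatted after the change, not to space already loaded, which is one reason free space changes are usually paired with a reorganization:

   /* Set CI free space to 15 percent and CA free space to 10.   */
   ALTER PROD.CUST.KSDS.DATA FREESPACE(15 10)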

Reorganizations also may occur to recover excessive disk space caused by CI/CA splits. Direct insertions that cause splits can leave about 50 percent free space in both the original and the new CI or CA. Often, this resulting free space is never completely reused, which means it becomes wasted space. You can detect this condition with a LISTCAT by observing the free space bytes: if this figure continues to increase, the free space isn't being reused at the rate it's being created. Figure 1 provides information on a non-SMS data set that reached the maximum of 123 extents, for a total allocation of 1,940 cylinders. The total free space is 1,400,295,424 bytes, or almost 1,900 cylinders. In other words, the data set has received a high number of CA splits despite the exceptionally high free space available in it. This is an unusually high disk space allocation for a data set containing only 10,397 records; the data would easily fit into the primary allocation of 100 cylinders with plenty of room to spare.
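
The counters to watch appear in the STATISTICS section of the data component in LISTCAT output (cluster name hypothetical):

   /* SPLITS-CI and SPLITS-CA count the splits taken; FREESPC    */
   /* shows free space in bytes. A FREESPC value that keeps      */
   /* growing across runs means splits are creating free space   */
   /* faster than subsequent inserts reuse it.                   */
   LISTCAT ENTRIES(PROD.CUST.KSDS) ALL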

Improvements in extent processing have helped delay reorganizations caused by reaching the maximum extent limit. Specifically, IBM has implemented extent consolidation logic for SMS-managed data sets, so the catalog reflects fewer extents than the number of times a secondary allocation was requested. Unfortunately, this doesn't apply to non-SMS-managed data sets. When a new extent is needed on the same volume, VSAM checks whether the extent being allocated is contiguous to the last extent used; if so, VSAM adds the space by extending the ending extent address without increasing the number of extents in the data set. This helps defer reorganizations due to reaching the maximum extent limit, even though the overhead of secondary allocation processing remains. Finally, a data set that continuously acquires extents should have its secondary allocation (and its primary, if it's too small to hold the entire data set) reviewed, because the additional extents may indicate the secondary request is too small. Raise the requested secondary amount to reduce extent processing. You don't want to trigger a secondary allocation during a CA split in an online environment, because it will adversely affect response times of transactions that need access to the data set.
