z/OS systems have hundreds of thousands, sometimes millions of data sets. The concept of “cataloging” has been around for many years to assist in locating these data sets. In the past, cataloging was optional, but widespread use of system-managed storage requires the cataloging of all data sets under its control. The ICF catalog is where all data sets are cataloged, and access to data sets, whether from a batch application or an online system such as CICS or DB2, is possible only through a successful catalog search. If the catalog isn’t available, access to the data isn’t possible.
Figure 1 shows a layout of the ICF catalog and its associated metadata structures (VTOC and VTOCIX) for a single VSAM data set. Note that there are multiple records in the BCS, plus several more in the VVDS, VTOC, and VTOCIX on the volume where the data set resides. All these records, across all metadata structures, must be intact, with synchronized information, for the data set to be accessible.
The aspect that makes catalog failure a high-risk factor is the ratio of data sets to ICF catalogs. Most installations, even the largest ones, typically have fewer than 100 catalogs on a system, and usually about 25. Let's assume they catalog 1,000,000 data sets. If the data sets were distributed evenly across the catalogs, each catalog would contain, on average, approximately 40,000 entries. If you lose access to any one of those catalogs, even for an hour or two, you lose access to a considerable number of data sets (probably one or more entire applications).
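The arithmetic behind that exposure is worth making explicit. This short sketch (using the illustrative numbers above, not figures from any particular installation) shows the average entries per catalog, and how quickly the blast radius grows when the distribution is skewed:

```python
# Illustrative numbers only: 1,000,000 data sets spread over 25 catalogs.
total_data_sets = 1_000_000
catalogs = 25

# Even distribution: average entries per catalog.
per_catalog = total_data_sets // catalogs
print(per_catalog)  # 40000

# Skewed distribution (hypothetical): one "bulged" catalog holding
# three times its even share puts 12% of all data sets behind one failure.
bulged = 3 * per_catalog
print(bulged / total_data_sets)  # 0.12
```

Losing one evenly loaded catalog makes 4% of all data sets unreachable; losing the skewed one triples that.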
If the number of data sets per catalog isn’t evenly distributed, the danger of a failure becomes even greater if one of the larger catalogs develops a problem.
Catalog Data Set Distribution
Why not have lots of catalogs, with cataloged data sets spread widely across them, to reduce the danger? Well, you can, and each installation controls exactly how it designs and sets up its catalog and data set environment to work best for its particular needs. Unfortunately, there are factors working against this. The more catalogs you have, the more daily management is required to keep them clean and error-free, and the more catalog backups you have to manage and keep track of, the greater the risk of a catalog failure. Too few catalogs, on the other hand, result in catalogs that are too big, too unwieldy, and also prone to failure. There's simply no magic number that works for all installations.
As a rule, the deciding factor is the number of “applications” that run on your mainframe. An application can be defined as a collection of related data files and their associated processing programs; human resources, payroll, and customer data are typical examples. Within an application, there might be hundreds or thousands of data sets, and usually all the data sets within an application are cataloged in the same catalog.
Here’s how it works. The path to locating the catalog is a technique called “alias match,” where the alias is a value that represents the application. The catalog is defined to the system as having an alias of that value and that same alias is assigned to the high-level node for all data sets in that application (see Figure 2). To locate any data set within that application, an alias table is searched to identify its assigned catalog, and then the catalog is searched for the fully qualified data set name.
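The alias-match search can be sketched as a simple two-step lookup. The alias and catalog names below are hypothetical; on z/OS the alias table and catalog search are handled by the system, not by application code:

```python
# Sketch of the alias-match technique: the alias table maps a data set's
# high-level node (first qualifier) to the catalog assigned that alias.
alias_table = {
    "PAYROLL": "CATALOG.PAYROLL",  # hypothetical alias -> catalog pairs
    "HR":      "CATALOG.HRAPPS",
}

def locate_catalog(dsn):
    """Step 1: search the alias table using the data set's high-level node."""
    hlq = dsn.split(".")[0]
    return alias_table.get(hlq)    # None means no alias match

# Step 2 (not shown): search the returned catalog for the fully
# qualified data set name.
print(locate_catalog("PAYROLL.MASTER.DATA"))  # CATALOG.PAYROLL
```

The key point the sketch captures is that every data set in an application shares one high-level node, so one alias entry routes the whole application to one catalog.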
The problem is that most catalogs have multiple aliases, representing multiple applications, whose data sets are cataloged in the same catalog. It isn’t uncommon for a production catalog to have 50, 75, or even 100 or more aliases assigned. Over time, new aliases are assigned to existing catalogs and the number of data sets for an application grows. Before you know it, a critical catalog has bulged far out of proportion. If the catalog has an unplanned outage for any reason, many applications will lose access to their data sets until the outage is corrected.
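The consequence of many aliases pointing at one catalog can be illustrated by inverting the alias table. The names below are hypothetical; the point is that a single catalog outage affects every alias assigned to it:

```python
# Hypothetical alias table: three applications share one production catalog.
alias_table = {
    "PAYROLL": "CATALOG.PROD01",
    "HR":      "CATALOG.PROD01",
    "CUST":    "CATALOG.PROD01",
    "TESTAPP": "CATALOG.TEST01",
}

# An unplanned outage of PROD01 takes out every application aliased to it.
failed_catalog = "CATALOG.PROD01"
affected = [alias for alias, cat in alias_table.items() if cat == failed_catalog]
print(affected)  # ['PAYROLL', 'HR', 'CUST']
```

Scale the dictionary up to the 50 to 100 aliases mentioned above and the exposure of one overgrown production catalog becomes obvious.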