Operating Systems

Virtually all z/OS mainframe data centers have a classic Disaster Recovery (DR) plan of one kind or another to help them survive an outage of the overall data center. The typical DR plan has several staff members assigned to it, and it’s tested with elaborate scenarios at least annually. Some installations test portions of their DR plan as often as every other week. For many sites, the costs for “hot site” support can run into the millions of dollars.

The odds are low that anyone will ever declare an actual disaster. Nevertheless, staff goes through the motions, albeit with a lot of grumbling, and success or failure is declared for each test.  

With today’s emphasis on cost cutting, why is so much effort, time, and money devoted to a process that everyone hopes will never actually be activated? Senior management, often at the CIO level, know they have a fiduciary responsibility to keep a DR plan in place, and that they’re accountable if a disaster strikes without one.

It’s no accident that the term disaster recovery has changed in recent years to business continuity, because that’s what it provides: the continuity of your business when an outage of your entire data center occurs.

ICF Catalog Risk Factors

A serious data outage is waiting in the wings at many installations, yet even where staff are aware of the danger, it’s simply ignored. This potentially devastating outage is the failure of an Integrated Catalog Facility (ICF) catalog.

Typically, the technical staff says that “catalogs never break; we haven’t had a catalog failure in five years,” or offers some other excuse that keeps them covered. When they do admit to a catalog failure, the attitude is that this type of failure is rare and that a recovery time of several hours is perfectly acceptable. Because this type of outage isn’t likely to gain outside notoriety, it’s often swept under the carpet as “one of those pesky computer glitches.” Nevertheless, when an ICF catalog outage occurs, it can result in millions of dollars of lost revenue and missed customer service level agreements.
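
Awareness doesn’t require much. The standard IDCAMS utility already includes the EXAMINE and DIAGNOSE commands, which test a catalog’s structural integrity, so a periodic health check can catch a breaking catalog before users do. The job below is only a sketch: the job card details and the catalog name ICF.USERCAT.PROD are placeholders for an installation’s own.

   //CATCHECK JOB (ACCT),'CATALOG HEALTH',CLASS=A,MSGCLASS=X
   //CHECK    EXEC PGM=IDCAMS
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     /* Test the catalog's index and data components for damage */
     EXAMINE NAME(ICF.USERCAT.PROD) -
             INDEXTEST DATATEST
     /* Scan the catalog's BCS records for internal consistency */
     DIAGNOSE ICFCATALOG -
              INDATASET(ICF.USERCAT.PROD)
   /*

Both commands are read-only and raise a nonzero condition code when they find damage, which makes a check like this cheap to automate on a weekly schedule.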

When technical staff are asked if they have recovery procedures in place and thoroughly tested, they often roll their eyes, flash a smile, and admit that, no, they have never really tested recovery of a catalog. If they have tested, it typically amounts to restoring their catalogs while setting up for their DR test. Or, worse, they tested 15 years ago when they purchased a software tool for catalog recovery, but haven’t touched it since.
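
Even a minimal, regularly exercised backup would put a shop ahead of that. As a sketch, again with placeholder data set names, the standard IDCAMS EXPORT command with the TEMPORARY option copies a user catalog to a backup data set while leaving the source catalog intact and connected:

   //CATBKUP  JOB (ACCT),'CATALOG BACKUP',CLASS=A,MSGCLASS=X
   //EXPORT   EXEC PGM=IDCAMS
   //SYSPRINT DD SYSOUT=*
   //* Backup data set; name and space figures are placeholders
   //BACKUP   DD DSN=HLQ.USERCAT.BACKUP,DISP=(NEW,CATLG,DELETE),
   //            UNIT=SYSDA,SPACE=(CYL,(15,15),RLSE)
   //SYSIN    DD *
     /* TEMPORARY leaves the source catalog usable after export */
     EXPORT ICF.USERCAT.PROD -
            OUTFILE(BACKUP) -
            TEMPORARY
   /*

A recovery test then reverses the process with IDCAMS IMPORT, and the exercise only means something if it’s actually run, and timed, on a regular schedule.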

At most installations, technical teams already have their hands full with daily problems and processes. Since ICF catalog failures aren’t an everyday occurrence, planning for one isn’t a priority task. The other reality is that management hasn’t made it a priority to create and maintain an ICF catalog recovery plan, nor has it established a requirement to test such a plan regularly.

More to the point, it’s a safe bet that senior management has never heard of ICF catalogs and isn’t aware of the acute risk of data access failure that an ICF catalog outage creates. Without that awareness, there’s no drive to implement such a plan.
