Feb 15 ’11
Pete Clark on z/VSE: A Disaster Waiting to Happen?
As IT professionals, we certainly don’t need more challenges. However, for purposes of this discussion, this challenge isn’t something you’re doing; rather, it’s probably something you aren’t doing, have forgotten about, have lost interest in, or even something you think is going away.
(Note: For this discussion, “we” and “you” are a group reference meaning you, your management, and the company management.)
What’s this challenge? It’s that platform (hardware, operating system, or software) that was stabilized 5 to 15 years ago. If the platform was completely replaced, it will never present a problem. End of discussion. However, in some cases, these platforms are functional, stabilized, and retained for their historical data, which must remain available for use. As time passes, however, knowledge and support for this platform dwindle. The platform still contains important, non-redundant company data needed for customer service and regulatory requirements.
In other cases, the platform is deemed important, remains in service with minimal part-time support and enhancements, and will be migrated at some later date. However, if it’s already been five or 10 years, are you really ever going to get to it? Not likely.
But what if the platform is deemed important, remains in service, is considered mature and has a token support team assigned? What if this support team doesn’t have the experience or expertise necessary to fully support the platform?
It’s common practice that the most productive, experienced personnel are assigned to new platforms; this just makes good business sense. However, over time, the skills they used to support earlier platforms will atrophy and be of limited value. Not to mention that these folks will leave the company, taking their skills with them, reducing the skill base for all platforms.
The bottom line is that, over time, stabilized, mature platforms and long-term migrations suffer from the loss of skilled professionals who know how to fix or mitigate problems. To compound matters, these older platforms receive little to no company focus or attention, so no one realizes there isn’t anyone capable of fixing a problem until some dramatic event occurs.
In almost every case I’ve seen, that dramatic event has been a major platform hardware failure. Although modern computer hardware has a lengthy life, these are mechanical and electronic systems, and given enough time and neglect, they will fail. Often, when the platform is stabilized from a software perspective, hardware and maintenance suffer the same fate. Does anyone see a self-fulfilling prophecy here?
Unfortunately, we saw numerous incidents of this at client sites in 2010, and it appears to be a growing concern. Here’s a partial list of situations (from many different failures) we saw. These should prompt you to re-examine your support and service for these platforms:
- No one within the company knew how to do anything other than boot the system.
- No one had the skills to determine if they had a hardware or software failure, and no one knew whom to call to report the failure.
- The system was down for a week before anyone realized it.
- A string of drives had a controller failure; no one knew when this happened.
- No one knew if backups were being taken or where they might be kept.
- Backups were taken and then used as scratch tapes.
- The most current backup was 10 years old.
- No current employee knew how to do a restore.
- No one knew who or how many persons were actually using the system.
- The company tried to contact a former employee for help after the crash, but that employee had passed away; another former employee was contacted and refused to help.
- The hardware failed, and repair parts were no longer available.
- Software company no longer in business.
Scary, isn’t it? How can you avoid these kinds of disasters? Start by checking older platforms for backups, skillset, hardware maintenance, security codes, etc.
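For shops that keep an inventory of their stabilized platforms, that check can be partly automated. What follows is only a minimal sketch in Python, not a tool described in this column: the `PlatformRecord` fields, the `audit` function, and the age thresholds are all hypothetical, and it covers only backups, restore tests, staffing, and hardware maintenance (security codes would need their own check).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PlatformRecord:
    """Hypothetical inventory entry for one stabilized platform."""
    name: str
    last_backup: Optional[date]        # None means nobody knows
    last_restore_test: Optional[date]  # None means a restore was never tested
    support_contacts: int              # current staff able to support it
    hardware_under_maintenance: bool


def audit(record: PlatformRecord, today: date,
          max_backup_age_days: int = 7,
          max_restore_test_age_days: int = 365) -> List[str]:
    """Return a list of findings; an empty list means no gaps were detected.

    The default thresholds are assumptions, not recommendations.
    """
    findings = []
    if record.last_backup is None:
        findings.append("no known backup")
    elif (today - record.last_backup).days > max_backup_age_days:
        findings.append("backup is stale")
    if record.last_restore_test is None:
        findings.append("restore never tested")
    elif (today - record.last_restore_test).days > max_restore_test_age_days:
        findings.append("restore test is stale")
    if record.support_contacts < 2:
        findings.append("fewer than two people can support the platform")
    if not record.hardware_under_maintenance:
        findings.append("hardware is not under maintenance")
    return findings
```

Running `audit` against every record on a schedule, and sending the findings to someone who is accountable for them, is one way to learn about a gap before the dramatic event does the notifying.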
Thanks for reading this column; see you all in the next issue.
Pete Clark works for CPR Systems and has spent more than 40 years working with VSE/ESA in education, operations, programming, and technical support positions. It is his privilege to be associated with the best operating system, best users, and best support people in the computer industry.