Operations

IT faces a significant challenge in protecting the mainframe ecosystem and ensuring business continuity in a highly complex, open environment.

This article describes how intelligent automation can augment existing Disaster Recovery (DR) mechanisms and meet the challenge of ensuring business continuity in today’s complex environment. It focuses on the use of intelligent automation in the context of Business Service Management (BSM) to address the risk to business operations and continuity posed by events other than disasters or hardware failures. 

BSM is the most effective approach for managing IT from the perspective of the business. It ensures that everything IT does is prioritized according to business impact, enabling IT to proactively address business requirements to lower costs, drive revenue, and mitigate risk. 

The 80/20 Rule Applies  

People typically associate threats to business continuity with catastrophic events such as hurricanes, floods, or terrorist attacks, but human and application errors—which cause data loss, data contamination, resource outages, and degraded performance—can be equally disastrous. Industry experience has shown that approximately 80 percent of all unplanned downtime is the result of software problems or human error.

While many enterprises have established adequate protection against disasters and hardware failures, they also must protect the mainframe, where most information still resides, from the increased risk of human and application errors. Such errors are occurring more frequently because: 

  • IT infrastructure is increasingly complex, with more software dependencies.
  • Enterprises depend on the mainframe to support mission-critical and core business applications for large numbers of centralized and distributed users.
  • Operations staff now typically must perform mainframe maintenance while systems are running, opening the door to errors.
  • Many mainframe applications are Web-enabled, exposing previously controlled environments to a much less controlled one.

Human errors can wipe out critical data just as thoroughly as a natural disaster or hardware failure. For example, an application error can introduce erroneous data into enterprise databases, and a delay in responding to a spike in workload (easily caused by access from the Web) can degrade the performance of a strategic application.

The increasingly complex mainframe environment makes it more likely that people will make operational errors that result in data loss. Complexity also increases the probability of coding errors in applications, which can contaminate strategic databases, and it makes it far more difficult for staff to react quickly and accurately to changing conditions in the IT environment, which can degrade performance.

Contamination that compromises the integrity of a critical database can go undetected. For example, an administrator updating a database makes a single keystroke error in a batch update job, causing the update to run with the wrong input data set and contaminating critical data. The applications that rely on this data continue to operate, but with bad data, and the problem may go undetected until end users report it.
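To make the failure mode concrete, the sketch below shows one way an automated pre-run check could catch this kind of keystroke error before it contaminates data. It is purely illustrative: the data set naming rule, record-count range, and the validate_input helper are assumptions for this example, not part of any particular product or of the scenario above.

# Hypothetical sketch: screen a batch update's input data set before the job runs.
# The naming rule, record-count range, and example values are illustrative assumptions.
import re
import sys

EXPECTED_NAME = re.compile(r"^PROD\.CUST\.UPDATE\.D\d{6}$")  # assumed production naming rule
MIN_RECORDS, MAX_RECORDS = 1_000, 500_000                    # assumed historical volume range

def validate_input(dataset_name: str, record_count: int) -> list[str]:
    """Return a list of problems found with the proposed input data set."""
    problems = []
    if not EXPECTED_NAME.match(dataset_name):
        problems.append(f"unexpected input data set name: {dataset_name}")
    if not MIN_RECORDS <= record_count <= MAX_RECORDS:
        problems.append(f"record count {record_count} is outside the expected range")
    return problems

if __name__ == "__main__":
    # A single keystroke error selects the wrong data set (letter 'O' instead of digit '0').
    issues = validate_input("PROD.CUST.UPDATE.D24013O", 37)
    if issues:
        for issue in issues:
            print(f"BLOCKED: {issue}", file=sys.stderr)
        sys.exit(1)  # stop the update and alert operations instead of running with bad input
    print("Input checks passed; proceeding with batch update.")

In practice, intelligent automation would derive such checks from policies and observed baselines rather than hard-coded constants, and would surface the alert in business terms so operations staff can act before end users are affected.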

Traditional Methods Prove Insufficient
