Software enables us to implement automation, which makes it possible to quickly respond to alerts and problems, freeing up technical staff to do more complex, strategic tasks. Automation helps IT move from firefighting mode to proactive management. With the adoption of automated alert processing and REXX programming standards, the problem resolution forecast in your z/OS data center is for clear skies. This article demonstrates how to define standards and then use the REXX programming language to set up the automation.
Automation Alert Processing Standards
Creating automation alerts is a primary goal of any data center. These alerts can result from a successful or failed process, and may be issued as informational messages to a mainframe console. Most shops have a significant number of automation rules that scan millions of lines of z/OS message traffic, firing only on the events that have been determined to warrant a warning based on data center needs.
Automation standards can be a huge asset. They ensure that alerts are presented consistently and clearly, and they help IT staff identify which automation code issued a given alert. They significantly reduce the challenge of responding to events and alerts in a timely fashion. Here are some recommendations:
When setting up alert thresholds, you can define rules based on your experience and expertise, minimizing the number of problems you must handle manually. But this isn’t enough because you aren’t the only person interacting with the system, solving problems, and managing your data center. When developing standards, you need to take into account all the operational requirements to support users.
Generating a unique message prefix, such as “OPSNTFYxxx,” for every message issued lets you create a common look and feel, which is of paramount importance for this type of standardization. Many software products use a specific message prefix or suffix range to identify their messages, such as DFH for CICS; your automation messages should, too.
For example, you might decide to use “OPSNTFYxxx” and then define “xxx” as noted in Figure 1. Then, any administrator would immediately know which component was involved and the right person could easily be directed to handle the problem. The following shows an example of what these types of automation alert messages look like:
OPSNTFY100 Job 46343 (RUN123) exceeding defined class C init threshold. Job cancelled.
OPSNTFY055 You are not authorized to stop production controlled tasks.
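The prefix convention above can be sketched in REXX as a small formatting routine. This is a minimal illustration, not a definitive implementation: the routine name FormatAlert is hypothetical, the three-digit range check is an assumption based on the "xxx" placeholder, and SAY stands in for whatever console-message facility your automation product provides.

```rexx
/* REXX - sketch: build alert messages with a standard prefix      */
/* FormatAlert is a hypothetical helper; SAY is used here only to  */
/* display the result and stands in for a real console message.    */
say FormatAlert(100, 'Job 46343 (RUN123) exceeding defined class C init threshold. Job cancelled.')
say FormatAlert(55, 'You are not authorized to stop production controlled tasks.')
exit 0

FormatAlert: procedure
  parse arg msgNum, msgText
  prefix = 'OPSNTFY'              /* site-wide prefix from the text */
  /* Assumption: ids are whole numbers from 0 through 999 (xxx)    */
  if \datatype(msgNum, 'W') then return ''
  if msgNum < 0 | msgNum > 999 then return ''
  /* RIGHT pads on the left with zeros so the id is always 3 digits */
  return prefix || right(msgNum, 3, '0') msgText
```

Routing every alert through one routine like this keeps the prefix and the zero-padded number consistent, so a message numbered 55 is always issued as OPSNTFY055. An empty string is returned for an out-of-range number so the caller can decide how to handle the error.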