A monitoring and alerting system should provide notification of potential problems before they become catastrophic. Monitoring WebSphere MQ (WMQ) for z/OS and alerting on potentially damaging events requires a standard, consistent approach. To design one, the following questions must be answered:
- What objects are at risk?
- What are the failure scenarios?
- How is a failure occurrence identified?
- How can a failure occurrence be reported?
An effective monitoring and alerting system should also meet these criteria:
- Be reliable
- Not generate false alerts
- Promptly notify the people who can resolve the problem
Answering these questions, along with the further questions they raise, and then designing and building a system that meets the above criteria yields a complete monitoring and alerting plan. While this article is directed at the z/OS environment, the concepts are broadly applicable.
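One way to make the four questions concrete is to treat each answer as a field in an alert rule. The following is only a minimal sketch in Python, not a WMQ interface: the names `AlertRule` and `evaluate`, the queue name, the 80% threshold, and the shape of the metrics dictionary are all assumptions made for illustration. It assumes that some other component has already collected the metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    """Hypothetical record: one field per design question."""
    object_name: str                    # What object is at risk?
    scenario: str                       # What is the failure scenario?
    is_failing: Callable[[dict], bool]  # How is the failure identified?
    notify: Callable[[str], None]       # How is the failure reported?


def evaluate(rules: List[AlertRule], metrics: Dict[str, dict]) -> List[str]:
    """Check each rule against the collected metrics and report failures."""
    fired = []
    for rule in rules:
        sample = metrics.get(rule.object_name)
        if sample is None:
            # Assumed policy: missing data does not raise an alert here,
            # so a collection gap cannot produce a false alert.
            continue
        if rule.is_failing(sample):
            message = f"{rule.object_name}: {rule.scenario}"
            rule.notify(message)
            fired.append(message)
    return fired


if __name__ == "__main__":
    # Illustrative rule: a queue filling beyond 80% of its maximum depth.
    rules = [
        AlertRule(
            object_name="APP.REQUEST.QUEUE",  # hypothetical queue name
            scenario="queue depth above 80% of maximum",
            is_failing=lambda m: m["depth"] > 0.8 * m["max_depth"],
            notify=print,  # stand-in for a real notification channel
        )
    ]
    evaluate(rules, {"APP.REQUEST.QUEUE": {"depth": 900, "max_depth": 1000}})
```

Keeping identification (`is_failing`) separate from reporting (`notify`) lets each be changed independently, for example tightening a threshold to reduce false alerts without touching who gets notified.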