Operating Systems

The next step is to learn the business. IT needs to understand how the company makes money and how the systems it supports contribute to that revenue. We’re all familiar with the cost of IT, but we also should know what each transaction costs relative to the business. When IT understands how the business makes money, it can quickly start prioritizing applications such as credit card authorizations or stock transactions ahead of less strategic applications that nevertheless have strong internal advocates. These “loved ones” may have little to do with profitability; they’re more likely the favorite application of an internal fiefdom. Business and IT are natural allies, even though language and organizational barriers can make the alliance challenging.

Understanding the business also means IT staff knows what each application means in terms of a business function. IT staff members’ understanding of applications often goes no further than the name of a batch stream, a CICS region, or a UNIX process. This simply isn’t a viable condition. IT can’t effectively manage capacity if it doesn’t understand how real-world events can affect its systems and its network.

If possible, every IT organization should map out business transactions to see how they traverse servers and networks. Network and systems monitoring tools should be able to help crystallize these definitions in the way monitoring data is viewed. This makes it easier to understand the impact of a problem and helps translate a user’s concern to related factors in the underlying system. Understanding that CICSREGA is called by SUNSRVRB and accesses a DB2 database makes it easier to solve problems when both the technician and user know that this arcane technology description is actually a loan application system.
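The mapping described above can be sketched in a few lines of Python. This is only an illustration, not a feature of any particular monitoring product: the component names CICSREGA and SUNSRVRB come from the example in the text, while the database name DB2LOANS, the importance ranking, and the function names are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessApplication:
    """A business function and the technical components it depends on."""
    name: str
    importance: int  # hypothetical ranking: 1 = most critical to revenue
    components: List[str] = field(default_factory=list)


# Hypothetical inventory. CICSREGA and SUNSRVRB are the names used in the
# text; DB2LOANS is an invented label for the DB2 database.
APPLICATIONS = [
    BusinessApplication(
        name="Loan application system",
        importance=1,
        components=["SUNSRVRB", "CICSREGA", "DB2LOANS"],
    ),
]


def affected_applications(component: str,
                          applications: List[BusinessApplication]) -> List[BusinessApplication]:
    """Return the business applications that use a component, most critical first."""
    hits = [app for app in applications if component in app.components]
    return sorted(hits, key=lambda app: app.importance)


if __name__ == "__main__":
    for app in affected_applications("CICSREGA", APPLICATIONS):
        print(f"CICSREGA problem affects: {app.name}")
```

With a table like this in place, a problem reported against a region or server name can be restated immediately in terms the user recognizes.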

Not every transaction or process is created equal. Looking at an application as a whole may give the impression that things are fine, or that things are worse than they actually are, for the user. IT staff needs to understand the application well enough to separate foreground and background workloads and focus attention on the online work such as the kinds of transactions a user must wait on. The prerequisite for this is to understand the business and how users interact with an application.

An effective service-level management team also needs to include application programmers and architects. It may help to watch a user in action; one observation can teach IT people a great deal about what they’ll see later on their monitors. At one company, users did all their print work at 10 a.m. based on an erroneous idea about the best way to interact with the system. This caused delays in online transaction processing, which was remedied when the users were simply advised to print when they wanted to.

IT should know and talk to system users. When IT understands what business users do and how they interact with the system, the metrics make more sense. Such interaction also reveals much about what bothers users the most and what’s insignificant. Users can even drive tuning exercises. Making users, rather than systems, more efficient improves corporate profitability. Network and system metrics have meaning only in the context of a person’s experience with them.

Once IT has gathered this intelligence, it can be incorporated into existing network and systems management software. Resources can be grouped and named as components of critical applications. Labels also can reflect the importance of applications. By taking the time to customize monitoring software in this way, IT can see at a glance the impact of any problems that occur.

Another useful area to customize is advanced alerting techniques. IT can improve productivity by reducing false alerts and focusing attention on issues that are meaningful. Baseline alerting determines the normal utilization or performance for applications, then highlights and alerts staff when monitored performance metrics deviate from the norm. Such alerts provide the operations center and systems programmer or network administrator with information they need to better understand the health of their sphere of management. Alerts also can be based on standard deviations from the norm: sudden changes that fall more than a chosen number of standard deviations from the baseline are highlighted. Alerts can flag many issues of concern, such as excessive system utilization, fewer jobs than expected, low transaction or message rates, or less network traffic than expected.
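As a rough illustration of alerting on standard deviations from a baseline, the sketch below uses only the Python standard library. The function name, the sample transaction rates, and the default threshold of two standard deviations are assumptions chosen for the example; a real monitoring tool would maintain its own baselines per metric and time of day.

```python
from statistics import mean, stdev
from typing import Sequence


def baseline_alert(history: Sequence[float], current: float,
                   threshold: float = 2.0) -> bool:
    """Return True when `current` lies more than `threshold` standard
    deviations away from the mean of `history`, in either direction."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation
    if spread == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != baseline
    return abs(current - baseline) / spread > threshold


if __name__ == "__main__":
    # Hypothetical transactions per minute observed at this hour on past days.
    history = [100, 104, 98, 101, 97, 102, 99, 103]
    for rate in (101, 60, 140):
        print(rate, "alert" if baseline_alert(history, rate) else "normal")
```

Because the check uses the absolute deviation, it flags unusually low activity (fewer jobs or less traffic than expected) as well as unusually high utilization.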

Complex argument alerts let staff receive alerts only in the event of multiple, correlated problem conditions that occur simultaneously. Systems programmers, for example, may want to be alerted when batch jobs with a certain service class run longer than five minutes. Complex alerting arguments enable staff to optimize their productivity by keeping them on task and informing them of problem conditions that require human intervention.
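One way to picture a complex alert is as a list of conditions that must all hold before anyone is notified. The sketch below models the batch-job example from the text; the service class name BATCHHI, the job names, and the data structure are hypothetical and not taken from any specific product.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class BatchJob:
    name: str
    service_class: str
    elapsed_seconds: float


Condition = Callable[[BatchJob], bool]


def should_alert(job: BatchJob, conditions: Iterable[Condition]) -> bool:
    """Alert only when every condition is true for the same job."""
    return all(condition(job) for condition in conditions)


# Hypothetical rule: jobs in service class BATCHHI running longer than five minutes.
LONG_RUNNING_CRITICAL_BATCH = [
    lambda job: job.service_class == "BATCHHI",
    lambda job: job.elapsed_seconds > 5 * 60,
]

if __name__ == "__main__":
    jobs = [
        BatchJob("PAYROLL1", "BATCHHI", 420),  # right class, too long: alert
        BatchJob("PAYROLL2", "BATCHHI", 200),  # right class, within limit
        BatchJob("ARCHIVE1", "BATCHLO", 900),  # long, but not the watched class
    ]
    for job in jobs:
        if should_alert(job, LONG_RUNNING_CRITICAL_BATCH):
            print(f"ALERT: {job.name} has run {job.elapsed_seconds:.0f}s")
```

Requiring all conditions together is what suppresses the noise: a long-running low-priority job or a short-running critical job raises no alert on its own.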

Business-centric application performance management is the key to business success. This transforms the job of performance management from a simple focus on IT resources by silo into a holistic view that requires a clear understanding of a business transaction. Only by working closely with users and developers and then incorporating that knowledge into their management practices can a systems or network management specialist hope to achieve true service-level management. The investment in time is small compared to the enhanced ability to better manage essential IT resources.
