Jul 17 ’12
Mainframe Analytics: Getting Good Performance at a Good Price
Extremely high performance is easy to achieve if you’re prepared to spend a lot of money. Unfortunately, few IT groups can do that. A driver can get extremely high performance by purchasing, say, a Maserati GranTurismo. However, most drivers can’t afford such a car and must be generally satisfied with less performance. They want good performance at a good price; so do enterprise executives. IT is evolving toward a better balance between performance and price.
How Analytics Help
Analytics help lower the cost of delivering better performance by automating repetitive, labor-intensive monitoring tasks. Many years ago, performance analytics on the mainframe existed only in the mind of the systems programmer or administrator, who theoretically was constantly monitoring system performance. For example, a CICS technician would watch an application and keep pressing the ENTER key to gauge response time. The technician would observe how closely the response was approaching some threshold. Typically, the threshold would exist only in the technician’s mind, not in an actual Service Level Agreement (SLA).
A lot of interaction between IT personnel and the system was required to develop a mental trend line of what was happening in an application. Unfortunately, this method could rarely predict problems before they manifested. And no one had time to keep pressing the ENTER key all day.
The software industry ultimately produced tools that enabled a more efficient process: management by exception, which involved systems monitoring and automatic alerting when certain thresholds were breached. The alerts prompted the attention of someone who could then isolate the problem.
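The management-by-exception process described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the KPI names and threshold values are invented for the example.

```python
# Minimal sketch of management by exception: compare sampled KPI
# readings against fixed thresholds and raise an alert on breach.
# KPI names and threshold values are illustrative assumptions.

THRESHOLDS = {
    "response_time_ms": 2000,   # alert if response exceeds 2 seconds
    "cpu_busy_pct": 90,         # alert if CPU utilization exceeds 90%
}

def check_exceptions(sample: dict) -> list[str]:
    """Return an alert message for every KPI that breaches its threshold."""
    alerts = []
    for kpi, limit in THRESHOLDS.items():
        value = sample.get(kpi)
        if value is not None and value > limit:
            alerts.append(f"ALERT: {kpi}={value} exceeds threshold {limit}")
    return alerts

print(check_exceptions({"response_time_ms": 2500, "cpu_busy_pct": 40}))
```

A monitoring loop would call `check_exceptions` on each sampling interval and route any alerts to the person who can isolate the problem.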
Analytics, a capability built on top of that process, gives IT the insight and knowledge needed to preempt a problem. It observes current situations, correlates events, predicts the future, and surfaces higher-order alerts or exceptions that identify and often solve problems before they fully manifest themselves.
Analytics in Action
Analytics allows you to correlate two or more variables. Suppose you’re responsible for providing a certain level of service in the form of transactions per second with a specified response time. You look at response time and transaction rates; both are good. They’re not too high and they’re not trending in the wrong direction.
But if you look at both measurements simultaneously, you may see that performance is outstanding. It might even be too good—simply because the transaction rate is too low. So something’s going wrong—something you weren’t able to see without a side-by-side analysis of transaction rates and performance.
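The side-by-side check described above can be expressed as a simple rule: response time alone looks healthy, but correlating it with throughput exposes the problem. The baseline figures and thresholds below are invented for illustration.

```python
# Illustrative sketch: response time alone seems fine; correlating it
# with transaction rate reveals that performance is 'too good' only
# because little work is arriving. All numbers are assumed examples.

BASELINE_TPS = 500          # assumed normal transaction rate
TPS_FLOOR = 0.5             # flag if throughput falls below 50% of baseline
RESPONSE_SLA_MS = 2000      # assumed response-time SLA

def assess(response_ms: float, tps: float) -> str:
    if response_ms > RESPONSE_SLA_MS:
        return "response time breaching SLA"
    if tps < BASELINE_TPS * TPS_FLOOR:
        # Response looks great only because the workload has dried up.
        return "throughput abnormally low: suspect upstream problem"
    return "healthy"

print(assess(400, 480))   # good response, normal rate -> healthy
print(assess(150, 90))    # 'too good' response, low rate -> suspect
```

The second case is exactly the distributed-side failure pattern: the mainframe answers quickly because requests have stopped arriving.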
Some companies that use analytics can detect that a problem in an application actually exists on the distributed side of the house because they recognize the rate of communication with the mainframe is lower than it should be, even though response time is perfect.
Two Real-World Examples
A large insurance company uses analytics to continuously maintain high performance in its mainframe applications. The company automatically monitors Key Performance Indicators (KPIs) on the mainframe, aggregates those indicators into workloads, and creates an application-level view of how the business is operating. Using analytics software, IT personnel can see application performance trends over time—at a high level and close to the business. They’re in a position to predict problems before they affect the business, drill down to causes, and prevent the problems.
A large financial services organization in Europe actively uses analytics to ensure mainframe availability and performance. IT personnel deploy analytics tools to monitor application performance and perform analytics at the application and SQL levels. Then they can change individual SQL statements and create permanent application performance improvements.
Application Performance, Not Sub-System Performance
Performance analytics isn’t the measurement of the performance of IT or particular sub-systems on the mainframe or distributed platforms; it’s the process of understanding the performance characteristics of applications that support the business.
When performance is reduced or a service isn’t available, it can impact customer satisfaction. If it takes too long to complete a transaction, a customer might visit another company to purchase a product. Or, if an outage affects an Automated Teller Machine (ATM) and a customer can’t access funds, the customer experience is jeopardized. Failing to watch the right KPIs can cause a business to lose revenue or incur higher expenses to recover from problems.
There are thousands of KPIs; no IT group could measure them all. But if you’re trying to tie the performance of IT and IT infrastructure to business users and customers, you’re really interested in those metrics most closely aligned to the business and business applications—especially response time, latency, and resource consumption. Although monitoring and correlating these measurements doesn’t directly solve a problem for you, it does help you recognize if there’s a problem at the boundary between the business and IT.
• Response time: Overall response time as experienced by users. User Application Performance Monitoring (APM) is the best way to determine what performance and service levels the customers are experiencing.
• Latency: From an application perspective, the latency between nodes in an application (back-end, mainframe, or distributed). You can acquire this measurement at the network or system level.
• Resource consumption: Indicators of resource consumption at the business level and application level. These are high-level KPIs across a spectrum of platforms.
Those are the measurements that provide the top-level appreciation of the relationship between IT management and the business.
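As a concrete sketch, the three top-level measurements can be rolled up from per-transaction samples. The field names and figures below are assumptions for illustration, not the schema of any real monitoring product.

```python
# Hedged sketch: aggregating per-transaction samples into the three
# business-facing KPIs (response time, latency, resource consumption).
# Field names and sample values are invented for the example.

from statistics import mean

samples = [
    {"response_ms": 180, "network_latency_ms": 20, "cpu_ms": 35},
    {"response_ms": 220, "network_latency_ms": 25, "cpu_ms": 40},
    {"response_ms": 1900, "network_latency_ms": 300, "cpu_ms": 42},
]

kpis = {
    "avg_response_ms": mean(s["response_ms"] for s in samples),
    "avg_latency_ms": mean(s["network_latency_ms"] for s in samples),
    "total_cpu_ms": sum(s["cpu_ms"] for s in samples),
}
print(kpis)
```

Tracking these rollups over time, rather than individual samples, is what lets you see a problem at the boundary between the business and IT.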
Most of the thousands of other KPIs are lower-level, such as buffer hit ratios for DB2 or system delays caused by resource collisions and unmet resource requirements. These lower-level KPIs aren’t appropriate for the business level; they help you isolate the specific IT technical problems that underlie the poor performance you detect at the application or business level.
How Analytics Reduces Costs
Mainframe analytics software can simplify your analytics work, improving performance and availability and reducing costs. It does two things:
• It surfaces, at the application level or the business-user level, the performance analytics results that can directly affect the business. For example, it can determine that a trend may soon cause an application or business service to go yellow.
• It determines what you can do about that. In some cases, it can do it for you; for example, some software can solve or prevent performance problems in certain instances.
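Detecting that a trend may soon turn an application yellow can be sketched as a simple linear extrapolation. Real analytics products use richer statistical models; this is purely an illustration, and the threshold value is an assumption.

```python
# Sketch of trend-based prediction: fit a least-squares line to recent,
# equally spaced response-time samples and estimate how many sampling
# intervals remain before the warning ('yellow') threshold is crossed.

YELLOW_MS = 2000  # assumed warning threshold

def predict_breach(samples: list[float], threshold: float = YELLOW_MS):
    """Return intervals until the threshold is crossed, 0 if already
    breached, or None if the trend is flat or improving."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in enumerate(samples)) / \
            sum((x - x_mean) ** 2 for x in range(n))
    if slope <= 0:
        return None
    current = samples[-1]
    if current >= threshold:
        return 0
    return (threshold - current) / slope

# Response times creeping up 100 ms per interval from 1500 ms:
print(predict_breach([1500, 1600, 1700, 1800]))  # -> 2.0 intervals
```

Surfacing "breach expected in two intervals" rather than "threshold breached" is the difference between preempting a problem and reacting to one.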
From the perspective of the user of a financial service, the problem may be an inaccessible Web page or service—or just slow access. But how does that problem relate to an underlying IT process? Analytics software can identify that relationship and often prevent the performance from falling to the point it’s noticeable.
Optimizing performance can be expensive. If you think about the knobs you can turn, you can turn the service level up, make it unparalleled, provide 100 percent availability, and exceed every SLA you can imagine having—but the cost would be unreasonably high.
Most enterprise executives are expected to maintain or optimize performance and reduce costs simultaneously. That’s the game of analytics: It’s a game of sampling and of trying not to become part of the problem while you’re trying to measure the performance. It’s a game of ensuring you can optimize application performance and reduce costs. It’s a serious game with real-world implications.
When planning an analytics program, reassessing an existing analytics program, or considering the acquisition of analytics software, keep these three guidelines in mind:
1. Focus on the right metrics: By definition, they’re the metrics that most closely match business objectives. Analytics is measuring KPIs against time and against each other. Those KPIs that relate most closely to the business (e.g., response time, latency, and resource consumption) are the most important to get right and analyze correctly. That requires a business and application perspective. Sometimes, the more technical IT personnel don’t think in terms of business applications. They think in terms of threads, transactions, and units of technical work. To use analytics effectively, the technical people must begin to be aware of the purpose of the application and the service levels required.
2. Cover both high-level and low-level metrics: Your high-level metrics are really close to the business and its performance. To actually solve business application performance problems, you also need access to lower-level detail: the technical KPIs that compose the performance indicator. If you see that you’re trending toward a service problem at the business or application level, you need the lower-level capabilities to isolate those problems before they happen. Analytics can help IT personnel use the technical KPIs to help solve and prevent business-level performance problems.
3. Take a broad view: System, application, and business performance aren’t confined to one subsystem or server in the broad technology stack. They require a more holistic view across systems—both mainframe and distributed—to relate problems to a specific application or business and correlate problems across the silos that form the application suite or business service.
The highest-performing sports car is wonderful but impractical for the average driver. It can deliver extreme performance, but at a price beyond what the average driver can afford. Information technology can also deliver extreme performance, but sometimes at too high a price. To meet the business needs of the enterprise, the IT group should constantly strive to balance performance and cost. Mainframe analytics is a tool that helps you achieve and maintain that balance.