Apr 27 ’12

Rules of the Road for Improving Mainframe Performance and Availability While Reducing Costs

by Nick Pachnos in z/Journal

Experienced drivers know driving conditions aren’t static and they must adjust their speed according to the road, traffic, and weather conditions. You wouldn’t drive your car in a rainstorm or snowstorm the same way you would on a clear, sunny day. Nor would you drive in heavy urban traffic at the same speed as an open highway.

Mainframe performance also changes based on various conditions. That’s why it’s important to continually ensure that performance and availability keep pace with current business demands.

New Mainframe Challenges

Consider some examples of how today’s dynamic business environment has created a new operating roadmap for mainframe performance and availability.

Non-Stop Business

Historically, mainframe performance focused on heavy transaction volumes during peak business intervals. Most people did their banking during business hours. Transactions peaked in the late morning, dipped during lunch, and then peaked again in the afternoon. Such peak business intervals were common for most businesses; a graph of this activity looked like the “double arches” in the McDonald’s logo.

Mobility and globalization changed the picture. Smartphones and tablets offer immediate access to the Web, email, or whatever data the user wants to access for work or leisure. As a result, users increasingly expect non-stop availability. 

Consumer expectations have transferred to the mainframe. Users don't want to be told their application is down for maintenance or that they can’t perform a certain function in real-time because the mainframe data or application component isn’t available. To the user, it can seem incomprehensible that a social media application on an iPad is instantly accessible, but critical data for a multi-million dollar business transaction can’t be accessed in an IT environment.

Since almost all mainframe activity is on the back-end of mobility and Web-based applications, today’s mainframe, as a participant in the business service being delivered, is required to run non-stop. Performance and availability must keep up.

Availability and Performance Drive Business Value

Business users and customers need access to data for decision-making, transactions and other purposes, and businesses are becoming more demanding about how data is used. They’re constantly looking for new ways to differentiate themselves to stay competitive and data is increasingly part of those efforts. For example, if you go to your favorite online retailer, you may notice recommendations for items to buy based on your previous purchases. Immediate access to data makes such personalized cross-marketing possible.

So, availability and performance aren’t just about serving the user; they’re also about serving the business and helping it make profitable decisions. The business and its customers depend on mainframe data to make those types of intelligent decisions.

The Drive to Optimize Costs

According to Howard Rubin, founder and CEO of Rubin Worldwide and a pioneer of technology economics, if IT were a country, it’s estimated to have the fourth largest Gross Domestic Product (GDP) in the world ($4.5 trillion) behind the U.S., China, and Japan. The average company is estimated to have spent 3.5 percent of its revenue and 4.3 percent of its operating expenses on information technology in 2011. Fifty-seven percent of IT expenditures were for costs associated with infrastructure (source: Rubin International, “Technology Economics: The Economics of Computing—The Internal Combustion Mainframe,” Oct. 26, 2011). So, as necessary as IT is to the business, it’s also quite expensive.

Even though a large mainframe may run many applications and be shared across many different parts of the business, the mainframe often has the largest line items of expense. These big-ticket expenses make the mainframe a target for cost-cutting. IT must be concerned with making all platforms, including the mainframe, more cost-effective. So, IT must constantly optimize costs by juggling the priorities of increasing availability and performance while also lowering expenses.

The Risks of Haphazard Mainframe Management

Some organizations approach mainframe management haphazardly, and that creates risk. Performance thresholds are one of the basic building blocks of mainframe management, and managing them improperly undermines both availability and performance. If you don’t stay on top of how you're setting performance thresholds, and how many alerts are firing against your current availability thresholds, there's a tendency to ignore alerts that have no meaning. Meaningless alerts add traffic to the system and make it more difficult to identify significant alerts.

The presence of unnecessary alerts may signify there are real, underlying problems that aren’t being caught. A user or business-critical application could be suffering from performance or unavailability issues that will go undetected.

The business must constantly revisit thresholds, but some may have been set by a guru who is no longer there. That’s why it’s important to address these questions: Do you know why each threshold was set? Do you have a history of each metric, whether it aligned to a performance problem, why it was necessary, and whether it should be changed? It’s difficult to ensure thresholds are dynamic and attuned to business needs if you must rely on institutional memory or guesswork.
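One way to avoid relying on institutional memory is to store, alongside each threshold, the rationale and change history the questions above call for. This is a minimal sketch, not any vendor's product; the class, field names, and incident ID are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ThresholdRecord:
    """Documents why a threshold exists, so tuning never depends on a departed guru."""
    metric: str           # e.g. "DB2 lock wait time (ms)"
    value: float          # current alert threshold
    rationale: str        # why this value was chosen
    linked_incident: str  # performance problem it guards against, if any
    history: list = field(default_factory=list)  # (old_value, reason) tuples

    def adjust(self, new_value: float, reason: str) -> None:
        """Change the threshold while preserving an audit trail."""
        self.history.append((self.value, reason))
        self.value = new_value

# Hypothetical example: a lock-wait threshold with a documented origin
t = ThresholdRecord(
    metric="DB2 lock wait time (ms)",
    value=500.0,
    rationale="Batch window SLA misses observed above 500 ms",
    linked_incident="INC-2011-0417",
)
t.adjust(750.0, "SLA relaxed after workload moved to off-peak")
```

With records like these, "why was this alarm set?" is answered by the data itself rather than by whoever happens to remember.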

Or, perhaps you know that some thresholds need to be adjusted to be more meaningful, but researching them keeps getting postponed. The right technology and methodology can help. DB2 alone exposes more than 10,000 metrics that could be set. So, how do you minimize the IT resources spent while optimizing application availability? When you're dealing with 10,000 metrics, how do you know which ones are worthwhile?

An effective approach is to organize these metrics into groupings that let you view data from a business perspective. Setting 10,000 metrics individually isn’t practical: you will inevitably miss some metrics and overweight others. What’s required is the capacity to pre-select metrics based on Key Performance Indicators (KPIs) and business needs.
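The grouping idea can be sketched simply: evaluate alert status per business-facing KPI group rather than per raw metric, with thresholds defined only for the metrics that feed a KPI. The group names, metric names, and threshold values below are illustrative assumptions, not real product settings.

```python
# Hypothetical KPI groups: each maps a business service to the few
# raw metrics that actually drive it, out of thousands available.
KPI_GROUPS = {
    "online-banking-response": ["db2.lock_wait_ms", "cics.txn_response_ms"],
    "batch-window-completion": ["db2.log_write_ms", "dasd.io_queue_depth"],
}

# Only KPI-relevant metrics get thresholds (values are illustrative)
THRESHOLDS = {
    "db2.lock_wait_ms": 500,
    "cics.txn_response_ms": 200,
    "db2.log_write_ms": 50,
    "dasd.io_queue_depth": 8,
}

def kpi_status(samples: dict) -> dict:
    """Return 'ok' or 'alert' per KPI group based on its member metrics."""
    status = {}
    for kpi, metrics in KPI_GROUPS.items():
        breached = [m for m in metrics if samples.get(m, 0) > THRESHOLDS[m]]
        status[kpi] = "alert" if breached else "ok"
    return status

# A breached lock-wait metric surfaces as a single business-level alert
current = {"db2.lock_wait_ms": 650, "cics.txn_response_ms": 120,
           "db2.log_write_ms": 30, "dasd.io_queue_depth": 3}
print(kpi_status(current))
```

The payoff is that operators see one alert per affected business service instead of a flood of low-level metric breaches.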

False alarms are a nuisance, and if enough go off, someone might start clamoring to get them fixed. Likewise, if an alarm should have gone off due to a business-critical issue but didn’t, your organization would eventually be driven to zero in on that problem and fix it. Waiting for something to break isn’t advisable. Instead, initiate more proactive, systematic approaches to setting performance metrics.

How Technology Can Ease Mainframe Management

Identifying the causes of problems and who should fix them is still a challenge in many data centers. Software can help solve these issues, saving valuable time. Moreover, identifying the problem and necessary threshold can help avoid the next outage. Companies can’t afford even a short outage on critical data, especially considering that customers want continuous access.

Given increased business demands and mainframe skills challenges, IT must leverage technology to intelligently and dynamically establish thresholds for setting alerts and alarms. This technology could present the historic thresholds and help you determine whether thresholds were set correctly.
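One common way to derive thresholds from historic behavior, consistent with the idea above though not specific to any product, is a statistical baseline: alert when a metric exceeds its historical mean by some number of standard deviations, computed separately for each business interval. The metric, sample values, and multiplier here are assumptions for illustration.

```python
import statistics

def dynamic_threshold(history, k=3.0):
    """Derive an alert threshold from observed behavior:
    historic mean plus k population standard deviations."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return mean + k * stdev

# Separate baselines per business interval, since "normal" at the
# morning peak differs from "normal" overnight (sample data invented).
peak_samples = [180, 210, 195, 205, 190]   # txn response ms, 10-11 a.m.
overnight_samples = [40, 55, 45, 50, 48]   # same metric, 2-3 a.m.

peak_limit = dynamic_threshold(peak_samples)
night_limit = dynamic_threshold(overnight_samples)
```

Because each interval gets its own limit, an overnight response time that would be perfectly normal at the morning peak still raises an alert, and the peak threshold doesn't fire constantly during busy hours.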

This capability is revolutionary and helps address the mainframe skills shortage. With the retirement of many experienced mainframe technicians, even the remaining staff may not know how or why an alarm was set or be able to attest to its validity. These new solutions can help staff of all experience levels manage the system. They let you set alarms at specific time intervals based on the behavior of the business while providing a more business-centric view of performance management.

The new solutions empower IT executives to align their businesses to an enterprisewide view and to business priorities. The solutions should let you set alerts and events in the mainframe from a business perspective and then send those events and alerts to a central repository.

Regarding availability, there’s a balance between performance and costs. With unlimited resources, you could buy full availability. Many vendors will suggest that you simply buy more hardware and increase redundancy. But is that cost-effective?

This is where software can help. Do you really need to replicate an entire environment, or can you use software to increase your availability? Most data center outages are caused by a systems programmer, an applications programmer, or another staff member creating a localized problem. If not caught promptly, the localized problem may spread into a full systems outage. Or perhaps a business application that wasn't tested well enough goes online, causing all sorts of problems.

Unlimited redundancy won't fix problems with these kinds of localized causes. The software you use should help you recover from those types of problems and recover right up to the point where the problem occurred, if possible. Recovery and utilities tools can support maintenance while you’re up and running with strategic processing. With recovery software, you can recover from a localized problem so you lose as little strategic processing as possible.

It’s important to have an enterprisewide view of IT. Your monitoring software should help you increase performance and availability while lowering costs. The software should take advantage of the latest technology, using specialty processors to accelerate processing while lowering CPU consumption.

The Road Ahead

The mainframe endures because it’s one of the best repositories for strategic data in terms of security and access. With continued, rapid growth in both mobility and globalization, user demands will continue to grow. To keep up with these changes, IT must constantly revisit how the mainframe is performing in many measures, including availability and its support for current needs.

Looking ahead, we’ll see continued pressure to lower mainframe costs. The biggest challenge is determining how much can be cut before the business is affected. Some companies are even considering whether they can decrease their availability. Can they take an outage here or there and lower IT costs because their need for availability has changed? Such trade-offs are unlikely to gain traction, because customer demands for continuous availability will only increase. Look for an approach that lets you run software that uses fewer resources, so you can drive more business while paying less to run IT.


See the accompanying sidebar, "Five Tips for Improving Mainframe Performance and Availability," by Nick Pachnos.