In recent months, many conversations with customers and fellow mainframe professionals have centered on performance management on the mainframe or, to be more precise, the dying art of performance management. Almost all these conversations have ended with the conclusion that something needs to be done to address this problem.

Before we address it, however, we should all agree that there are really two types of performance management people:

• Those who understand systems performance and deal mainly with the operating system, networking, storage, etc. 
• Those more specialized in application performance management who focus on application code, databases, middleware such as WebSphere MQ, and Transaction Processing (TP) monitors.

The two types differ significantly and require different skills, even though their work converges for the user, who simply experiences a well-performing application. The process of solving a performance problem doesn’t differ greatly between the two groups, but because they sit on different teams, communication problems arise that can themselves impact performance.

The following outlines the specific steps professionals should undertake to get a better handle on performance management.

Reactive vs. Proactive Approaches

The term “performance management” implies the existence of a process, and that some level of control is being exerted over that process. Too many companies deal with issues reactively rather than proactively; they solve performance problems instead of managing performance so that issues don’t occur. Usually, performance problems are identified when users report them, or when some mainframe processes take much longer than they should and begin to wreak havoc. Because companies fail to truly manage the problem, different groups go off in different directions to try to solve one or more issues on their own. We all know this, and most of us will agree this isn’t the right thing to do. So, how can we shift from a reactive to a proactive mode?

Start With Your Service Desk

Because so much these days is connected to cost, the best starting point in proactively addressing performance management is your service desk. This is the only place to identify which performance issues were reported in the past six months, which departments (from both the business and IT) were involved, how many users were affected, and how long it took to solve the problems. This review alone should provide a good indication of the impact and associated costs of performance issues. Once you have the data, a reasonable working assumption is that, with proper management, 50 percent of these issues can be prevented. As with everything, preparation is key, and since you’re going to spend money, make sure your management understands the value the investment stands to deliver.
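As a rough illustration of the service desk review, the sketch below totals tickets, affected users, and an estimated cost per department. The Ticket fields, the cost_per_user_hour rate, and the cost model itself (every affected user is blocked for the full resolution time) are assumptions made for this example, not something a particular service desk product provides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ticket:
    """One performance-related service desk ticket (hypothetical fields)."""
    department: str
    users_affected: int
    hours_to_resolve: float


def summarize_tickets(tickets, cost_per_user_hour):
    """Aggregate ticket count, affected users, and estimated cost per department."""
    summary = defaultdict(lambda: {"tickets": 0, "users": 0, "cost": 0.0})
    for t in tickets:
        row = summary[t.department]
        row["tickets"] += 1
        row["users"] += t.users_affected
        # Assumed cost model: each affected user loses the full resolution time.
        row["cost"] += t.users_affected * t.hours_to_resolve * cost_per_user_hour
    return dict(summary)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        Ticket("Claims", 10, 2.0),
        Ticket("Claims", 5, 4.0),
        Ticket("Billing", 20, 1.0),
    ]
    for dept, row in summarize_tickets(sample, cost_per_user_hour=50.0).items():
        print(dept, row)
```

Even a crude model like this gives management a number to weigh against the cost of the tooling and staff time you are asking for.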

Use Your Toolbox

Only numbers tell the tale, and you should use the tools in your toolbox to come up with those numbers. It’s essential to have a tool that measures the response time the end user actually experiences. Without it, you’re flying blind. This type of tool should also help you identify the largest resource consumers on the mainframe. Determine what other performance management tools you have and how they integrate with each other.

Once you have your numbers, create a baseline. This is crucial because it will help you identify the components that behave differently from what you’ve defined as normal or acceptable. A baseline should focus, first and foremost, on the business transactions and the Service Level Agreements (SLAs) in place for them. For this reason, it’s important that your tool can actually monitor and measure at that high level. From there, your baseline needs to drill down to the single operation that might have impacted performance.
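One simple way to express such a baseline, sketched here under stated assumptions, is to keep the mean and standard deviation of response times per business transaction and to flag any observation that breaks the SLA or sits well above the norm. The function names, the three-standard-deviation tolerance, and the sample figures are illustrative choices, not a rule taken from any monitoring tool.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Map each transaction name to (mean, stdev) of its response times in seconds.

    Transactions with fewer than two samples are skipped because a
    standard deviation cannot be computed for them.
    """
    return {
        name: (mean(times), stdev(times))
        for name, times in samples.items()
        if len(times) >= 2
    }


def deviates(baseline, name, observed, sla_seconds=None, tolerance=3.0):
    """Return True if an observed response time breaks the SLA or the baseline."""
    avg, sd = baseline[name]
    if sla_seconds is not None and observed > sla_seconds:
        return True
    # Assumed rule of thumb: more than `tolerance` standard deviations above the mean.
    return observed > avg + tolerance * sd


if __name__ == "__main__":
    # Made-up response times for a hypothetical ORDER transaction.
    history = {"ORDER": [1.0, 1.2, 0.8, 1.0]}
    base = build_baseline(history)
    print(deviates(base, "ORDER", 1.6))
    print(deviates(base, "ORDER", 1.2, sla_seconds=1.1))
```

The same structure can be repeated one level down (per database call or per operation) to support the drill-down described above.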

Assign Ownership

Each week, create a list of the top-10 transactions or processes that take longer than you’ve defined in your baseline and include those that show a significant change in behavior over the past x days. The latter information often provides an indication that something out of the ordinary is occurring and could eventually result in a bigger problem. If you’re lucky, some of these processes will overlap. For example, a poorly performing transaction can be tied to a performance problem with an IMS or DB2 database or a sudden peak in network traffic. Now the management part of performance management kicks in. You need an owner who can ensure every item on the list is entered into the service desk, so all activities can be tracked and, more important, management reporting is automated. Unless someone owns an action on the top-10 list, nothing will be resolved.
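The weekly list could be assembled along the following lines. This is a minimal sketch: the inputs are plain dictionaries of average response times per transaction, the 25 percent change threshold is an arbitrary stand-in for "a significant change," and the owner field is a hypothetical placeholder for whoever is assigned in the service desk.

```python
def top_offenders(current, baseline, previous=None, change_threshold=0.25, limit=10):
    """Build the weekly list of transactions that need an owner.

    current, baseline, previous: dicts of transaction name -> response time (s).
    A transaction is listed if it is slower than its baseline or if it changed
    by at least `change_threshold` (as a fraction) against the previous period.
    """
    previous = previous or {}
    rows = []
    for name, now in current.items():
        base = baseline.get(name)
        if base is None:
            continue  # no baseline defined yet, nothing to compare against
        before = previous.get(name)
        changed = (
            before is not None
            and before > 0
            and abs(now - before) / before >= change_threshold
        )
        if now > base or changed:
            rows.append({
                "transaction": name,
                "current": now,
                "baseline": base,
                "excess": now - base,
                "changed": changed,
                "owner": None,  # to be filled in when the item is assigned
            })
    # Worst offenders (largest excess over baseline) first.
    rows.sort(key=lambda r: r["excess"], reverse=True)
    return rows[:limit]


def unowned(rows):
    """Return the list items that still have no owner."""
    return [r for r in rows if r["owner"] is None]
```

Reporting the result of unowned() alongside the list itself makes it visible when items are sitting without anyone responsible for them.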
