The mainframe has most certainly defied the predictions of industry pundits; in fact, in the past 10 years, most mainframe shops have grown their MIPS capacity by 15 to 25 percent per year. This increase reflects both more work and new kinds of work. The “new workloads” may be z/OS transactions arriving from outboard distributed servers or UNIX applications now running under Linux on System z on Integrated Facility for Linux (IFL) processors. The result is a significantly larger mainframe presence than the pundits anticipated.
Systems management strengths, disciplines, and tooling that have grown up with the mainframe have contributed to its resilience. Systems management encompasses configuration management, change management, problem management, performance management, and capacity management. A successful IT organization must formalize these practices through documented processes covering who does what, when, and how. Clearly defined processes provide a consistent methodology for how this work is performed, even when personnel change, and ensure that the way systems are managed reflects the organization’s high-level goals.
Systems management methods vary widely. Capacity management (i.e., capacity planning) stands out as the least formalized and most problematic of these practices and, unfortunately, lags the other disciplines. This article describes capacity planning best practices.
The Capacity Planner’s Role
Despite capacity planning’s importance, most shops don’t have people dedicated to it. Often, it’s considered a part-time function, assigned to the same people doing performance management. Capacity planning is less well-defined than performance management, and many IT shops don’t clearly define what they want capacity planners to do. However, effective capacity planning includes knowing how to determine when the system is out of capacity, what reports to produce, and how often.
The capacity planner must tell upper management when the company’s IT systems will no longer be able to provide acceptable service to end users. Since this is a planning function, it’s assumed this information will be made available before the event occurs. This job includes many sub-functions that support this kind of analysis, including workload tracking and trending, setting and monitoring Service Level Agreements (SLAs), and evaluating different upgrade scenarios.
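As a rough illustration of the kind of projection this role implies, the sketch below estimates how many months remain before demand reaches installed capacity under compound annual growth. The function name and the MIPS figures are hypothetical, chosen only for the example; the growth rates are the 15 and 25 percent bounds mentioned earlier. Real planning would also account for workload mix, peaks, and service levels.

```python
import math


def months_until_exhausted(current_mips, installed_mips, annual_growth):
    """Months until demand reaches installed capacity under compound growth.

    All inputs are hypothetical planning figures; annual_growth is a
    fraction (0.20 means 20 percent per year).
    """
    if current_mips <= 0 or installed_mips <= 0:
        raise ValueError("MIPS figures must be positive")
    if current_mips >= installed_mips:
        return 0.0  # already at or beyond capacity
    if annual_growth <= 0:
        return math.inf  # demand never reaches capacity
    # Solve current * (1 + g)^years = installed for years.
    years = math.log(installed_mips / current_mips) / math.log(1 + annual_growth)
    return years * 12


if __name__ == "__main__":
    # Hypothetical shop: 4,000 MIPS of peak demand on a 5,000 MIPS machine.
    for growth in (0.15, 0.25):
        months = months_until_exhausted(4000, 5000, growth)
        print(f"{growth:.0%} growth: about {months:.1f} months of headroom")
```

With these made-up numbers, 25 percent annual growth leaves about 12 months of headroom and 15 percent leaves about 19 months, which shows how sensitive the answer is to the growth assumption.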
Defining Out of Capacity
Capacity planning requires a clear, agreed-upon definition of what it means to be out of capacity, though this is often overlooked. Without one, how can the capacity planner develop a plan acceptable to management? The definition must be agreed to by all parties and must serve as the trigger that requires action.
Consider some of the out-of-capacity definitions we’ve heard from IT managers:
- “I don’t want performance to be any worse than it is today.”
- “I don’t want the phone to ring and the caller is someone I care about.”
- “We wait until the pain affects our profitability.”
These definitions share a lack of formality, and none shows a clear linkage between high-level business goals and how IT resources should be managed. Today, many shops will say they have SLAs in place, but these are often negotiated between the operations staff and users. If SLAs are missed, a meeting often occurs to address the problem. The result might be a plan to look into tuning the system or application, but missing the SLA is a long way from requiring a processor upgrade.
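For contrast with the informal definitions above, a formal definition can be written as a rule that is evaluated against measurements. The sketch below is purely illustrative: the 90 percent peak-utilization threshold, the three-of-four-weeks window, and all names are assumptions made for the example, not a recommended standard.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class OutOfCapacityRule:
    """Hypothetical agreed definition: out of capacity when weekly peak
    utilization exceeds `threshold` in at least `min_weeks` of the most
    recent `window` weeks."""
    threshold: float = 0.90
    min_weeks: int = 3
    window: int = 4

    def is_triggered(self, weekly_peak_utilization: Sequence[float]) -> bool:
        # Only the most recent `window` observations count.
        recent = weekly_peak_utilization[-self.window:]
        breaches = sum(1 for u in recent if u > self.threshold)
        return breaches >= self.min_weeks


if __name__ == "__main__":
    rule = OutOfCapacityRule()
    peaks = [0.82, 0.91, 0.88, 0.93, 0.95]  # made-up weekly peaks
    print("Action required:", rule.is_triggered(peaks))
```

Whatever the actual thresholds, the point of writing the rule down is that all parties can agree to it in advance and that it unambiguously triggers action when it is met.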