IT Management

Workload Manager: Revisiting Goals Over Time

6 Pages
  • WLM goals set to unrealistic expectations
  • An overly easy response time or velocity goal
  • The use of average response time goals when percentile goals were more appropriate
  • The use of a response time goal when a velocity goal was more appropriate (or vice versa)
  • Work being assigned an improper importance relative to the other work in the system
  • Work being assigned a discretionary goal when it truly is not discretionary
  • Too much work in SYSSTC or wrong work in SYSSTC
  • Improper importance settings
  • Improper use of resource group minimums and maximums
  • Improper use of the storage or CPU critical controls
  • Incorrect setup of service classes, periods, period durations, etc.

Rather than going into detail about each of these reasons, let me give you just one example. One of the most common incorrect settings I see is an overly aggressive velocity goal. Without going into too much detail about velocity, let me just remind you that velocity is derived from the using and delay samples collected by the WLM sampler: the using samples divided by the sum of the using and delay samples, expressed as a percentage. The CPU using and delay samples have a heavy influence on velocity, so high CPU delays can produce low velocities. Low velocities are not necessarily bad, and, in fact, may be perfectly acceptable and expected. Let’s take a high-level look at the example shown in Figure 1.

If a service class period running on a five-way processor has 30 units of work, and each unit wants to use the CPU concurrently, then during any given sampling interval, at most five dispatchable units can be found using the CPU, and the remaining 25 units will be found delayed for the CPU. This does not necessarily mean the work is performing poorly. It could mean that delay is inherent in the workload. Given the characteristics of the workload and the physical processor resource, the high delay for CPU is expected. In this example, because delay is inherent in the workload, a velocity goal of 60 may be too aggressive, and a velocity goal of 10 may be more realistic.
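The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration, not WLM code: the function names are hypothetical, and the sketch assumes that CPU using and CPU delay are the only samples collected, which ignores I/O and other delay states that a real sampler would also record.

```python
def execution_velocity(using_samples: int, delay_samples: int) -> float:
    """Velocity as a percentage: using / (using + delay) * 100."""
    total = using_samples + delay_samples
    if total == 0:
        return 0.0  # no samples collected, so nothing to report
    return 100.0 * using_samples / total


def cpu_velocity_ceiling(processors: int, ready_units: int) -> float:
    """Best-case velocity when every unit of work wants the CPU at once.

    Assumption: at each sample, at most `processors` units are found
    using the CPU and all remaining units are found delayed for CPU.
    """
    using = min(processors, ready_units)
    delayed = ready_units - using
    return execution_velocity(using, delayed)


# The five-way processor with 30 concurrent units of work from the text.
ceiling = cpu_velocity_ceiling(processors=5, ready_units=30)
print(f"Best-case velocity: {ceiling:.1f}")
```

Under these assumptions the best-case velocity is well below 60, so a goal of 60 could never be met by this workload on this processor, no matter how WLM adjusts priorities.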

SCENARIO 2: LATELY, GOALS ARE REGULARLY BEING MISSED

Your migration to WLM goal mode was successful, and you finally overcame your improperly set goals and service definition controls. Everything has been running well, but now some time has passed. Lately, WLM seems to be managing the system and the workloads a bit differently than it did previously. Why is it that some goals that used to be met regularly are now being missed? Why is it that over time WLM manages the work differently, or the results of WLM management are different? Why do service definitions and goals need to be revisited regularly?

The answer to this is simple: Over time, the system, workloads, applications, and even the users change. Given a finite amount of resources, if a workload grows or changes, the existing WLM controls may no longer be appropriate for the current environment.

As with the previous scenario, this scenario has many possible causes:

  • Growth in an existing workload or application
  • Growth in SYSSTC, system address spaces, and/or monitors
  • Changes in the capacity or configuration of the hardware
  • Server/image consolidation
  • Changes in software product levels or applications
  • Growth in system address spaces
  • A reduction in a workload
  • Introduction of a new workload

Allow me to elaborate on one simple cause: growth in an existing workload, which may cause that workload to consume more system resources. On systems where resources are plentiful, growth of a workload may not impact the performance of the other workloads.

On systems with a shortage of resources, WLM tries to ensure that the resources are allocated to the highest importance workloads as needed. If a workload that has grown requires more resources, and is assigned a higher importance level than other workloads, then WLM may decide to take the required resources away from work at lower importance to give to the grown workload. In the past, these lower importance workloads may have had no problem achieving their objectives, but with this new distribution of resources, those same, unchanged, lower importance workloads may now miss their goals.
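The effect described above can be illustrated with a deliberately simplified toy model. It is not how WLM actually makes decisions: the `allocate` function, the workload names, and the capacity and demand figures are all hypothetical, and the model simply grants a fixed capacity in strict importance order to show why unchanged lower importance work can suffer.

```python
def allocate(capacity, workloads):
    """Grant capacity in importance order (1 = most important).

    `workloads` is a list of (name, importance, demand) tuples.
    Returns a dict mapping each workload name to the capacity granted.
    Toy model only: real WLM adjusts policy gradually against goals.
    """
    remaining = capacity
    granted = {}
    for name, _importance, demand in sorted(workloads, key=lambda w: w[1]):
        share = min(demand, remaining)
        granted[name] = share
        remaining -= share
    return granted


# Before growth: both workloads fit within 100 units of capacity.
before = allocate(100, [("ONLINE", 1, 50), ("BATCH", 3, 40)])

# After growth: ONLINE demand rises; BATCH demand is unchanged.
after = allocate(100, [("ONLINE", 1, 75), ("BATCH", 3, 40)])

print(before)
print(after)
```

In this sketch the BATCH workload asks for exactly the same amount in both runs, yet receives less in the second run because the higher importance ONLINE workload has grown.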

The usual indicator that this scenario is occurring is that the Performance Indexes of the lower importance workloads rise while the transaction volume or resource consumption of the corresponding higher importance work increases.
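A minimal sketch of checking for this indicator follows. It relies on the usual Performance Index definitions (goal velocity divided by achieved velocity for velocity goals, achieved response time divided by goal response time for average response time goals, with a value above 1 meaning the goal is being missed). The interval records, field names, and numbers are invented for illustration and would in practice come from your own performance reports.

```python
def pi_velocity(goal_velocity, achieved_velocity):
    """Performance Index for a velocity goal (above 1 = goal missed)."""
    return goal_velocity / achieved_velocity


def pi_response_time(goal_seconds, achieved_seconds):
    """Performance Index for an average response time goal."""
    return achieved_seconds / goal_seconds


# Hypothetical reporting intervals, oldest first.
intervals = [
    {"high_imp_service": 1000, "low_imp_goal": 30, "low_imp_velocity": 40},
    {"high_imp_service": 1400, "low_imp_goal": 30, "low_imp_velocity": 30},
    {"high_imp_service": 2000, "low_imp_goal": 30, "low_imp_velocity": 20},
]

pis = [pi_velocity(i["low_imp_goal"], i["low_imp_velocity"]) for i in intervals]


def growth_is_squeezing_low_importance(intervals, pis):
    """True if the low importance PI now misses its goal and has risen
    while high importance consumption has also risen."""
    consumption_up = (
        intervals[-1]["high_imp_service"] > intervals[0]["high_imp_service"]
    )
    pi_up = pis[-1] > pis[0]
    return consumption_up and pi_up and pis[-1] > 1.0


print(pis)
print(growth_is_squeezing_low_importance(intervals, pis))
```

Comparing only the first and last intervals keeps the sketch short; a real review would look at the trend across many intervals and at more than one service class.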
