In this scenario, all seems to be running great in goal mode. The users are happy, your manager is happy, and life is good. However, when you look at your performance monitors, it sometimes appears goals are not being met. Or, maybe the measurements don’t even make sense to you. Could the measurements be “wrong”?
Why is it that sometimes the measurements don’t reflect reality? Could (and should) changes be made to the WLM Service Definition to make the measurements more accurate?
First, please let me clarify the title I’ve given to this scenario. When I refer to “inaccuracy” of measurements, I don’t mean to imply that the measurements are intentionally wrong and are just not being fixed by the vendors of your performance monitors. What I am referring to is that the measurements reported by monitors don’t always reflect reality. This, in turn, can lead to misunderstanding and misinterpretation of the measurements. Without understanding the internals of WLM, this is a difficult scenario to comprehend. Suffice it to say that WLM’s management of certain workloads causes it to manage them outside the defined WLM controls that you’ve set. This, in turn, sometimes results in reports that are misleading or difficult to understand.
The most common cases I see for this scenario include:
- WLM management of CICS or IMS workloads toward transaction goals
- Improper setup of CICS and IMS transaction goals
- WLM management of exploiters of enclaves (such as WebSphere, Stored Procedures, DDF)
- Mixing of unlike work in the same period
- What I term “participant address spaces” being classified to SYSSTC
- Short, response time workloads mixed into the same periods as long-running address spaces or enclaves.
SCENARIO 8: PLANS TO EXPLOIT NEW NON-WLM FUNCTIONS THAT WILL AFFECT PERFORMANCE
All is running great in goal mode, but now your installation is planning to take advantage of some new non-WLM functions that may affect the performance of the workloads. Not only that, but some of these changes may affect the software bills.
How do we manage what may be conflicting objectives? Do the new non-WLM functions lead us to consider making changes to our WLM definitions? Is there a connection between pricing and WLM?
Examples of some such facilities include:
- Intelligent Resource Director (IRD)
- On/Off Capacity on Demand (COD)
- Workload License Charges (WLC).
Al Sherkow, of I/S Management Strategies, Ltd., has previously written for z/Journal on the subjects of IRD and WLC. Visit the z/Journal Website at www.zjournal.com to view these articles, or visit Al’s Website at www.sherkow.com for some great presentations and papers on these subjects.
As Al has pointed out, each of the aforementioned items could affect system capacity. As I mentioned previously, if the capacity of the processors changes, then it is entirely possible for the workloads to achieve different velocities or transaction response times. With IRD and WLC, we now have facilities that may change the system capacity multiple times, dynamically, throughout the day.
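To see why a change in capacity by itself moves the reported numbers, recall how WLM computes execution velocity: the percentage of sampled states in which work was using a resource rather than delayed waiting for one. The sketch below (Python used purely for illustration; the sample counts are made-up numbers) shows the same workload achieving a lower velocity simply because a busier or capped system introduces more delay samples:

```python
def execution_velocity(using_samples: int, delay_samples: int) -> float:
    """WLM execution velocity: the percentage of state samples in which
    work was using a resource (CPU, storage) rather than delayed for one."""
    return 100.0 * using_samples / (using_samples + delay_samples)

# Same workload, same using samples -- only the delay changes.
print(execution_velocity(300, 100))  # 75.0: ample capacity
print(execution_velocity(300, 300))  # 50.0: capacity squeezed, velocity drops
```

This is why a velocity goal that was comfortably met at one capacity level may be missed after IRD or WLC shrinks the partition, even though the workload itself is unchanged.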
How will these affect the achievement of goals and will goals need to be modified? If a partition has been capped due to IRD or WLC, we would expect to see the impact first on the lower importance workloads since they are the first ones sacrificed when resources become scarce.
Al Sherkow has developed a new and interesting concept of “expendable MSUs.” Al realized that when a partition is capped due to an LPAR’s defined capacity being exceeded, the needed capacity is taken first from the lower importance workloads. Al suggests that if you want to realize further savings in your software bill by pushing the defined capacity value even lower, you can do so by first understanding, and then accepting, the impact to the lower importance workloads. Only you know which workloads are expendable, and you will need to make sure your WLM importance levels are set appropriately.
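The arithmetic behind “expendable MSUs” can be illustrated with a deliberately simplified model. The sketch below is not WLM’s actual capping algorithm (real softcapping works against a four-hour rolling average, among other things); it only assumes the principle stated above, that capacity under a cap is granted in importance order, so any shortfall lands on the lowest-importance work. The demand figures are invented for the example:

```python
def allocate_under_cap(demands_by_importance: dict[int, int], cap_msus: int) -> dict[int, int]:
    """Hypothetical sketch: grant MSUs in WLM importance order
    (1 = most important), so a capacity shortfall is absorbed
    entirely by the lowest-importance (expendable) workloads."""
    remaining = cap_msus
    granted = {}
    for imp in sorted(demands_by_importance):
        give = min(demands_by_importance[imp], remaining)
        granted[imp] = give
        remaining -= give
    return granted

# Demands total 180 MSUs; a defined capacity of 150 shorts only the
# importance 4 and 5 work -- the candidates for "expendable MSUs."
print(allocate_under_cap({1: 60, 2: 50, 3: 40, 4: 20, 5: 10}, 150))
# {1: 60, 2: 50, 3: 40, 4: 0, 5: 0}
```

If 30 MSUs of importance 4 and 5 work can be sacrificed during capped intervals, the defined capacity (and the software bill) can be pushed lower with eyes open.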
It is still premature to gauge the full implications of WLC, IRD, and COD on WLM goals, since much remains to be learned about their interactions.
SCENARIO 9: OCCASIONALLY “SOMETHING STRANGE HAPPENS” OR “DOESN’T HAPPEN”
In this scenario, all is running great in goal mode, but sometimes the system just acts abnormally. It is hard to articulate, but it appears that WLM “burps,” or does not manage the workloads as expected. You have read the WLM books and talked to the WLM experts, but still you cannot understand why WLM is managing the workload as it is. There is usually no clear explanation.
What could be happening? Is there a problem or “hole” in WLM? Should changes be made to the WLM Service Definition to avoid these anomalies?
The typical cases for this scenario vary greatly, but usually happen during the following times:
- When trying out a new facility
- When implementing a new or changed workload
- During a system ABEND or dump
- When the system is under a lot of stress.
Examples of cases I’ve seen include:
- Unexpected dispatching priorities
- Resource group minimums or maximums are not honored
- High WLM overhead
- Goals are not met but no other scenario applies
- Importance levels seem to be ignored.
I’ve studied many of these cases, and no two are the same. However, you will recognize this scenario when you experience it. When you do, don’t despair. My suggestion is to post a question to a z/OS performance-oriented list server, contact IBM service, or even contact me. I enjoy looking at these cases, since they teach me more about WLM. Remember, WLM is software, and it is designed not to do anything “stupid.” Having worked closely with the designers of WLM, I can tell you they are top-tier, and they’ve put a great deal of thought into many different cases and scenarios. However, this does not mean that WLM is perfect.
But it is pretty neat ... isn’t it?
WLM is one of the most interesting areas of the z/OS operating system in which you can become involved. So, good luck, and have fun. Z