Aug 2 ’10
CICS Sysplex Optimized Workload Routing
The Workload Manager (WLM) feature of CICSPlex System Manager is a useful tool for optimizing system capacity in highly complex environments. This tool analyzes the load capacity and health state of CICS regions intended to be targets of dynamic transaction routing requests and selects the region it considers the most appropriate target. CICS Transaction Server for z/OS Version 4.1 introduces a new feature of CICSPlex SM named Sysplex Optimized Workload Routing. This subfunction of the existing WLM feature was implemented in response to concerns voiced by many large enterprise customers regarding the observed behavior of WLM in CICSplexes that span multiple Logical Partitions (LPARs).
Existing WLM Decision Behavior
Let’s consider the current WLM decision behavior. WLM employs data spaces owned by a CICS Managing Address Space (CMAS) to share cross-region load and status data. Every CMAS owns a single WLM data space that it shares with all the user CICS regions it directly manages. A user region managed by a CMAS is known to CICSPlex SM as a Local Managed Address Space (LMAS). During CMAS initialization, that data space is verified and formatted with the structures necessary to describe all workload activity related to the CMAS. When the user CICS regions begin routing dynamic traffic, the state of those CICS regions is recorded in this data space.
In a CICSplex where the same CMAS manages all dynamic routing CICS regions, all those regions use the same WLM data space to determine workload information required for WLM operation. That means dynamic routing decisions are made based on the most current load data for a potential routing target region. A routing decision is based on an amalgamation of factors:
- How busy is the region?
- How healthy is the region?
- How fast is the link between the router and target?
- Are there outstanding CICSPlex SM Realtime Analysis (RTA) events associated with the workload?
- Are there transaction affinities outstanding to override the dynamic routing decision?
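CICSPlex SM weighs these factors internally; as a rough illustration only (not the actual algorithm, with made-up weights, class names, and field names), a routing decision of this shape might look like:

```python
# Illustrative sketch of a WLM-style routing decision. The Target fields
# mirror the factors listed above; the weights are invented for the example.
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    task_load: float      # current tasks as a fraction of MAXTASK (0.0-1.0)
    healthy: bool         # basic health state of the region
    link_speed: int       # 1 = fastest (local); higher = slower link
    rta_events: int       # outstanding RTA events for the workload

def route(targets, affinity=None):
    """Pick a target region; an outstanding affinity overrides scoring."""
    if affinity is not None:
        return affinity                      # affinity forces the target
    candidates = [t for t in targets if t.healthy]
    # Weight: load dominates, penalized by slow links and RTA events;
    # the lowest-weight healthy region wins.
    score = lambda t: t.task_load * 100 + t.link_speed * 5 + t.rta_events * 10
    return min(candidates, key=score).name
```

For example, a lightly loaded region behind a slower link would still be chosen over a heavily loaded local one, and an unhealthy region would be excluded entirely.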
This processing rationale provides equitable dynamic routing decisions when working in a single CMAS environment. However, with workloads being spread across multiple z/OS images, users must configure additional CMASs to manage the user CICS regions on the disparate LPARs. Each WLM data space must maintain a complete set of structures to describe every CICS region in the workload—not just the CICS regions that each CMAS is responsible for, but also those regions in other LPARs managed by other CMASs.
This means the WLM data space each CMAS owns must be synchronized periodically with the WLM data spaces owned by other CMASs participating in the same workload. This synchronization occurs every 15 seconds (the heartbeat) from the LMASs to their CMASs, then out to all other CMASs in the workload.
CICS provides two dynamic routing exits—named in the System Initialization Table (SIT)—with different behavior characteristics:
- Dynamic Transaction Routing requests may be redirected using the DTRPGM System Initialization parameter. For DTRPGM requests, the call from CICS to the routing region to decide the target region is synchronized with execution of the request at the selected target, and is followed by a further call from CICS upon completion of the dynamic request. This allows the router to increment the task load count before informing CICS of the target region system ID, and to decrement the count on completion of the request.
- Distributed Routing requests may be redirected using the DSRTPGM System Initialization parameter. For DSRTPGM requests, the call from CICS to the routing region to decide the target region is not synchronized with execution of the request at the selected target. Typically, these dynamic requests are asynchronous CICS STARTs, so the router has no notification of when the routed transaction begins or ends. CICSPlex SM accommodates this anomaly by stipulating that DSRTPGM target regions must have workload specifications associated with them; this transforms the targets into logical routing regions and lets the CICSPlex SM routing processes determine they’re being called at the DSRTPGM target level. This allows the task load count to be adjusted at transaction commencement and completion.
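The two counting models can be contrasted in a short sketch (the classes and method names are hypothetical, not CICS APIs):

```python
# DTRPGM: the router is called before and after each routed request, so it
# can maintain the target load counts itself. DSRTPGM: the router never
# sees completion, so the target acts as a logical router and adjusts its
# own count at transaction commencement and completion.

class DtrpgmRouter:
    """Synchronous routing: the router brackets each request."""
    def __init__(self):
        self.load = {}                       # target sysid -> task count

    def route_select(self, sysid):
        self.load[sysid] = self.load.get(sysid, 0) + 1   # before routing
        return sysid

    def route_complete(self, sysid):
        self.load[sysid] -= 1                # CICS calls back on completion

class DsrtpgmTarget:
    """Asynchronous STARTs: the target counts its own transactions."""
    def __init__(self):
        self.load = 0

    def transaction_start(self):
        self.load += 1                       # adjusted at commencement

    def transaction_end(self):
        self.load -= 1                       # adjusted at completion
```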
Given that CICSPlex SM routing regions count dynamic transaction throughput in a CICSplex, transactions started locally on the target regions cannot be accounted for by the routing regions until a heartbeat (synchronization) occurs. In fact, the router’s transaction counts won’t be accurately synchronized until two heartbeats have occurred: the first to increment the count and the second to decrement it again. However, this discrepancy isn’t considered as severe as when different CMASs manage a router and target.
In a multiple-CMAS configuration, a routing region evaluates status data for a target region as described in its local WLM data space. If that target region is managed by a different CMAS from the one managing the router, then the status data describing that target region may be up to 15 seconds old. For DTRPGM requests, this latency doesn’t have a severe impact. However, for DSRTPGM requests, the effect can be quite dramatic, particularly at high levels of workload throughput. The effect is known as workload batching.
Workload batching is the term applied to the effect seen in heavy workloads in multiple-CMAS environments where dynamic distributed (DSRTPGM) routing requests are being processed. A target region may be managed by a different CMAS from the routing region, typically because they reside in different LPARs. In that circumstance, the router evaluates the target’s status from a copy of the descriptor structure rather than from the actual structure employed by the target itself.
The copied target descriptor being reviewed is synchronized with the actual descriptor at 15-second intervals. Between these 15-second heartbeats, the router has a less accurate view of the target’s status compared to other potential target regions in the workload and continues to base its routing decisions on the last known valid data. Eventually, a heartbeat occurs and the data is refreshed. Compared to other regions, the target could now appear either extremely busy or completely idle. The router reacts by becoming more aggressive in routing work toward or away from the target. This can cycle the region from high throughput to low throughput on each heartbeat boundary. The workload batching state continues until there’s a genuine lull in workload throughput, which settles the batching down until the throughput picks up again.
A user watching the task loading across the CICSplex will see some regions running at their MAXTASK limits and being continually fed with dynamically routed traffic while others remain unused. A snapshot 15 seconds later will probably see a reversal of utilization—the busy regions will be idle, and the idle regions will be at their MAXTASK limit. The users most susceptible to these events are those who use MQ triggers to feed transactional data into their CICSplexes, where the trigger regions tend to be managed by different CMASs. Those users would see the greatest benefit of Sysplex optimized workload routing.
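A toy simulation (illustrative numbers only, long-running tasks, and a deliberately simplified router) shows how a stale heartbeat copy produces this flip-flopping between a local and a remote target:

```python
# The router sees its LOCAL target's load live (same CMAS) but sees the
# REMOTE target only through a copy refreshed every 15-second heartbeat.
# It routes each second's work to whichever target *looks* less busy.

def simulate(seconds, per_sec=10, heartbeat=15):
    load = {"LOCAL": 0, "REMOTE": 0}     # true task counts
    seen = {"LOCAL": 0, "REMOTE": 0}     # the router's view
    routed = []                          # where each second's work went
    for t in range(seconds):
        seen["LOCAL"] = load["LOCAL"]    # local data: always current
        if t % heartbeat == 0:
            seen["REMOTE"] = load["REMOTE"]  # remote data: heartbeat only
        target = min(seen, key=seen.get)
        load[target] += per_sec
        routed.append(target)
    return routed
```

Running this shows the batching pattern: the remote target looks idle for an entire heartbeat interval and absorbs all the work, then the refresh reveals its true load and the router swings all traffic back to the local target until the next refresh.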
Sysplex Optimized Workloads
When CICSPlex SM was originally conceived, a single data space was considered to be a wide enough scope to provide a common data reference point for all regions in the CICSplex. Today, that’s no longer true. The mechanism chosen to broaden the scope of these common points of reference is the z/OS coupling facility. However, the content of the WLM data space hasn’t simply been migrated into the coupling facility; some internal re-engineering was also undertaken.
Routing regions are currently responsible for adjusting the target region load counts WLM uses to determine task loads. On every heartbeat, the CICSPlex SM agent in the user CICS region reports its task count to its owning CMAS. The CMAS will then update the load count in the target region descriptor of its WLM data space and broadcast that value to other CMASs participating in workloads associated with the user CICS region.
For Sysplex optimized workloads, this is turned around. When a target region runs in optimized mode, the target region is responsible for maintaining the reported task count. CICS does this counting in the transaction manager; the count includes instances of all tasks in the CICS region, not just those that are dynamically routed. This load value for the CICS region, along with its basic health status, is periodically broadcast to the coupling facility where other CICS components can interrogate it.
At the CICSPlex SM level, a router will know whether this region status data will be available or not, and will factor this data into its dynamic routing decision, in preference to its original data space references. This means routing regions are reviewing the same status data for a potential target region, regardless of which CMAS manages it. Therefore, the routing region is always using current status data to evaluate a target region rather than status data that could be up to 15 seconds old. In an environment where all routing targets are in a similar health and connectivity state, this means the spread of work across the workload target scope is far more even than in non-optimized mode. However, all the original data space processing remains intact. This is necessary to maintain a seamless fallback mechanism should the coupling facility become unavailable.
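A minimal sketch of this optimized flow, using hypothetical class names and a plain dictionary standing in for the coupling facility structure:

```python
# The target counts ALL of its own tasks and broadcasts count plus basic
# health to a shared store; the router prefers that shared record and
# falls back to its (possibly stale) data-space copy if the record is
# absent, mirroring the fallback when the coupling facility is unavailable.

coupling_facility = {}                   # stands in for the CF structure

class OptimizedTarget:
    def __init__(self, sysid):
        self.sysid = sysid
        self.tasks = 0                   # all tasks, not just routed ones

    def broadcast(self):
        coupling_facility[self.sysid] = {"load": self.tasks, "healthy": True}

class Router:
    def __init__(self, data_space):
        self.data_space = data_space     # heartbeat-synchronized copies

    def status(self, sysid):
        # Prefer the coupling-facility record; fall back to the data space.
        return coupling_facility.get(sysid, self.data_space.get(sysid))
```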
Switching Workload to Optimized State
For a workload to operate in a fully optimized state, all regions in the workload must be at the CICS TS V4.1 level or higher, and a CICS region status server must be running in the same z/OS image as each region in the workload. This server is a batch address space running a specialized, properly configured CICS Coupling Facility Data Table (CFDT) server. It must manage the same CFDT pool name as that identified in the CICSplex definition (CPLEXDEF) for the CICSplex that will encompass your workload. The default pool name is DFHRSTAT. You may choose a different pool name, or even a pool name that already exists in your z/OS configuration. However, a discrete pool name dedicated to region status exploitation is highly recommended; otherwise, access to user tables in the pool may be degraded by WLM operation, and vice versa.
The decision on region status pool name should be made before any CICS regions in the workload are started. You may change the pool name while a workload is in flight, but it’s not recommended because:
- The change won’t be effective until all regions in the workload are restarted.
- Switching the pool name while the workload runs will cause the optimization function to be deactivated for all CICS regions connected to the region status server.
If the pool name is changed in error while the workload runs, reversing the name to its original value will allow optimization to be reactivated. CICS regions required to run in optimized mode must be enabled for optimization. Setting a number of regions to be optimized is most easily achieved using the CICSPlex SM Web User Interface (WUI) CICS System Definition (CSYSDEF) tabular view: summarize the list to a single row, then use the update button to change the WLM optimization enablement attribute to Enabled. That enables optimization for all regions. Because you won’t want optimization set for your WUI server regions (and possibly others), you should then run through the updated system definition list and disable optimization for those regions on an individual basis.
If some dynamic routing regions are already running, you may activate optimization for in-flight CICS regions using the “MASs known to CICSplex” tabular view in a similar manner to the “CICS system definitions” view. Users need no additional configuration actions to optimize their workloads. If you don’t run a region status server, workloads are forced to remain in a non-optimized state.
Coupling Facility Impact
The coupling facility is impacted in two ways. CICS region status data is broadcast to it by target regions, and that data is subsequently read back in the routing regions when a route decision is made. If CICS were to rebroadcast status data at every change instance, and read it back on every occasion a route decision is made, then the coupling facility impact could be unsustainable. So, caching mechanisms were built in to reduce the number of I/Os to the coupling facility.
Two tuning parameters are provided at the CICSplex and CICS system definition levels to adjust coupling facility exploitation. One controls how often the coupling facility is updated with task throughput data; the other controls how long region status data should be cached by a routing region before requesting a refresh:
- Region status server update frequency: UPDATERS
- Region status server read interval: READRS.
A detailed description of these attributes is available in the field help for the CICSplex definition and CICS System definition WUI views.
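The effect of the read interval can be pictured as a router-side cache. This is only a sketch of the caching idea, not CICSPlex SM internals; the class, the injected clock, and passing the interval as a constructor argument are all inventions for the example:

```python
import time

class StatusCache:
    """Serve region status from cache within the read interval, so repeated
    route decisions don't each cost a coupling-facility I/O."""
    def __init__(self, read_cf, readrs_seconds, clock=time.monotonic):
        self.read_cf = read_cf           # function doing the real CF read
        self.interval = readrs_seconds
        self.clock = clock
        self.cache = {}                  # sysid -> (timestamp, record)

    def status(self, sysid):
        now = self.clock()
        hit = self.cache.get(sysid)
        if hit and now - hit[0] < self.interval:
            return hit[1]                # fresh enough: no CF I/O
        record = self.read_cf(sysid)     # refresh from the coupling facility
        self.cache[sysid] = (now, record)
        return record
```

A shorter interval means fresher data at the cost of more coupling facility reads; the UPDATERS frequency trades off the same way on the write side.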
In addition to tuning the general read and update impact to the coupling facility, two other specialized parameters allow further fine-tuning of the workload for heavy and light workload throughput:
- Region status server top tier: TOPRSUPD
- Region status server bottom tier: BOTRSUPD.
If you think you need to deviate from the default settings for these attributes, monitor the performance of your coupling facility and that of WLM throughput capabilities for at least several days after modification.
A region status record is 40 bytes long. There’s one record for each region in your CICSplex, stored in a physical data table named after that CICSplex. This data table is created within the CFDT pool named in the CICSplex definition resource table. CICS writes region status data through a file named DFHRSTAT; the definition of DFHRSTAT is automatically generated and locates the physical data table named after the parent CICSplex. Therefore, if PLEX1 comprised 100 regions, the required space in the coupling facility would be 4,000 bytes for a table named PLEX1.
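The sizing rule above is simple arithmetic, one 40-byte record per region:

```python
# Coupling facility space for a CICSplex's region status table:
# one 40-byte record per region.
RECORD_BYTES = 40

def rs_table_bytes(region_count):
    return region_count * RECORD_BYTES

print(rs_table_bytes(100))   # the PLEX1 example: 100 regions -> 4000 bytes
```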
Optimized Workload Benefits
If the topology of a CICSplex is such that regions in a workload can be managed by the same CMAS, then the perceived benefit won’t be so great. If most of the dynamic routing traffic flows through the DTRPGM exit, the benefit won’t be particularly high. If target regions in a workload execute a high proportion of non-dynamic throughput, the benefit of implementing an optimized workload is stronger.
Benefits of running workloads in optimized state should become clear fairly quickly for a workload comprised of routers and targets managed by different CMASs where the bulk of the dynamic traffic flows through the DSRTPGM exit—especially for transactional input that’s generated by dynamic CICS STARTs. No workload batching should occur. An effect of this will be that the overall workload should run through faster because fewer (if any) routed transactions would be waiting in the queue of a CICS region already at its MAXTASK limit.
When your CICSplex extends beyond the scope of your Sysplex, there’s little benefit to optimized workload routing. Typically, this would occur when routers and targets are physically remote from each other. In those situations, the isolated coupling facilities can’t be linked or shared, which effectively nullifies the optimized routing functions.
Determining Workload Optimization State
The easiest way to check the state of workload optimization is to use the active workloads view in the CICSPlex SM WUI. The list view contains a row for each workload active in the CICSplex. A new column added to this view indicates the workload optimization status. Expected values are:
- ACTIVE: All targets and routers are executing in optimized workload state.
- PARTIAL: At least one target and one router are executing in optimized workload mode, but not all regions in the workload are.
- INACTIVE: The workload isn’t running in optimized state, because either no routing regions in the workload are running in optimized state, no target regions in the workload are running in optimized state, or the workload was designated as being non-optimized.
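The status rules above can be expressed as a small function (a sketch only; the WUI computes this status internally, and the argument names are invented):

```python
# routers/targets: lists of booleans, one per region in the workload,
# saying whether that router or target is executing in optimized state.
def workload_status(routers, targets, designated_optimized=True):
    if not designated_optimized:
        return "INACTIVE"                # workload designated non-optimized
    if not any(routers) or not any(targets):
        return "INACTIVE"                # no optimized routers or targets
    if all(routers) and all(targets):
        return "ACTIVE"                  # every region is optimized
    return "PARTIAL"                     # a mixed workload
```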
The easiest way to check the optimization state for a CICS region is to use the routing region or target region views located in the active workloads menu. The optimization status for the region is shown in the list views for both region types. Expected values are:
- ACTIVE: The region is executing in optimized workload state.
- INACTIVE: The region can execute in optimized state, but it’s currently non-optimized. Reasons for this are detailed in the help data for the routing and target region views in the WUI.
- N_A: The region isn’t optimized workload-capable—probably because the region is running a CICS TS version prior to V4.1.
If you have regions that require no optimization capabilities, then set the region status server update frequency value for those regions to 0 to prevent the CICS transaction manager from broadcasting irrelevant region status data to the coupling facility. This would typically include all WUI server regions and any regions assigned a purely routing role.
CICS records the status of a CICS region through the DFHRSTAT CFDT file. The definition for this file is automatically generated when the CICS region status function is initialized. The CICS file definition relates to a physical CFDT named after the CICSplex the region belongs to. When defining this file, the RS domain also generates a pool name gathered from the CICSplex definition (CPLEXDEF) that the starting region belongs to. The default pool name is also DFHRSTAT. In any given z/OS image, there must be one region status server running per pool name in use in that image.
For example, if a z/OS image executes CICS regions associated with PLEX1, PLEX2, and PLEX3, all of which specify the default pool name, then only a single region status server needs to be running in that image for the CFDT pool named DFHRSTAT.
Any routers needing to examine the status of a remote target will also require a region status server running in the local z/OS image for the same pool name as that servicing the target regions. If you use the default pool name in the CPLEXDEF of all your CICSplex definitions, you’ll require one region status server per z/OS image.
Figure 1 shows the lifetime of an unoptimized workload of 10,000 started tasks initiated from a single routing region. The workload is dynamically routed across a target scope of 30 target regions. Ten of these regions are managed by the same CMAS as the router; the other 20 regions are managed by two other CMASs, one on a different LPAR in the Sysplex. Each line in the chart represents the task load in a target region at 10-second intervals. The lines clustered along the bottom of the chart are all local to the router; none of them exceed 10 percent utilization. All the other regions are remote from the router and are repeatedly surging to 100 percent utilization and then falling back to idle. This is workload batching.
Consider Figure 2. This is the same workload, but with Sysplex optimization activated. No workload batching is occurring. None of the target regions are idle or at the MAXTASK limit. The workload is being spread equitably. The locality of the target regions to the router is appropriately reflected; the upper band of target regions are local to the router, and the lower band is remote from it.
WLM correctly favors the local target regions over the remote ones until the task load difference for the region locality exceeds approximately 30 percent. However, the most important difference between the optimized and unoptimized workloads is represented by the number of 10-second time intervals across the bottom of each graph. The duration of the unoptimized workload was 16 10-second periods. When the same workload runs in optimized state, the workload completes in 12 periods. In this test case, that was a 25 percent savings in workload throughput time. These figures were measured in ideal circumstances; you’ll need to run your own tests to determine your precise benefits.
During testing of other intensive distributed workloads, time savings of more than 50 percent were recorded. The higher the task load throughput, the greater the savings in throughput time. Sysplex optimization appears to be most effective at times of high throughput demand for distributed workloads. These are workloads fed to CICS through asynchronous START commands. Typically, these are from MQSeries trigger transactions or WebSphere Sysplex Distributor.
Workloads that originate from synchronous dynamic routing requests—such as those from transaction routes, function ships, etc.—won’t show such an exceptional improvement unless those target regions share transaction traffic with locally initiated tasks. In those circumstances, Sysplex optimization means the router will become aware of the non-dynamic throughput to a target region long before a heartbeat occurs; again, this lets routers make more intelligent routing decisions.
If your CICSplex is running at least CICS TS V4.1 and your dynamic workload throughput comprises a high percentage of asynchronous routing requests, you should consider implementing Sysplex optimization.
The key points to remember are:
- To enable workload optimization, first define and execute a region status server in each MVS image that will execute CICS regions intending to exploit it. When all regions are migrated to CICS TS V4.1, those requiring optimization must be enabled in their CICS system definitions (CSYSDEFs).
- Users may mix and execute CICS TS V4.1 and pre-V4.1 regions in a workload, but full optimization benefits won’t occur until all systems are running CICS TS V4.1.
- One region status server is required per pool name per z/OS image. Don’t start any servers if you don’t want to exploit optimized workloads.
- Don’t adjust the WLM RS domain tuning parameters until you’re certain an adjustment is required. When changes are deemed necessary, make them in gradual increments.
- Look at the new active workload views to monitor status and progress of workloads in target regions.