
Aug 2 ’10

The Workload Manager (WLM) feature of CICSPlex System Manager is a useful tool for optimizing system capacity in highly complex environments. This tool analyzes the load capacity and health state of CICS regions intended to be targets of dynamic transaction routing requests and selects the region it considers the most appropriate target. CICS Transaction Server for z/OS Version 4.1 introduces a new feature of CICSPlex SM named Sysplex Optimized Workload Routing. This subfunction of the existing WLM feature was implemented in response to concerns voiced by many large enterprise customers regarding the observed behavior of WLM in CICSplexes that span multiple Logical Partitions (LPARs).

Existing WLM Decision Behavior

Let’s consider the current WLM decision behavior. WLM employs data spaces owned by a CICS Managing Address Space (CMAS) to share cross-region load and status data. Every CMAS owns a single WLM data space, which it shares with all the user CICS regions it directly manages. A user region managed by a CMAS is known to CICSPlex SM as a Local Managed Address Space, or LMAS. During CMAS initialization, the data space is verified and formatted with the structures necessary to describe all workload activity related to the CMAS. When the user CICS regions begin routing dynamic traffic, the state of those CICS regions is recorded in this data space.

In a CICSplex where the same CMAS manages all dynamic routing CICS regions, all those regions use the same WLM data space to determine workload information required for WLM operation. That means dynamic routing decisions are made based on the most current load data for a potential routing target region. A routing decision is based on an amalgamation of factors:

  • How busy is the region?
  • How healthy is the region?
  • How fast is the link between the router and target?
  • Are there outstanding CICSPlex SM Realtime Analysis (RTA) events associated with the workload?
  • Are there transaction affinities outstanding to override the dynamic routing decision?
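
As a rough illustration of how these factors might combine, the following Python sketch scores each candidate region and picks the lowest score. The field names, the weighting constants, and the region names (AOR1, AOR2, AOR3) are hypothetical; the article doesn't give the actual CICSPlex SM algorithm, so this shows only the shape of the decision.

```python
from dataclasses import dataclass


@dataclass
class TargetStatus:
    name: str
    task_load: int            # tasks currently counted against the region
    max_tasks: int            # task ceiling for the region
    healthy: bool             # False if the region reports a health problem
    link_weight: float        # hypothetical link cost; higher means slower
    rta_event_active: bool = False


def routing_weight(target: TargetStatus) -> float:
    """Return a score where lower is better. All constants are assumptions."""
    weight = target.task_load / target.max_tasks   # how busy is the region?
    weight *= target.link_weight                   # how fast is the link?
    if not target.healthy:                         # how healthy is the region?
        weight += 10.0
    if target.rta_event_active:                    # outstanding RTA event?
        weight += 1.0
    return weight


def choose_target(targets, affinity=None):
    """An outstanding affinity overrides the dynamic decision."""
    if affinity is not None:
        return affinity
    return min(targets, key=routing_weight).name


targets = [
    TargetStatus("AOR1", task_load=50, max_tasks=100, healthy=True, link_weight=1.0),
    TargetStatus("AOR2", task_load=20, max_tasks=100, healthy=True, link_weight=2.0),
    TargetStatus("AOR3", task_load=5, max_tasks=100, healthy=False, link_weight=1.0),
]
print(choose_target(targets))                   # AOR2
print(choose_target(targets, affinity="AOR1"))  # AOR1
```

In this sketch AOR3 is the least busy region but loses because it is unhealthy, and AOR2 wins despite its slower link because it carries far less load than AOR1.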

This processing rationale provides equitable dynamic routing decisions when working in a single CMAS environment. However, with workloads being spread across multiple z/OS images, users must configure additional CMASs to manage the user CICS regions on the disparate LPARs. Each WLM data space must maintain a complete set of structures to describe every CICS region in the workload—not just the CICS regions that each CMAS is responsible for, but also those regions in other LPARs managed by other CMASs.

This means the WLM data space each CMAS owns must be synchronized periodically with the WLM data spaces owned by other CMASs participating in the same workload. This synchronization occurs every 15 seconds (the heartbeat) from the LMASs to their CMASs, then out to all other CMASs in the workload.
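
The synchronization described above can be pictured with a small Python model. It assumes, as a simplification, that a region's load is recorded in its own CMAS's data space immediately and reaches peer CMASs only when a heartbeat fires. The class, method, and region names are hypothetical.

```python
HEARTBEAT_SECONDS = 15  # interval stated in the article


class Cmas:
    def __init__(self, name):
        self.name = name
        self.data_space = {}      # region name -> task load as this CMAS sees it
        self.owned_regions = set()
        self.peers = []

    def record(self, region, task_load):
        """A locally managed region updates its owning CMAS's data space."""
        self.owned_regions.add(region)
        self.data_space[region] = task_load

    def heartbeat(self):
        """Push the state of locally managed regions to every peer CMAS."""
        for peer in self.peers:
            for region in self.owned_regions:
                peer.data_space[region] = self.data_space[region]


cmas_a, cmas_b = Cmas("CMASA"), Cmas("CMASB")
cmas_a.peers.append(cmas_b)
cmas_b.peers.append(cmas_a)

cmas_a.record("AOR1", 10)
before = cmas_b.data_space.get("AOR1")    # None: no heartbeat yet
cmas_a.heartbeat()
after = cmas_b.data_space.get("AOR1")     # 10
cmas_a.record("AOR1", 25)
stale = cmas_b.data_space.get("AOR1")     # still 10 until the next heartbeat
print(before, after, stale)
```

The last value is the point of interest: until the next heartbeat, a router managed by the second CMAS still sees the old load for AOR1.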

CICS provides two dynamic routing exits—named in the System Initialization Table (SIT)—with different behavior characteristics:

  • Dynamic Transaction Routing requests may be redirected using the program named in the DTRPGM System Initialization parameter. For DTRPGM requests, the call from CICS to the routing region to decide the target region is synchronized with execution of the request at the selected target, and is followed by a further call from CICS when the dynamic request completes. This allows the router to increment the task load count before informing CICS of the target region system ID, and to decrement the count when the request completes.
  • Distributed Routing requests may be redirected using the program named in the DSRTPGM System Initialization parameter. For DSRTPGM requests, the call from CICS to the routing region to decide the target isn’t synchronized with execution at the selected target. Typically, these dynamic requests are asynchronous CICS STARTs, so the router has no notification of when the routed transaction begins or ends. CICSPlex SM accommodates this by stipulating that DSRTPGM target regions must have workload specifications associated with them; this turns the targets into logical routing regions and lets the CPSM routing processes determine that they’re being called at the DSRTPGM target level. This allows the task load count to be adjusted when the transaction starts and when it completes.
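
The difference in who adjusts the task load count can be sketched in Python as follows. This is a conceptual model only: the function names and the WorkloadView class are hypothetical and don't correspond to any CICS or CICSPlex SM interface.

```python
from collections import Counter


class WorkloadView:
    """Task load per target region, as one routing process sees it."""

    def __init__(self):
        self.task_load = Counter()

    def task_started(self, region):
        self.task_load[region] += 1

    def task_ended(self, region):
        self.task_load[region] -= 1


def route_dtrpgm_style(view, region, run_request):
    """Synchronous case: the router sees both selection and completion."""
    view.task_started(region)        # counted before the target is returned
    try:
        return run_request()
    finally:
        view.task_ended(region)      # counted down on the completion call


def route_dsrtpgm_style(region, start_request):
    """Asynchronous case: the router issues the START and hears nothing more."""
    start_request()


def run_at_target(view, region, transaction):
    """The target, acting as a logical routing region, adjusts the count itself."""
    view.task_started(region)
    try:
        return transaction()
    finally:
        view.task_ended(region)


view = WorkloadView()
observed = []
route_dtrpgm_style(view, "AOR1", lambda: observed.append(view.task_load["AOR1"]))
print(observed, view.task_load["AOR1"])   # [1] 0
```

In the asynchronous case the count changes only when run_at_target executes in the target region, so a router elsewhere learns of it only through whatever mechanism shares the target's state.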

Given that CICSPlex SM routing regions count dynamic transaction throughput in a CICSplex, transactions started locally on the target regions remain unaccounted for by the routing regions until a heartbeat (synchronization) occurs. In fact, the router transaction counts won’t be accurately synchronized until two heartbeats have occurred: the first to increment the count and the second to decrement it again. However, this discrepancy isn’t considered as severe as the one that arises when different CMASs manage a router and its target.

In a multiple-CMAS configuration, a routing region evaluates status data for a target region as described in the router’s local WLM data space. If that target region is managed by a different CMAS from the one that manages the router, the status data describing that target may be up to 15 seconds old. For DTRPGM requests, this latency doesn’t have a severe impact. However, for DSRTPGM requests, the effect can be quite dramatic, particularly at high levels of workload throughput. The effect is known as workload batching.
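
To see why stale status data matters for asynchronous requests, consider this deliberately simplified Python simulation. It assumes a router that always picks the least-loaded target, tasks that don't complete during the burst, and two hypothetical targets, AOR1 and AOR2. It compares a router whose view stays frozen between heartbeats with one that sees live load.

```python
def route_burst(live, requests, refresh_view):
    """Route a burst of requests; return how many each target received.

    live          -- dict of target name -> actual task load (updated in place)
    refresh_view  -- True if the router sees live load on every decision,
                     False if it keeps the view taken at the last heartbeat
    """
    view = dict(live)                      # snapshot from the last heartbeat
    routed = {name: 0 for name in live}
    for _ in range(requests):
        if refresh_view:
            view = dict(live)
        target = min(view, key=view.get)   # least-loaded target in the view
        live[target] += 1                  # the load really does rise...
        routed[target] += 1                # ...but a stale view never shows it
    return routed


stale = route_burst({"AOR1": 10, "AOR2": 12}, requests=100, refresh_view=False)
fresh = route_burst({"AOR1": 10, "AOR2": 12}, requests=100, refresh_view=True)
print(stale)   # every request goes to AOR1
print(fresh)   # requests are split almost evenly
```

With the frozen view, the target that looked least busy at the last heartbeat receives the entire burst, which is the kind of skew that a 15-second-old view can produce under high throughput.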

Workload Batching
