Apr 22 ’08

Service Provisioning: Keeping the Requester Happy

by Editor in z/Journal

Service provisioning, in one context, means the “preparation beforehand” of IT systems’ “assets, materials, resources, or supplies” required to carry out some defined activity. But paraphrasing from the OASIS Service Provisioning Markup Language V1 specification, it can extend beyond the initial act of providing digital resources to the ongoing lifecycle management of those resources as managed items. This definition will, in the future, grow to include all corporate assets, such as meeting rooms and other nondigital resources.

A Service-Level Agreement (SLA) is a negotiated agreement designed to create a common understanding about services, priorities, and responsibilities. In Service-Oriented Architectures (SOAs), SLAs are a means to reserve service capacity at a defined service quality level. Provisioning systems enable automatic configuration of digital resources such as servers, storage, and routers based on a configuration specification. SLAs are used to reserve capacity both at the Software as a Service (SaaS) level and at the digital resource level of a complex system: storage capacity, network bandwidth, computing nodes, and memory.
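
To make the idea of a configuration specification concrete, here’s a minimal sketch in Python of what a provisioning system might consume. All the names here (ResourceSpec, provision) are hypothetical, invented for illustration:

from dataclasses import dataclass

@dataclass
class ResourceSpec:
    kind: str      # "server", "storage", or "router"
    quantity: int  # how many units to configure
    params: dict   # kind-specific settings

def provision(specs):
    # Walk the configuration specification and configure each resource.
    for spec in specs:
        print(f"Configuring {spec.quantity} x {spec.kind}: {spec.params}")

provision([
    ResourceSpec("server",  4, {"cpus": 2, "memory_gb": 8}),
    ResourceSpec("storage", 1, {"capacity_gb": 500}),
    ResourceSpec("router",  2, {"bandwidth_mbps": 100}),
])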

SLAs all contain similar sets of elements: service elements and management elements. These elements provide an objective basis for gauging service effectiveness and ensure that all interested parties use the same criteria to evaluate service quality. An SLA is a living document: the signing parties regularly review the agreement to assess service adequacy and negotiate adjustments. The service user may need to double its volume of digital widget use and impose new security layers; the service provider can address the number of widgets required, in terms of processors, storage, bandwidth, and security overhead, via the new SLA.

The service elements define services by documenting such things as:

• The services to be provided

• Conditions of service availability

• Service standards (such as the timeframes that services will be provided)

• The responsibilities of both parties

• Cost vs. service trade-offs and possible effects on peak hour demand (latency)

• Escalation procedures.

The management elements deal with such things as:

• How service effectiveness will be tracked (real-time, summarized end-of-day data, or some other mechanism)

• How service effectiveness will be reported and addressed

• How the service level indicated by the automatic provisioning template will be provided

• How the parties will review and revise the agreement.

There’s one final element that probably should be in every SLA. The penalty element defines the real, enforceable penalties the involved parties agree to impose if either party violates the agreed-to service element terms of the SLA. If the user signs up for 2,000 digital widgets a day and then consistently uses 2,500, the service provider should ensure that any additional provisioning costs required to achieve the agreed-upon Quality of Service (QoS) are covered, with a reasonable profit. The same is true if the provider fails to achieve the QoS against the 2,000 widgets a day.
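
The penalty arithmetic for the widget example can be sketched in a few lines; the rates below are invented for illustration, since a real SLA would spell them out:

AGREED_WIDGETS_PER_DAY = 2000
OVERAGE_RATE = 0.05    # hypothetical cost per widget above the agreed volume
PROFIT_MARGIN = 0.20   # hypothetical agreed margin on overage provisioning

def overage_penalty(actual_widgets):
    # Charge for usage above the agreed volume, plus a reasonable profit.
    overage = max(0, actual_widgets - AGREED_WIDGETS_PER_DAY)
    return overage * OVERAGE_RATE * (1 + PROFIT_MARGIN)

print(overage_penalty(2500))  # 500 excess widgets -> 30.0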

Using the SLA elements, a provisioning manager can be populated with QoS targets and the digital resources required to meet them. This data can be provided as templates. (For more on this, see “Template-Based Automated Service Provisioning—Supporting the Agreement-Driven Service Life-Cycle” at www.iw.uni-karlsruhe.de/Publications/Ludwig_et_al._05_-_Template-based_Service_Provisioning.pdf.) These service templates, using the WS-Agreement standard, can then be used as agreements with any requester.

WS-Agreement defines a template format that contains a partially completed agreement. The partially completed agreement defines named locations where an agreement initiator (a new requester) can supply agreement content, along with rules that limit what can be filled in.

As an example, a field could be the value for the QoS response time of an operation and a constraint could limit the choice to one, two, five, or 10 or more seconds. Another field could be the levels of security required by this requester. Based on the constraint selected, the provisioning manager will deploy the resources identified for that specific constraint situation. Those resources would include the bandwidth, processors, storage, and security layering required for the selected constraints.
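
A simplified model of that idea, sketched in Python rather than the actual WS-Agreement XML, might look like the following. The field names, allowed values, and resource bundles are invented for illustration:

# Named locations and the constraints on what may be filled in.
TEMPLATE = {
    "response_time_s": {1, 2, 5, 10},
    "security_level":  {"basic", "enhanced"},
}

# Hypothetical resource bundle deployed for each response-time choice.
DEPLOYMENTS = {
    1:  {"processors": 8, "bandwidth_mbps": 400, "storage_gb": 200},
    2:  {"processors": 4, "bandwidth_mbps": 200, "storage_gb": 100},
    5:  {"processors": 2, "bandwidth_mbps": 100, "storage_gb": 50},
    10: {"processors": 1, "bandwidth_mbps": 50,  "storage_gb": 25},
}

def create_agreement(filled):
    # Validate the initiator's choices against the template constraints.
    for field, value in filled.items():
        if value not in TEMPLATE[field]:
            raise ValueError(f"{value!r} violates the constraint on {field}")
    return DEPLOYMENTS[filled["response_time_s"]]

print(create_agreement({"response_time_s": 2, "security_level": "enhanced"}))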

This technique addresses the assignment of digital resources and assets to meet the initial provisioning needs of a service, but what about the issue of latency for the service?

In our previous example, the service requester signs up for 2,000 digital widgets a day. Do we simply assume the service usage rate will be linear throughout the day? And what’s the definition for the duration of a “day”? Twenty-four hours? Eight hours?

Service use often doesn’t occur in a linear fashion; there are peaks and valleys in usage rates. In the ’70s, when transaction systems became popular, it became apparent that people tended to work harder in the afternoon and that some business services were more heavily used at specific times of day simply by the nature of the service. This became known as the peak hour demand (or demand latency). Some services might have multiple peak hours.

As peak hours occur, other ramifications develop. As response times elongate, the requests the requester still needs to make aren’t being serviced; the new requests simply stack up, waiting their turn. This is similar to what happens on our roads at rush hour. A traffic light cycle time normally lets enough traffic through that the roads are only a little, if at all, congested. At rush hour, the same light cycle time is insufficient to let enough cars through the intersection to maintain a smooth flow of traffic, so the cars back up and go slower. The peak hour of a digital service behaves the same way.
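
The backup effect is easy to see with a toy queuing sketch: once arrivals exceed the service rate, the backlog grows each interval, and it lingers even after demand subsides. The numbers are illustrative:

SERVICE_RATE = 100  # requests the service can complete per interval

def backlog_over_time(arrivals_per_interval):
    backlog, history = 0, []
    for arrivals in arrivals_per_interval:
        backlog = max(0, backlog + arrivals - SERVICE_RATE)
        history.append(backlog)
    return history

# Off-peak intervals, a peak hour, then off-peak again.
print(backlog_over_time([80, 90, 150, 160, 150, 90, 80]))
# -> [0, 0, 50, 110, 160, 150, 130]: the backlog built during the
#    peak persists after arrivals drop back below capacity.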

Using a 24-hour day, when there’s a peak hour demand, the digital asset/resource requirements for the peak hour are often double or triple those required during the other 23 hours. Figure 1 shows an example of peak hour representations of one service’s required resources.
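
A back-of-the-envelope sizing calculation shows why this matters; the numbers below are illustrative, not taken from the figures:

daily_widgets = 2000
hours = 24
peak_multiplier = 3  # assume the peak hour runs at 3x the flat rate

flat_rate = daily_widgets / hours
print(f"flat-rate demand: {flat_rate:.0f} widgets/hour")               # 83
print(f"peak-hour demand: {flat_rate * peak_multiplier:.0f} widgets/hour")  # 250

# Provisioning to the flat rate leaves the peak hour badly constrained;
# provisioning to the peak leaves 23 hours of idle resources.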

If any of the resources are too constrained to allow the utilizations represented in Figure 1, the requester perceives an elongation of response times plus a backup in user demand awaiting the service. Figure 2 shows a storage-constrained scenario.

There are two ways to deal with this situation. Each has pluses and minuses. Let’s take a look at each approach.

Dynamic Provisioning

If there’s a workload manager controlling the service’s response curve and resources, it may dynamically add pool resources to address the storage constraint. It should simultaneously add extra server capacity and bandwidth because, as the initial constraint is addressed, the demand may well flow to another resource constraint.

The downside of dynamically adding resources from a pool is that other services experiencing their own peak hour issues may be attempting to obtain pooled resources at the same time. This can lead to resource pool exhaustion and unhappy service users, as the services can’t meet their SLAs.
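
The exhaustion problem is easy to illustrate: two services hitting their peaks at once compete for the same pool, and the loser misses its SLA. A minimal sketch, with invented names and sizes:

class ResourcePool:
    def __init__(self, storage_gb):
        self.storage_gb = storage_gb

    def allocate(self, service, amount):
        # Grant the request only if the pool can still cover it.
        if amount > self.storage_gb:
            print(f"{service}: pool exhausted, SLA at risk")
            return False
        self.storage_gb -= amount
        print(f"{service}: granted {amount} GB, {self.storage_gb} GB left")
        return True

pool = ResourcePool(storage_gb=100)
pool.allocate("service-A", 70)  # A's peak hour arrives first
pool.allocate("service-B", 70)  # B peaks simultaneously and loses out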

Let’s consider the mainframe as it might play in this arena. If you’re running z/OS on a z9, you can use Intelligent Resource Director (IRD) to cause a “donation” of CP cycles from clustered Logical Partitions (LPARs). You can accomplish this by having PR/SM and Workload Manager (WLM) converse when Suffering Service Class Periods (SSCP) occur. The SSCP occurs when CPU delays are detected and WLM can’t resolve the delay by adjusting dispatch priorities in an LPAR.
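
Conceptually, the donation works like the toy model below. This is only a sketch of the idea, not how PR/SM and WLM are actually implemented; the weights and structures are invented:

lpars = {"LPAR1": {"cp_weight": 50, "cpu_delay": True},
         "LPAR2": {"cp_weight": 50, "cpu_delay": False}}

def donate_cycles(cluster, donation=10):
    # Shift weight from an undelayed LPAR to one with a suffering
    # service class period that priority adjustment couldn't fix.
    for name, lpar in cluster.items():
        if not lpar["cpu_delay"]:
            continue
        donor = next((n for n, l in cluster.items()
                      if not l["cpu_delay"] and l["cp_weight"] >= donation), None)
        if donor:
            cluster[donor]["cp_weight"] -= donation
            lpar["cp_weight"] += donation
            print(f"{donor} donates {donation} weight to {name}")

donate_cycles(lpars)
print(lpars)  # LPAR1 now weighted 60, LPAR2 now 40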

The IRD also provides Dynamic Channel-path Management (DCM), which moves channel bandwidth where needed. In simplest terms, the goal of the IRD’s DCM is to equally distribute I/O activity across all DCM-associated channel paths attached to the Central Processor Complex.

The IRD also uses a third feature, Channel Subsystem Priority Queuing, designed so the important work that really needs additional I/O resource receives it, rather than other work that happens to be running in the same LPAR cluster.

Another I/O feature providing parallelism and bandwidth is Dynamic Parallel Access Volume (PAV), which supports more than one I/O to a single device at a time. Multiple I/O control blocks are assigned to a single physical disk drive; the primary and alias control blocks let multiple I/O operations be started and in execution against the single physical arm of the disk drive. This reduces, and sometimes eliminates, queuing. It’s similar to aspects of Small Computer System Interface (SCSI) in smaller systems.
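
The effect of the alias control blocks is that elapsed time for a burst of I/Os approaches the longest single I/O rather than the sum, as this toy comparison shows (times invented):

io_times_ms = [4, 6, 5, 3]       # four I/O requests to one volume

serialized = sum(io_times_ms)    # one I/O at a time: 18 ms
with_aliases = max(io_times_ms)  # base + three aliases, all in flight: 6 ms

print(f"serialized: {serialized} ms, with PAV aliases: {with_aliases} ms")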

CICS Version 3.2 provides enhancements and support levels for many Web services-related protocols and standards. This lets CICS participate in a company’s Web business functions using the full gamut of the most current exploitation, security, and provisioning capabilities.

Static Over-Provisioning

The other approach to peak hour provisioning is to over-provision the required resources when the service is first initiated. While this is the simplest technique, it’s also the most expensive, and it requires human monitoring over time.

The over-provisioning approach causes the entity paying for the service to buy resources that are needed only for peak demand and to pay for them even when demand is trivial. The service provider also will have to monitor this scenario as more parties begin to use the service, to see whether anyone else is interfering with and grabbing the over-provisioned resources because of their own peak demands.

A second approach is to over-provision at a specific time of day in anticipation of the peak demand period. This can be done via peak demand templates that define the average asset and resource demands that may occur, invoked by time-of-day triggers. Again, the service requester is now paying for resources that may or may not be required on a specific day; the difference is that they’ve agreed to it as part of the SLA that was used to define the template.
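
Such time-of-day triggers are simple to express; the template contents and trigger times below are invented for illustration:

PEAK_TEMPLATES = {
    13: {"processors": 8, "storage_gb": 400},  # provision ahead of a 14:00 peak
    17: {"processors": 2, "storage_gb": 100},  # scale back after the peak
}

def on_the_hour(hour):
    # Apply the peak demand template, if one is defined for this hour.
    template = PEAK_TEMPLATES.get(hour)
    if template:
        print(f"{hour:02d}:00 trigger -> applying template {template}")

for hour in range(24):  # a day's worth of triggers
    on_the_hour(hour)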

Conclusions

The mainframe is the oldest, wisest architecture in its use of parallel multiprocessing workloads. With the bells and whistles in current System z hardware, z/OS, and strategic business function delivery platforms such as CICS, the mainframe can ensure proper provisioning levels for any Web service it must provide to a business solution.

The use of dynamic provisioning concepts such as those in System z IRD, and of CICS management of resources for legacy and Web services using its proven architectures, should appeal to both service providers and service requesters as an economically viable way of controlling provisioning and the resources it consumes.

Let’s stop thinking of the mainframe in terms of “legacy” applications and start thinking of it in terms of the Web services it can provide at the cheapest possible level of resource pools for provisioning peaks and valleys of demand.