Nov 3 ’14
Making the Case for True End-to-End Host Managed QoS
In the February 2008 issue of zJournal, I wrote an article titled “FICON and Quality of Service (QoS): Making the Case for True, End-to-End, Host-Managed QoS.” That article laid the groundwork for a great deal of discussion in the industry, and with IBM. Given some of the things that we will see in the near future (much sooner than six years), I thought it would be a good idea to revisit the original article, and provide some updates on things that have happened since that article was published six years ago. A subsequent article in Enterprise Tech Journal will examine the technical details of any new functionality at the appropriate time.
Here’s the original article.
Defining Quality and Service
Quality and service have different meanings in different contexts. A good, general definition of service is: the expected behavior or outcome from a system. So, QoS is the degree to which the expected outcome is realized. Quantifying and measuring QoS also becomes a context-dependent task; it means different things to different people. For example, to the casual Internet user browsing a news site, QoS may simply mean the responsiveness of the Web server to his/her page accesses. On the other hand, to a systems administrator, QoS may mean the throughput and availability of the Web server, the network connection, storage subsystem or some combination of these. To achieve a desired level of service, all the components on the end-to-end path must be able to deliver that level of service.
Huseyin Simitci, in his book, Storage Network Performance Analysis (Wiley, 2003), defined some additional key concepts concerning QoS in storage network architectures:
• QoS architecture. The system must include the structures and interfaces to request, configure and measure QoS. If the system’s peak performance is below the desired level, no amount of management can provide QoS.
• Admissions policy. This is a critical aspect of a QoS system. When a system agrees to serve (admit) a request, it must ensure certain resources are available to achieve the requested QoS level. If there aren’t enough resources, or if using the existing resources will hamper the QoS guarantees of previously admitted requests, the new arrivals should be rejected.
• Resource reservation. After a request is admitted to the system, sufficient system resources must be reserved to provide QoS to that request.
• Class of Service (CoS). Even though CoS is sometimes used interchangeably (and incorrectly) with QoS, technically it has a different meaning. CoS defines the type of service and doesn’t indicate how well the service is performed. Simitci uses the example of a Fibre Channel (FC) CoS defining message delivery guarantees—far different from any QoS guarantees of throughput, response time, etc.
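The admissions-policy and resource-reservation concepts above can be sketched in a few lines of Python. This is only an illustration: the class name, the single bandwidth resource and the numbers are hypothetical and aren't taken from Simitci's book.

```python
class AdmissionController:
    """Toy admission control over one resource (bandwidth in MB/s)."""

    def __init__(self, capacity_mbps):
        self.capacity_mbps = capacity_mbps
        self.reserved = {}  # request id -> reserved MB/s

    def available(self):
        """Bandwidth not yet reserved by admitted requests."""
        return self.capacity_mbps - sum(self.reserved.values())

    def admit(self, request_id, needed_mbps):
        """Reserve bandwidth if it fits; otherwise reject the new arrival."""
        if needed_mbps > self.available():
            return False  # admitting would hamper existing guarantees
        self.reserved[request_id] = needed_mbps
        return True

    def release(self, request_id):
        """Return a finished request's reservation to the pool."""
        self.reserved.pop(request_id, None)


controller = AdmissionController(capacity_mbps=400)
print(controller.admit("backup", 300))  # True: 300 of 400 MB/s reserved
print(controller.admit("oltp", 200))    # False: only 100 MB/s remain
```

The key design point is that a rejected request leaves existing reservations untouched, which is what separates a QoS guarantee from best-effort queuing.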
In their proceedings paper for the 2001 ACM Conference on E-Commerce, Menascé, Barbará and Dodge developed an equation to compute the ratio of the QoS deficiency to the desired level:
QoS deviation = (achieved QoS - desired QoS) / desired QoS.
In this equation, if the desired QoS level is greater than the achieved QoS level, the deviation is negative. Conversely, a positive deviation denotes a QoS better than the one desired.
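As a quick worked example of the deviation formula, the following sketch assumes a metric where higher is better (throughput, for instance). The 200 MB/s goal and the achieved values are made up for illustration.

```python
def qos_deviation(achieved, desired):
    """Relative QoS deviation: (achieved - desired) / desired."""
    if desired == 0:
        raise ValueError("desired QoS must be non-zero")
    return (achieved - desired) / desired


# Hypothetical throughput goal of 200 MB/s
print(qos_deviation(achieved=150, desired=200))  # -0.25, goal missed by 25%
print(qos_deviation(achieved=220, desired=200))  # 0.1, goal exceeded by 10%
```

For a metric where lower is better, such as response time, the sign would have to be interpreted the other way around.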
Storage and QoS
Different data and different applications have varying performance needs, so trade-offs in performance are allowed. Some applications will have high QoS priorities and others can be delayed to make way for higher priority jobs. If all applications and data required the absolute best performance all the time, QoS wouldn’t be achievable!
QoS in storage and storage networks is simply an optimization problem. You achieve optimization by trading one performance metric for another. A classic example of such a trade-off is between throughput and response time. Increasing queue length (number of active jobs) increases both throughput and response time. If an application requires low, bounded response times, you must configure it to accept low, bounded throughput values.
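The throughput versus response time trade-off can be seen in a small queuing model. The sketch below uses exact mean value analysis for a closed system with one queuing station (think of a single device) plus a think-time delay. The model choice, the 5 ms service time and the 20 ms think time are assumptions for illustration only.

```python
def closed_queue_metrics(max_jobs, service_time, think_time):
    """Exact mean value analysis for one queuing station plus a delay.

    Returns a list of (jobs, throughput, response_time) tuples,
    with times in seconds and throughput in I/Os per second.
    """
    results = []
    queue_len = 0.0  # mean queue length when no jobs are active
    for n in range(1, max_jobs + 1):
        # An arriving job waits behind the jobs already queued
        response = service_time * (1.0 + queue_len)
        throughput = n / (think_time + response)
        queue_len = throughput * response  # Little's law
        results.append((n, throughput, response))
    return results


for n, x, r in closed_queue_metrics(max_jobs=8, service_time=0.005, think_time=0.020):
    print(f"jobs={n}  throughput={x:7.1f} IO/s  response={r * 1000:5.2f} ms")
```

As the number of active jobs grows, throughput climbs toward the device limit (1 / service time) while response time keeps rising, so capping response time means capping the number of active jobs and therefore throughput.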
Generally, storage subsystems can’t make QoS guarantees. They’re constructed to accept and queue all arriving I/O commands. Certain performance tuning techniques, such as queue prioritization, assigning more buffers/cache space and/or DASD spindles to jobs with higher priority, can help you indirectly achieve partial QoS. But a partial QoS optimization is merely a best effort service level that doesn’t give any explicit QoS guarantees.
While IP networks and Infiniband have extensive QoS mechanisms and standards, FC fabrics don’t yet have the same level of detail for QoS mechanisms. The Storage Networking Industry Association (SNIA) includes a formal definition for QoS in its SNIA Dictionary of Storage Networking Terminology:
“QoS is a technique for managing computer system resources such as bandwidth by specifying user visible parameters such as message delivery time. Policy rules are used to describe the operation of network elements to make these guarantees. Relevant standards for QoS in the IETF (Internet Engineering Task Force) are the RSVP (Resource Reservation Protocol) and COPS (Common Open Policy Service) protocol. RSVP allows for the reservation of bandwidth in advance, while COPS allows routers and switches to obtain policy rules from a server.”
In a nutshell, in a storage network, QoS is a set of metrics that predict the behavior, reliability, speed and latency of a path.
Many products that claim QoS mostly perform monitoring and configuration for best-effort performance, or they’re not true end-to-end implementations of QoS. The next section describes some standards work started but not completed in the FC standards groups. Following that, we’ll discuss the Infiniband mechanism for QoS.
FC Class 4 CoS
Some initial QoS efforts were made in the T11 standards group to develop a QoS standard for FC. It essentially was written as a CoS and was highly complex. A team of consultants worked with the major switch vendors to develop a series of proposals that impacted several different standards. A summary of Class 4 follows. It was never formally adopted or implemented. The discussion of Class 4 is included to reinforce the point that QoS is a complex topic, not just a marketing buzzword.
An FC CoS can be defined as a frame delivery scheme exhibiting a specified set of delivery characteristics and attributes. ESCON and FICON are both part of the FC standard and CoS specifications.
• Class 1: A CoS providing a dedicated connection between two ports with confirmed delivery or notification of non-delivery
• Class 2: A CoS providing a frame switching service between two ports with confirmed delivery or notification of non-deliverability
• Class 3: A CoS providing a frame switching datagram service between two ports or a multicast service between a multicast originator and one or more multicast recipients
• Class 4: A CoS providing a fractional bandwidth virtual circuit between two ports with confirmed delivery or notification of non-deliverability.
Class 4 is often called a virtual circuit CoS. It works to provide better QoS guarantees for bandwidth and latency than Class 2 or Class 3, while providing more flexibility than Class 1. Similar to Class 1, it’s a type of dedicated connection service. Class 4 is a connection-oriented CoS with confirmation of delivery (acknowledgement) or notification that a frame couldn’t be processed (reject). Class 4 provides for allocating a fraction of the bandwidth on a path between two node ports and guarantees latency in negotiated QoS bounds. It provides a virtual circuit between a pair of node ports with guaranteed bandwidth and latency in addition to the confirmation of delivery or notification of non-deliverability of frames. For the duration of the Class 4 virtual circuit, all resources necessary to provide that bandwidth are reserved for that virtual circuit.
Unlike Class 1, which reserves the entire bandwidth of the path, Class 4 supports the allocation of a requested amount of bandwidth. The bandwidth in each direction is divided among up to 254 virtual circuit connections to other N_Ports on the fabric. When the virtual circuit(s) is established, resources are reserved for the subsequent delivery of Class 4 frames. Like Class 1, Class 4 provides in-order delivery of frames. A Class 4 circuit includes at least one virtual circuit in each direction with a set of QoS parameters for each virtual circuit. These QoS parameters include guaranteed transmission and reception bandwidths and/or guaranteed maximum latencies in each direction across the fabric. When the request is made to establish the virtual circuit, the request specifies the bandwidth requested and amount of latency or frame jitter acceptable.
The Quality of Service Facilitator (QoSF), a server in the fabric, manages bandwidth and latency guarantees for Class 4 virtual circuits. The QoSF is at the well-known address x’FF FFF9’ and is used to negotiate, manage and maintain the QoS for each virtual circuit and assure consistency among all the virtual circuits set up across the full fabric to all ports. The QoSF is an optional service defined by the Fibre Channel Standards to specifically support Class 4 service. Because the QoSF manages bandwidth through the fabric, it must be provided by a Class 4 capable switch/director.
At the time the virtual circuit is established, the route is chosen and a circuit created. All frames associated with the Class 4 virtual circuit will be routed via that circuit, ensuring in-order frame delivery in a Class 4 virtual circuit. In addition, because the route is fixed for the duration of the circuit, the delivery latency is deterministic. With Class 4, the virtual circuits can be in a dormant state with the virtual circuit set up at the N_Ports and through the fabric, but with no data flowing, or a live state where data is actively flowing. To set up a Class 4 virtual circuit, the Circuit Initiator (CTI) sends a Quality of Service Request (QoSR) extended link service command to the QoSF. The QoSF ensures the fabric has the available transmission resources to satisfy the requested QoS parameters and then forwards the request to the Circuit Recipient (CTR). If both the fabric and recipient can provide the requested QoS, the QoS request is accepted and the transmission can start in both directions. If the requested QoS parameters can’t be met, the request is rejected.
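The two-stage acceptance described above (fabric first, then recipient) can be sketched as follows. This doesn't model the actual QoSR extended link service payload; the field names, units and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QosRequest:
    bandwidth_mbps: float  # requested guaranteed bandwidth
    max_latency_us: float  # largest acceptable latency


def fabric_can_satisfy(req, free_mbps, path_latency_us):
    """QoSF check: spare bandwidth exists and the path meets the latency bound."""
    return req.bandwidth_mbps <= free_mbps and path_latency_us <= req.max_latency_us


def set_up_circuit(req, fabric_free_mbps, path_latency_us, recipient_free_mbps):
    """Return 'accept' only if both the fabric and the recipient agree."""
    if not fabric_can_satisfy(req, fabric_free_mbps, path_latency_us):
        return "reject"  # QoSF rejects without forwarding to the recipient
    if req.bandwidth_mbps > recipient_free_mbps:
        return "reject"  # circuit recipient can't honor the request
    return "accept"


request = QosRequest(bandwidth_mbps=50, max_latency_us=200)
print(set_up_circuit(request, fabric_free_mbps=80, path_latency_us=120,
                     recipient_free_mbps=60))  # accept
```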
In Class 4, the fabric manages the flow of frames between node ports and the fabric by using the virtual circuit flow control mechanism. This is a buffer-to-buffer flow control mechanism similar to the R_RDY Fibre Channel flow control mechanism. Virtual-circuit flow control uses the Virtual Circuit Ready (VC_RDY) ordered set. VC_RDY resembles R_RDY, but it contains a virtual circuit identifier byte in the primitive signal, indicating which VC is being given the buffer-to-buffer credit. Managing the flow of frames on Inter-Switch Links (ISLs) also must support the virtual circuit flow control to manage the flow of Class 4 frames between switches.
Each VC_RDY indicates to the N_Port that a single Class 4 frame is needed from the N_Port if it wishes to maintain the requested bandwidth. Each VC_RDY also identifies which virtual circuit is given credit to send another frame. The fabric controls the bandwidth available to each virtual circuit by the frequency of VC_RDY transmission for that circuit. One VC_RDY per second is permission to send one frame per second (2 kilobytes/second if 2k frame payloads are being used). One thousand VC_RDYs per second is permission to send 1,000 frames per second (2MB per second if 2k frame payloads are being used). The fabric is expected to make any unused bandwidth available for other live Class 4 circuits, and for Class 2 or Class 3 frames, so the VC_RDY does allow other frames to be sent from the N_Port.
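The arithmetic behind VC_RDY pacing is simple enough to sketch. The functions below assume a 2,048-byte payload for the "2k" frames mentioned above; the function names are made up for illustration.

```python
import math


def vc_bandwidth_bytes_per_sec(vc_rdy_per_sec, payload_bytes=2048):
    """Bandwidth granted to a virtual circuit by a given VC_RDY rate."""
    return vc_rdy_per_sec * payload_bytes


def vc_rdy_rate_for(target_bytes_per_sec, payload_bytes=2048):
    """VC_RDYs per second needed to sustain a target bandwidth."""
    return math.ceil(target_bytes_per_sec / payload_bytes)


print(vc_bandwidth_bytes_per_sec(1))        # 2048 bytes/s, about 2 kilobytes/s
print(vc_bandwidth_bytes_per_sec(1000))     # 2048000 bytes/s, about 2MB/s
print(vc_rdy_rate_for(100 * 1024 * 1024))   # 51200 VC_RDYs/s for 100 MiB/s
```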
There are some potential scalability difficulties associated with Class 4 service, since the fabric must negotiate resource allocation across each of the 254 possible VCs on each N_Port. Also, fabric busy (F_BSY) isn’t allowed in Class 4. Resources for delivery of Class 4 frames are reserved when the virtual circuit is established, so the fabric must be able to deliver the frames.
Class 4 is a complex issue. More detailed information is available in Kembel’s The Fibre Channel Consultant series of textbooks. Because of its complexity, Class 4 was never fully adopted as a standard. Further work on it was stopped, and much of the language has been removed from the FC standard. For that reason, other mechanisms and models for QoS in FICON were examined. One of these was the method used by Infiniband.
Infiniband and QoS
Infiniband addresses QoS through the concept of Virtual Lanes (VLs). Infiniband’s VLs enable different QoS guarantees across the fabric (e.g., priority, latency guarantees, bandwidth guarantees, etc.) by logically dividing a physical link into multiple virtual links. Each VL has its own independent resources (i.e., send and receive buffers) dedicated to traffic with specific service levels.
Infiniband’s VLs are based on independent datastreams for each VL level. Each port can support up to 16 VLs numbered 0 to 15. VL15 is reserved exclusively for subnet management and is the management VL. The others (VL0-VL14) are data VLs. Each port must support the management VL and at least one data VL starting with VL0. Flow control is on a per VL basis. One VL not having an input buffer available doesn’t prevent data from flowing on the other VLs.
Infiniband VLs enable the fabric to support different QoS over the same physical links, depending on how the subnet manager takes advantage of them. Not all ports have to support the same number of VLs for management to take advantage of it. The subnet manager assigns service levels to end nodes and configures each port with its own service level-to-VL mapping. For instance, the subnet manager can assign service levels based on priority, bandwidth negotiation, etc. and the end node uses that value. As the packet traverses the fabric, each port determines which VL the packet uses based on the service level in the packet and the port’s service level-to-VL mapping table.
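The per-port service level-to-VL lookup can be sketched with a pair of mapping tables. The table contents and the function name are hypothetical; a real subnet manager would program these tables into each port.

```python
MANAGEMENT_VL = 15  # reserved exclusively for subnet management


def select_vl(service_level, sl_to_vl):
    """Look up the data VL for a packet's service level at one port."""
    vl = sl_to_vl[service_level]
    if vl == MANAGEMENT_VL:
        raise ValueError("VL15 is reserved for subnet management")
    return vl


# Hypothetical tables: port A supports four data VLs, port B only two
port_a = {0: 0, 1: 1, 2: 2, 3: 3}
port_b = {0: 0, 1: 0, 2: 1, 3: 1}

# The same packet (service level 2) uses a different VL at each hop
path = [port_a, port_b]
print([select_vl(2, table) for table in path])  # [2, 1]
```

Because the service level travels in the packet and the mapping is local to each port, ports with different numbers of VLs can coexist in the same fabric.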
Another possible use for VLs is for separation of traffic and fairness when multiple systems share the same subnet. In this case, the subnet manager uses a different set of service levels (SLs) for each system and each set of SLs maps to different VLs at each port. So heavy traffic on one VL doesn’t impact the other systems.
FICON director vendors have been actively pursuing QoS for both the open systems and FICON environment. Due to the complexity, and lack of progress made by standards bodies on Class 4 CoS, these vendors haven’t implemented Class 4 service on their switches and directors. The director vendors have implemented other QoS type features, including virtual channels, ingress rate limiting, software prioritization schema and SID/DID prioritization. Most of these don’t provide true, end-to-end, QoS for FICON environments. Virtual channel technology is closely based on Infiniband VLs and deserves a brief look.
Virtual channel technology represented an important breakthrough in the design of large storage networks. The technology is similar to the Infiniband VL concept. To ensure reliable ISL communications, virtual channel technology logically partitions bandwidth in each ISL into many different virtual channels and prioritizes traffic to optimize performance and prevent Head of Line Blocking (HoLB). The FICON director’s operating system automatically manages virtual channel configurations, eliminating the need to manually fine-tune for maximum performance. This technology also works in conjunction with ISL trunking to:
• Improve the efficiency of switch-to-switch communications
• Simplify FICON storage network design
• Reduce the Total Cost of Ownership (TCO).
With virtual channels, all Class F traffic for the entire fabric automatically receives its own queue and the highest priority. This ensures the important control frames, such as name server updates, zoning distribution, Registered State Change Notifications (RSCNs), etc. are never waiting behind “normal” payload traffic (also referred to as Class 2 or Class 3 traffic). For Class 2 or 3 traffic (host and storage devices), individual Source ID (SID) and Destination ID (DID) pairs are automatically assigned in a round-robin fashion based on DID across the four data lanes. This prevents HoLB throughout the fabric and, since each virtual channel has its own credit mechanism and flow control, slower devices won’t “starve” faster ones.
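A rough sketch of this assignment follows. The actual mapping used by any vendor isn't described here, so the modulo on the DID is purely an assumption to illustrate spreading destinations across four data lanes while control traffic gets its own queue.

```python
DATA_LANES = 4  # number of data virtual channels described above


def assign_virtual_channel(frame_class, did):
    """Pick a virtual channel for a frame crossing an ISL.

    Class F (fabric control) traffic gets its own highest-priority queue.
    Data traffic is spread across lanes by destination ID; the modulo
    rule is a hypothetical stand-in for the vendor's real algorithm.
    """
    if frame_class == "F":
        return "control"
    return f"data-{did % DATA_LANES}"


print(assign_virtual_channel("F", 0xFFFFFC))  # control
print(assign_virtual_channel("3", 0x010400))  # data-0
print(assign_virtual_channel("3", 0x010401))  # data-1
```

Because every lane has its own credits, a slow device behind one DID can exhaust only its own lane's buffers rather than stalling the whole ISL.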
While these current technologies are attractive for managing QoS in a small segment of a FICON configuration (between cascaded FICON directors, for example), none offer what customers are really looking for: a mechanism to manage QoS in their FICON environment from host to storage control unit. But there is a way. Sometimes, one must look to the past for inspiration.
WLM and IRD
For years, the IBM mainframe architecture has allowed a mainframe to be divided into separate Logical Partitions (LPARs) so different types of work can run in their own unique environment. Inside a partition, Workload Manager (WLM) prioritizes all the work depending on its importance. LPARs are assigned LPAR weights, which determine the percentage of overall processing power that’s assigned to all the work in that partition. If a workload shifts so more processing power is needed in a particular partition, LPARs shift processing power to the partition that needs it as long as CPU cycles are available. If all the partitions were at peak utilization, the operator had to manually change the LPAR weights. If the demand was unpredictable and irregular (as in a Web server environment), and the system was highly utilized, the operator had to monitor the system at all times, day and night, to ensure high-priority workloads received the resources they needed.
In addition, the connection between channel path and I/O control units is statically defined. In the event of a significant shift in workload, those channel path assignments had to be changed by an operator. Once an I/O request made it to the channel subsystem, it was serviced on a first-in-first-out basis. This could cause your highest priority work to be delayed due to significant I/O contention from lower priority work.
The Intelligent Resource Director (IRD) is composed of three parts. Two of the three are in place for QoS functionality in a channel environment. However, the features that are enabled for ESCON currently don’t “function” in FICON environments. The interleaving capabilities of FICON, coupled with its bandwidth, led to the belief that QoS functionality wasn’t needed or desired in FICON environments. That attitude is changing.
Dynamic Channel Path Management (DCM)
Dynamic Channel Path Management (DCM) is the first channel QoS functionality. DCM lets z/OS dynamically change channel path definitions to ESCON director-attached DASD control units in response to changing workloads. It does this by moving channel resources to the control units where they’re required.
The I/O configuration definition process is complex and requires significant skill. During system initialization, DCM builds tables that represent the physical I/O topology. These tables include entries for each director, channel and DASD control unit that’s accessible (physically attached). These topology tables are then used by DCM to determine what potential paths exist that DCM could add to a control unit to help achieve its bandwidth requirements. The process involves determining how many channels are required by a control unit, and how many other control units, if any, can share that set of channels. For availability, even if only a single channel is ever required by a control unit, two or more are normally defined to it in case of a failure somewhere along the path. Even when the configuration seems perfect, workload changes can produce a situation where an I/O configuration that allowed meeting a response time goal last week is inadequate this week. There may be sufficient I/O resources; they just aren’t where they’re needed.
DCM is designed to let WLM dynamically move channel paths through the ESCON director from one I/O control unit to another in response to changes in the workload requirements. When used in combination with WLM running in goal mode, DCM moves the channel resources to control units used by business-critical workloads to ensure they meet their goals. By defining a number of channel paths as “managed,” they become eligible for this dynamic assignment.
Moving bandwidth to the important workloads uses DASD I/O resources much more efficiently. This may help reduce the number of channel paths needed and could improve availability. In the event of a hardware failure, another channel can be dynamically moved over to handle the work requirements. If the nature of the workload is such that most subsystems have their peak channel requirements at the same time, DCM will be of little help, since its job is to reassign existing channels. DCM works best when there are variations over time between the channel requirements of different DASD subsystems.
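The idea of reassigning managed channels can be illustrated with a toy rebalancing step. Real DCM works from WLM goals and the I/O topology tables; the utilization threshold, dictionary layout and function name below are hypothetical.

```python
def rebalance_managed_channel(assignments, utilization, threshold=0.7):
    """Move one managed channel from the idlest to the busiest control unit.

    assignments: control unit -> number of managed channels it holds
    utilization: control unit -> channel utilization between 0 and 1
    Returns (donor, receiver), or None if no move helps.
    """
    busiest = max(utilization, key=utilization.get)
    idlest = min(utilization, key=utilization.get)
    if utilization[busiest] < threshold:
        return None  # nobody needs more bandwidth
    if utilization[idlest] >= threshold:
        return None  # everyone peaks together; nothing to give away
    if assignments[idlest] == 0:
        return None  # donor has no managed channels left
    assignments[idlest] -= 1
    assignments[busiest] += 1
    return (idlest, busiest)


channels = {"CU_A": 2, "CU_B": 2}
print(rebalance_managed_channel(channels, {"CU_A": 0.9, "CU_B": 0.2}))
print(channels)  # CU_A now holds 3 managed channels, CU_B holds 1
```

The second guard mirrors the limitation noted above: when all subsystems peak at the same time, there are no spare channels to reassign.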
Using DCM in combination with control unit priority queuing, channel subsystem priority queuing and Parallel Access Volumes (PAVs) lets z/OS function in a more self-tuning and self-defining manner, enhancing end-to-end QoS functionality.
Channel Subsystem Priority Queuing
Prioritizing I/O requests isn’t a new feature for z/OS. I/O requests could be prioritized on device queues in the operating system way back in MVS. Channel Subsystem Priority Queuing (CSSPQ) is an extension of I/O priority queuing, a concept that has evolved from MVS and into OS/390 and z/OS over the past several years. Since the introduction of the Enterprise Storage Server (ESS), WLM has been able to set priorities on I/O requests, which are then honored by the control unit. CSSPQ extended the ability to prioritize I/O requests by addressing one more place where queues could form: the channel subsystem.
In an LPAR cluster, if important work is missing its goals due to I/O contention on channels shared with other work, it will be given a higher channel subsystem I/O priority than the less important work. This function works together with DCM. As additional channels are moved to the partition running the important work, channel subsystem priority queuing is designed so the important work that really needs it receives the additional I/O resource.
WLM can set priorities on I/O requests. These priorities are then used by the host to schedule the work to channel subsystems resources. This lets the user identify their most mission-critical workloads, and lets z/OS work with a CPU to allow this critical work to have greater access to channel subsystem resources.
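The difference between first-in-first-out service and priority queuing in the channel subsystem can be sketched with a small priority queue. The class name, priority values and workload names are hypothetical; this isn't the channel subsystem's actual implementation.

```python
import heapq
from itertools import count


class ChannelQueue:
    """Serve the highest-priority I/O first; FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def submit(self, io_name, priority):
        # heapq is a min-heap, so negate priority to pop the largest first
        heapq.heappush(self._heap, (-priority, next(self._arrival), io_name))

    def next_io(self):
        return heapq.heappop(self._heap)[2]


queue = ChannelQueue()
queue.submit("batch-report", priority=1)
queue.submit("online-order", priority=5)
queue.submit("batch-extract", priority=1)

print([queue.next_io() for _ in range(3)])
# ['online-order', 'batch-report', 'batch-extract']
```

With plain FIFO service the online order would have waited behind the first batch request; with priorities it goes first, and the two batch requests keep their arrival order.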
We’ve reviewed the basic concepts behind QoS and discussed some of the ways QoS is currently being addressed in FICON storage networks. While these mechanisms are sound, they address QoS in only one small segment of the configuration—typically between cascaded FICON directors. What’s needed is a QoS mechanism that enables end-to-end QoS functionality from host to storage control unit. A follow-up article will take a detailed look at DCM and CSSPQ and how they could be adapted for FICON.
Fast Forward to 2014
Fast forward now to September 2014. Let’s briefly look at what has changed and brought us to where we want to be. First, and most important, support for FICON DCM was announced by IBM in fall 2010. All of the functionality described in the original 2008 article that existed for ESCON DCM is now supported for FICON, with the caveat that the FICON switching devices in the configuration have FICON Control Unit Port (CUP) on the directors.
Second, additional CUP commands/programming has been added to the FICON Director Programming Interface. This added CUP functionality gives the zEnterprise added insight into the performance characteristics of the FICON SAN.
Third, improved functionality with FICON director buffer credit configuration has allowed for more consistent performance on FICON interswitch links (ISLs).
Fourth, virtual channel technology on FICON SAN switching devices has improved significantly over the past six years.
Finally, Class-Specific Control (CS_CTL)-based frame prioritization as a QoS option in a SAN is now a reality. It allows you to prioritize the frames between a host and a target as high, medium or low priority, depending on the value of the CS_CTL field in the FC frame header. This field can be populated by selected end devices (storage or host) and is then honored by the switch, which uses the value to allocate appropriate resources to the frame throughout the fabric. This method of establishing QoS is an alternative to switch-controlled assignment, which uses zone-based QoS. In other words, it is host-controlled QoS.
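A sketch of how a switch might turn a CS_CTL value into one of the three priority levels follows. The value ranges here are hypothetical placeholders; the real ranges depend on the switch vendor and its configuration.

```python
def priority_from_cs_ctl(cs_ctl,
                         low=range(1, 9),
                         medium=range(9, 17),
                         high=range(17, 25)):
    """Map a frame's CS_CTL value to a priority level.

    The default ranges are illustrative assumptions, not vendor values.
    Returns None when the value falls outside every configured range.
    """
    if cs_ctl in high:
        return "high"
    if cs_ctl in medium:
        return "medium"
    if cs_ctl in low:
        return "low"
    return None


for value in (3, 12, 20, 0):
    print(value, priority_from_cs_ctl(value))
```

Because the end device sets the field and the fabric only interprets it, the host decides which I/O matters most, which is the essence of host-controlled QoS.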
We’ve come a long, long way since 2008. It’s taken a while to get here. My good friend who helped me write the original article, Dennis Ng, retired from IBM earlier this year. Dennis and I presented this at SHARE in 2008 and 2009 as well. A future article in Enterprise Tech Journal will discuss the technical aspects of the new developments outlined above in more depth.