Aug 2 ’10

Storage Performance Management: More Balance to Improve Throughput

by Editor in z/Journal

It’s important to pay attention to your storage configuration. Careful planning to distribute data and workloads evenly will yield better response times, more resilient operations, and higher throughput from your storage hardware. This article reveals the hidden influence of balance, which resources it affects, and which balancing techniques you can use to improve performance and throughput without upgrading your hardware. We’ll discuss how storage tuning differs fundamentally from processor tuning, and we’ll show that significant throughput and response time improvements may be possible with just a few well-chosen optimizations.

An unbalanced storage system affects both performance and cost. When hardware resources aren’t evenly loaded during peak periods, delays will occur even though the resources are more than sufficient to handle the workloads. The consequence can be that hardware is replaced or upgraded unnecessarily, which is a tremendous waste of financial and other resources. Unfortunately, this often happens because of the low visibility of the most important metrics for the internal storage system components. If you look only at the z/OS side of I/O, these imbalances can be hard to find, resolve, and prevent.

The mainframe performance perspective has always been that Workload Manager (WLM) optimizes the throughput in the z/OS environment by prioritizing work and assigning resources. This load balancing works well for identical processors in a complex. However, for storage, it’s a different story. The kind of optimization WLM performs simply isn’t possible for I/O since the location of the data is fixed. WLM can only manage the components that are shared, such as the channels and Parallel Access Volume (PAV) aliases. The internal disk storage system resources are mostly out of WLM’s control, and utilization levels of the internal components of the storage system hardware are unknown to z/OS and WLM, so work can’t be directed to optimize balance.

Let’s review how the level of balance among the major internal components of a disk storage controller influences performance and throughput, and how to create the visibility needed to detect imbalances.

Front-End

In a z/OS environment, front-end balance relates to the FICON channels and adapter cards. Most installations maintain a good balance between the FICON channels. z/OS will nicely balance the load between the channels in one path group and, with multiple path groups, most installations have ways to ensure each path group does about the same amount of work.

The less visible components here are the host adapter boards. Multiple FICON ports attach to one host adapter board, and the boards share logic, processor, and bandwidth resources among their ports. It’s therefore important to carefully design the layout of the port-to-host adapter board configuration. Link statistics provide a good way to track imbalance. In the example in Figure 1, the load on each of the FICON channels is the same, but the links aren’t evenly distributed over the host adapter cards; the resulting differences in load negatively influence the response times for the links on the busiest cards.
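
As a quick, hypothetical illustration of this kind of check, the short Python sketch below rolls per-link load up to the host adapter boards. The channel names, the link-to-board mapping, and the throughput figures are all invented; in practice they come from your configuration records and link statistics:

```python
# Hypothetical example: roll per-link throughput up to host adapter boards
# to spot imbalance. Link names, the link-to-board mapping, and the numbers
# are made up; real values come from your configuration and link statistics.
from collections import defaultdict

link_mb_s = {"CHP40": 120, "CHP41": 118, "CHP42": 122, "CHP43": 119}
link_to_board = {"CHP40": "HA-1", "CHP41": "HA-1", "CHP42": "HA-1", "CHP43": "HA-2"}

board_mb_s = defaultdict(float)
for link, mb_s in link_mb_s.items():
    board_mb_s[link_to_board[link]] += mb_s

for board, load in sorted(board_mb_s.items()):
    print(board, load)
# Per-link load is nearly identical, yet HA-1 carries about three times
# the load of HA-2: exactly the situation Figure 1 illustrates.
```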

RAID Parity Groups

Redundant Array of Inexpensive Disks (RAID) parity groups contain the actual data the applications want to access. The throughput of a storage system largely depends on the throughput of the RAID parity groups. A common misconception is that a disk storage system with a large amount of cache hardly uses its disks because it does most of its I/O operations from or to cache. Although it’s true that under normal circumstances virtually all operations occur via cache, many of those operations cause disk activity in the background. The only operations that don’t cause a disk access are random read hits; all others access the disks at some point. For instance, sequential reads, even though they’re mostly cache hits, must still be staged from disk into cache. As for writes, all writes go to cache first, but they must be de-staged to disk sooner or later, too. Moreover, for many of the current RAID schemes, a single write on the front-end causes more than one disk I/O on the back-end. For RAID 1 or RAID 10, a write takes two disk operations, since all data is mirrored. For RAID 5, a random write takes four operations (read old data, read old parity, write new data, write new parity); for RAID 6, it takes six, because two parity blocks must be updated. Sequential writes are much more efficient on RAID 5 and RAID 6 than random writes, but they still generate more than one back-end I/O per front-end I/O.
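
To make that fan-out concrete, here’s a minimal sketch that estimates back-end I/Os from a front-end I/O mix. It assumes the per-operation multipliers above, treats all writes as random (so it overstates sequential write cost), and is an illustration rather than any vendor’s model:

```python
# Rough estimate of back-end disk I/Os generated by a front-end I/O mix,
# using the rule-of-thumb write multipliers discussed above.
RANDOM_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_ios(raid, read_hits, read_misses, sequential_reads, writes):
    # Random read hits are served entirely from cache: zero back-end I/Os.
    return (read_misses                             # read misses go to disk
            + sequential_reads                      # staged from disk even when "hits"
            + writes * RANDOM_WRITE_PENALTY[raid])  # writes fan out on the back-end

# The same 5,000 front-end I/Os produce very different back-end loads:
print(backend_ios("RAID10", 2000, 500, 1500, 1000))  # 4000
print(backend_ios("RAID5",  2000, 500, 1500, 1000))  # 6000
```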

The key observation is that the back-end I/O rate is important and can’t easily be derived from the front-end I/O rate. Back-end peaks will likely occur at a completely different time than the front-end peaks, yet possibly not be that much lower in terms of the number of I/Os. Actual workloads differ significantly between installations; Figures 2 and 3 show some examples of back-end vs. front-end I/Os.

How does this all relate to balance and performance potential? The back-end operations are done against a particular RAID parity group. If the active volumes are placed together on a single RAID parity group while other RAID parity groups contain only inactive volumes, the busiest RAID group may run out of steam before any of the others. As soon as that RAID parity group reaches its maximum throughput, it will start responding slowly, and all work to the other volumes on that RAID group will suffer, too. Likewise, an application that accesses one volume on an overloaded RAID group can encounter major performance issues even though most of the volumes it accesses are still fine. Therefore, even a single highly busy RAID group may cause degraded application response times or longer batch periods. Ultimately, that may affect only a few batch jobs or, for example, it could cause a bank’s Automated Teller Machines (ATMs) to time out.

Therefore, the overall throughput potential of a disk storage system greatly depends on the balance you can achieve between the parity groups (see Figure 4). Both charts represent the same workload on the same hardware: the left-hand chart shows the current situation, while the right-hand chart shows what would happen if the volumes were placed for the best balance possible. The balanced layout on the right peaks at 540 back-end I/Os for the busiest RAID array instead of the 900 I/Os on the left. Since the busiest array sets the ceiling, the box could achieve about 66 percent higher throughput (900/540 ≈ 1.67) if everything were balanced evenly.
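
The headroom arithmetic is simple enough to check directly, using the numbers from the Figure 4 example:

```python
# Rebalancing drops the busiest array's peak from 900 to 540 back-end I/Os,
# so the box can grow until that array is back at its old limit.
current_peak, balanced_peak = 900, 540
headroom = current_peak / balanced_peak - 1
print(f"{headroom:.0%} more throughput headroom")  # ~67%, the "66 percent" above
```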

A heat map is a useful tool for viewing the workload at the parity group level (see Figure 5). You’ll need a software package that can determine the activity of each parity group over a prolonged period; with that data, you can plot the activity over time. In a heat map, a hotter color (orange to red) indicates an overloaded parity group at a particular time.
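
As a sketch of what such a plot involves, the following assumes you’ve already extracted per-parity-group utilization per interval from your performance data; the matrix and group names here are synthetic stand-in data:

```python
# Minimal heat-map sketch with matplotlib; the utilization matrix is
# synthetic stand-in data, one row per parity group, one column per hour.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
groups = [f"PG-{i:02d}" for i in range(16)]        # 16 parity groups
util = rng.uniform(5, 40, size=(len(groups), 24))  # background load
util[3, 8:18] = rng.uniform(80, 100, size=10)      # one hot group in prime shift

fig, ax = plt.subplots(figsize=(10, 5))
im = ax.imshow(util, aspect="auto", cmap="YlOrRd", vmin=0, vmax=100)
ax.set_xlabel("Hour of day")
ax.set_yticks(range(len(groups)))
ax.set_yticklabels(groups)
fig.colorbar(im, ax=ax, label="Parity group utilization (%)")
plt.show()
```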

Cache Resources

It may not be intuitively clear how imbalance can impact cache usage. Let’s consider how.

Storage systems are provided with large amounts of cache memory to achieve a high number of read hits. However, cache isn’t just used for reads. Writes also go to cache, and they even take priority over reads. Writes will tend to fill up the cache if they can’t be de-staged, causing a lower read hit ratio than the configured cache memory would suggest. So, despite large cache sizes, the cache memory available for reads can be significantly reduced when bottlenecks in the storage configuration delay the de-staging of writes from cache to disk. Ultimately, a fast write (FW) bypass condition may occur, where a write operation is forced to wait until de-staging occurs before it’s acknowledged to the host as complete.
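
A toy model shows the mechanism. Everything in it, the rates, the cache size, and the seconds_until_full helper, is illustrative rather than a model of any real controller:

```python
# Toy model of write-cache occupancy: when the de-stage rate falls below
# the incoming write rate, dirty data accumulates until the write cache
# limit is hit and new writes must wait (the FW bypass situation above).
def seconds_until_full(write_mb_s, destage_mb_s, cache_limit_mb, horizon_s):
    dirty_mb = 0.0
    for t in range(horizon_s):
        dirty_mb = max(dirty_mb + write_mb_s - destage_mb_s, 0.0)
        if dirty_mb >= cache_limit_mb:
            return t + 1  # writes now stall until de-staging frees space
    return None

# 200 MB/s of random writes against arrays that de-stage only 150 MB/s:
t = seconds_until_full(200, 150, cache_limit_mb=8192, horizon_s=600)
print(f"write cache full after {t} s" if t is not None else "de-staging kept up")
```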

Since a FICON channel can send random write data much faster than a spinning disk can accept it, it’s quite possible to create a workload that causes writes to fill up the cache. In practice, these problems are most likely to occur in combination with FlashCopy or ShadowImage technologies, which require additional back-end operations for each new write.

While you may view decreasing read hit ratios and increasing FW bypass rates as a sign that there’s no longer enough cache, the real cause is that one or more back-end arrays can’t handle the de-staging load. Usually, only a small number of arrays are in trouble, so the easiest, cheapest, and most effective solution is simply to make sure random write activity is well-spread across the arrays.

For replicated environments, you must take the back-end of the secondary storage system into account, too. Any write done on the primary system must also be done on the secondary. The secondary system therefore needs to be able to de-stage the requests in time to prevent the secondary cache from filling up with writes. If the secondary can’t keep up, new writes from the primary will be delayed and they will start to fill the cache on the primary side. This is why you must be particularly careful when deciding whether to select a more economical disk type on the secondary system compared to the primary.

Techniques to Optimize Throughput

There are several techniques for achieving a better-balanced system with more throughput. Let’s review the major ones:

• Spread the FICON links evenly over the host adapter boards so no single board becomes a front-end hot spot.
• Distribute the most active volumes across the RAID parity groups, using heat maps to find and relieve the hottest groups.
• Spread random write activity across the arrays so de-staging isn’t delayed and the cache doesn’t fill up with writes.
• In replicated environments, make sure the secondary system’s back-end can keep up with the de-staging load generated by the primary’s writes.

Using a combination of these techniques, you will be able to create a well-balanced system and get more throughput and performance from your hardware without much effort. You may even be able to use higher-density disks or move from RAID 10 to RAID 5 without a performance penalty.

Summary

The way a storage configuration is balanced greatly influences its throughput and responsiveness. If there’s an imbalance between the components, delays can occur even though the hardware itself would be capable of handling the workloads. Using smart storage performance management techniques to achieve a well-balanced system can yield impressive results in both throughput and response times. With the right balancing efforts and software tools, storage hardware purchases may be postponed, saving a lot of money. If you manage storage performance wisely, it will directly translate into increased user satisfaction and lower hardware costs.