Storage systems are equipped with large amounts of cache memory to achieve a high number of read hits. However, cache isn’t used only for reads: writes also go to cache, and they even take priority over reads. Writes tend to fill up the cache when they can’t be de-staged, lowering the read hit ratio below what the configured cache memory would suggest. So, despite large cache sizes, the memory available for reads can be significantly reduced when bottlenecks in the storage configuration delay the de-staging of writes from cache to disk. Ultimately, a fast write bypass (“FW bypass”) condition may occur, where a write operation is forced to wait until de-staging occurs before it’s acknowledged as completed to the host.
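The dynamic described above can be sketched with a toy model (not any vendor’s actual algorithm): when the host write rate exceeds the back-end de-stage rate, dirty data accumulates in cache and squeezes out the memory left for read caching. The function name and the example rates below are illustrative assumptions.

```python
def cache_pressure(total_cache_gb, write_rate, destage_rate, seconds):
    """Return (dirty_gb, read_cache_gb) after `seconds` of sustained load.

    write_rate and destage_rate are in GB/s. Dirty data is capped at the
    cache size; at that point new writes would hit a bypass condition,
    waiting on de-staging before being acknowledged to the host.
    """
    dirty = max(0.0, (write_rate - destage_rate) * seconds)
    dirty = min(dirty, total_cache_gb)
    return dirty, total_cache_gb - dirty

# A back end that de-stages only 0.5 GB/s against 0.75 GB/s of host writes:
dirty, read_cache = cache_pressure(64.0, 0.75, 0.5, 120)
# After two minutes, 30 GB of the 64 GB cache is dirty and only 34 GB
# remains for read hits; left running, dirty data reaches the full 64 GB.
```

The point of the sketch is that the read cache shrinks even though no reads changed: the shortfall is entirely on the write de-stage side.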
Since a FICON channel can send random write data much more quickly than a spinning disk can accept it, it’s quite possible to create a workload that will cause the writes to fill up the cache. In practice, these problems are most likely to occur in combination with FlashCopy or ShadowImage technologies, which require additional back-end operations for each new write.
While you may view decreasing read hit ratios and increasing FW bypass rates as a sign that there’s no longer enough cache, the real reason is that one or more of the back-end arrays can’t handle the de-staging load. Usually, it’s only a small number of arrays that are in trouble, so the easiest, cheapest, and most effective solution is to simply make sure random write activity is well-spread across arrays.
For replicated environments, you must take the back-end of the secondary storage system into account, too. Any write done on the primary system must also be done on the secondary. The secondary system therefore needs to be able to de-stage the requests in time to prevent the secondary cache from filling up with writes. If the secondary can’t keep up, new writes from the primary will be delayed and they will start to fill the cache on the primary side. This is why you must be particularly careful when deciding whether to select a more economical disk type on the secondary system compared to the primary.
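The constraint in the replicated case can be stated as a simple rule of thumb (illustrative numbers, hypothetical function name): since every primary write must also be de-staged on the secondary, the sustained random-write rate of the pair is bounded by whichever back end de-stages slower.

```python
def sustained_write_limit(primary_destage, secondary_destage):
    """Both rates in MB/s. Above this rate, one side's write cache
    eventually fills: the secondary's first if it is the slower one,
    after which delayed acknowledgments back up the primary's cache too.
    """
    return min(primary_destage, secondary_destage)

# A fast primary paired with a secondary built on cheaper, slower disks:
limit = sustained_write_limit(900, 400)  # the pair sustains only 400 MB/s
```

This is why an economical disk choice on the secondary can quietly cap the performance of the primary.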
Techniques to Optimize Throughput
There are several techniques to achieve a better balanced system with more throughput. Let’s review the major ones:
- Configuration of the storage system hardware: Spread logical volumes across more physical disks in one or more RAID parity groups. The larger the group, the more evenly the work is likely to be spread. That’s why a RAID 10 configuration with eight disks in a parity group will perform better than a RAID 1 configuration, why 28D+4P provides a better balance than 7D+P, and why storage pool striping works well.
- Design of the SMS configuration: Use a storage configuration with “horizontal storage pools” across both parity groups and Library Control Units (LCUs). This way, z/OS and Data Facility System Managed Storage (DFSMS) load balancing tends to spread work across all parity groups.
- DFSMS features: Use software striping for highly active data sets so the work is spread over multiple logical volumes in a storage group and most likely over multiple physical disks. With just four stripes, you already have four times as many physical drives working on the I/Os, and the peaks are going to be much lower. Note that striping can be just as effective for a random access data set as for sequential access.
- Tuning: Actively tune the configuration by moving volumes away from “hot” RAID parity groups. Most installations do this with a manual review process, but this is a difficult task because of the many factors that must be considered. Existing software can recommend which volume moves are the best ones if you want to achieve and maintain a balanced configuration.
- Smart layout: When moving to new hardware, it’s important to make the layout as balanced as possible. For instance, distribute all FICON links and remote copy links as evenly as possible over all the host adapter cards, and spread the volumes over the RAID parity groups in a way that optimizes the workload balance. Again, it’s a tedious, difficult task to do this manually, but software can be used to find the optimal mapping for volumes over RAID parity groups.
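The balancing idea behind the tuning and smart layout bullets can be illustrated with a hypothetical greedy sketch: place the busiest volumes first, each onto the parity group with the least accumulated load. Real tools weigh many more factors (cache behavior, RAID type, copy relationships), so treat this only as an outline of the approach, with invented names and numbers.

```python
import heapq

def balance_volumes(volume_loads, num_groups):
    """Map volume index -> parity group id, using a longest-processing-
    time-first greedy assignment to even out the per-group I/O load."""
    groups = [(0.0, g) for g in range(num_groups)]  # (total load, group id)
    heapq.heapify(groups)
    placement = {}
    # Assign the hottest volumes first, always to the coolest group so far.
    for vol, load in sorted(enumerate(volume_loads), key=lambda v: -v[1]):
        group_load, g = heapq.heappop(groups)
        placement[vol] = g
        heapq.heappush(groups, (group_load + load, g))
    return placement

# Six volumes with skewed I/O rates spread over three parity groups:
mapping = balance_volumes([120, 80, 75, 40, 30, 10], 3)
```

With these inputs, the three groups end up carrying nearly identical totals, rather than one group absorbing the two hottest volumes.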
Using a combination of these techniques, you will be able to create a well-balanced system and get more throughput and performance from your system without much effort. You may even be able to use higher-density disks or move from RAID 10 to RAID 5 without a performance penalty.
The way a storage configuration is balanced greatly influences its throughput and responsiveness. If there’s an imbalance between the components, delays can occur even though the hardware itself would be capable of handling the workloads. Using smart storage performance management techniques to achieve a well-balanced system can yield impressive results in both throughput and response times. With the right balancing efforts and software tools, storage hardware purchases may be postponed, saving a lot of money. If you manage storage performance wisely, it will directly translate into increased user satisfaction and lower hardware costs.