The key observation is that the back-end I/O rate matters and cannot easily be inferred from the front-end I/O rate. Back-end peaks will likely occur at an entirely different time than the front-end peak, yet the back-end I/O count may not be much lower than the front-end count. Actual workloads differ significantly between installations; Figures 2 and 3 show some examples of back-end I/Os versus front-end I/Os.
How does this all relate to balance and performance potential? Back-end operations are directed to a particular RAID parity group. If active volumes are placed together on a single RAID parity group while other parity groups contain only inactive volumes, that busiest group may run out of steam before any of the others. As soon as the busiest parity group reaches its maximum throughput, it begins responding slowly, and all work to the other volumes on that group suffers as well. Likewise, an application that accesses even one volume on an overloaded RAID group can encounter major performance problems even though most of the volumes it accesses are still fine. Therefore, even a single highly busy RAID group may cause degraded application response times or longer batch windows. Ultimately, that may affect only a few batch jobs or, for example, it could cause a bank's Automated Teller Machines (ATMs) to time out.
Therefore, the overall throughput potential of a disk storage system greatly depends on the balance you can achieve between the parity groups (see Figure 4). Both charts represent the same workload on the same hardware: the left-hand chart shows the current situation, and the right-hand chart shows the situation that would result if the volumes were placed to achieve the best balance possible. The balanced layout on the right peaks at 540 back-end I/Os instead of the 900 I/Os for the busiest RAID array on the left, which means the box could achieve roughly 66 percent higher throughput if everything were balanced evenly.
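The arithmetic behind that headroom figure can be sketched as follows. This is an illustrative calculation, not part of any particular tool; the parity-group I/O figures are made up, except for the 900 and 540 values taken from the example above.

```python
# Illustrative sketch: estimating the throughput headroom gained by
# spreading the same total back-end I/O evenly across parity groups.

def balance_headroom(group_peaks):
    """Fractional throughput gain available if the total back-end I/O
    were spread evenly across all parity groups. The system saturates
    when its busiest group saturates, so capacity scales inversely
    with the busiest group's peak rate."""
    busiest = max(group_peaks)
    balanced_peak = sum(group_peaks) / len(group_peaks)
    return busiest / balanced_peak - 1.0

# Hypothetical layout: one hot parity group, three nearly idle ones.
print(balance_headroom([900, 300, 300, 300]))  # busiest is 2x the mean

# The example in the text: balancing drops the busiest group's peak
# from 900 to 540 back-end I/Os, i.e. 900/540 - 1, about 66 percent.
print(900 / 540 - 1)
```

The key point the sketch captures is that total throughput is capped by the single busiest parity group, so the gain from balancing is the ratio of the current worst-case peak to the balanced peak.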
A heat map is a useful tool for viewing the workload at the parity group level (see Figure 5). You will need a software package that records the activity of each parity group over a prolonged period; with that data, you can plot the activity over time. In a heat map, a hotter color (orange to red) indicates a parity group that is overloaded during a particular interval.
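The bucketing step behind such a heat map can be sketched in a few lines. The group names, sample data, and color thresholds below are all hypothetical; a real monitoring package would collect per-group back-end I/O rates over days or weeks before rendering them.

```python
# Minimal sketch: turning per-parity-group activity samples into
# heat-map color buckets. Thresholds (I/Os per interval) are made up.
THRESHOLDS = [(800, "red"), (600, "orange"), (300, "yellow"), (0, "green")]

def color_for(io_rate):
    """Map a back-end I/O rate to the hottest matching color bucket."""
    for limit, color in THRESHOLDS:
        if io_rate >= limit:
            return color

def heat_map(samples):
    """samples: {group_name: [io_rate per interval, ...]}.
    Returns the same shape with rates replaced by color names."""
    return {g: [color_for(r) for r in rates] for g, rates in samples.items()}

# Hypothetical data: three parity groups sampled over four intervals.
activity = {
    "PG-01": [120, 450, 910, 300],   # spikes into the red zone once
    "PG-02": [200, 210, 190, 220],   # consistently cool
    "PG-03": [640, 700, 820, 610],   # persistently hot
}
for group, colors in heat_map(activity).items():
    print(group, colors)
```

Reading the output row by row mirrors reading the heat map: a row that stays orange or red marks a parity group that is a candidate for rebalancing.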
It may not be intuitively clear how imbalance can impact cache usage. Let’s consider how.