We have finished looking at protection mechanisms for primary disk storage. As I discussed, the most common practice is to create physical copies of disk to protect against both a primary failure (the proverbial piano that falls on your disk subsystem) and data corruption. Many users have chosen to protect their data with redundant copies on expensive disk because they felt they had no choice: the protection had to match the value of their applications. While that may have been true not long ago, less expensive alternatives are available today.
We all know and believe that a primary RAID 1 copy of data on disk can be re-established for recovery. This brings us to a very important point. Replication of data is a critical consideration, but it is generally the easy part of the problem. Anyone who is responsible for the uptime of a system knows the tricky part is re-creating a system or application. Going forward, we are going to focus on the re-creation side of things, both technically and economically.
It generally takes about two seconds per terabyte to recover from a RAID 1 physical mirrored copy. There are many ways to do this, and a range of hardware and software products can make it happen. One alternative to the most common approach, buying extra physical disk for RAID 1 mirrors, is to use a virtual disk subsystem. A virtual disk subsystem maintains a copy of the primary volume, but that copy does not consume physical storage one-for-one unless you want it to. This gives you the same recovery capability at about a third of the price. That takes care of the falling piano, but what about corruption?
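The text does not say how a virtual disk subsystem avoids a one-for-one copy. One common design is copy-on-write, which is assumed here: the copy stores a block only when the primary is about to overwrite it, and every unchanged block is read straight from the primary. The Python sketch below is a hypothetical model (`Volume` and `VirtualCopy` are illustrative names, not any vendor's API) showing why the copy's physical footprint tracks the amount of changed data rather than the size of the volume.

```python
from __future__ import annotations


class Volume:
    """Primary volume: a flat list of fixed-size blocks (hypothetical model)."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: list[bytes] = [b"\x00"] * num_blocks
        self.copies: list[VirtualCopy] = []

    def make_copy(self) -> VirtualCopy:
        # Creating the copy is instant and consumes no block storage yet.
        snapshot = VirtualCopy(self)
        self.copies.append(snapshot)
        return snapshot

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write (assumed design): give the old block to every copy
        # before overwriting it on the primary.
        for snapshot in self.copies:
            snapshot.preserve(index, self.blocks[index])
        self.blocks[index] = data


class VirtualCopy:
    """Point-in-time image that stores only blocks changed since it was made."""

    def __init__(self, source: Volume) -> None:
        self.source = source
        self.saved: dict[int, bytes] = {}

    def preserve(self, index: int, old: bytes) -> None:
        # Keep only the first pre-change version; later writes don't alter the image.
        self.saved.setdefault(index, old)

    def read(self, index: int) -> bytes:
        # Unchanged blocks are served from the primary volume.
        return self.saved.get(index, self.source.blocks[index])

    def physical_blocks(self) -> int:
        # Physical space used by this copy, in blocks.
        return len(self.saved)


vol = Volume(num_blocks=1_000)
snap = vol.make_copy()
vol.write(7, b"new")
vol.write(7, b"newer")
print(snap.read(7), vol.blocks[7], snap.physical_blocks())  # b'\x00' b'newer' 1
```

In this model a 1,000-block volume with one changed block costs the copy a single block of physical space, while the copy still presents the full point-in-time image.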
The virtual system can also keep consistency checkpoints. The main difference is that each volume checkpoint is virtual rather than a full physical copy. In practice, the physical storage you purchase drops from 8TB to 1TB, and you still have enough extra space for 24 hours of protection, again at about one-third of the price. Bear in mind that corruption protection on primary disk storage is often overplayed and overvalued relative to the problems that typically occur in the real world. One nice extra benefit of a virtual disk subsystem is that you can keep far more checkpoints. Where you would typically have only seven volume checkpoints with physical copies, or one every three hours, a virtual system gives you room for one every hour. If there is a point in time you want to roll back to, the granularity is one hour rather than three. Then, when you have to apply journals and logs, you will be up to three times faster, because at most one hour of changes must be replayed instead of three.
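The checkpoint arithmetic can be sketched in Python. The sketch assumes a 1TB volume and treats the live volume as the newest point in time, which is how seven physical checkpoints at three-hour spacing, plus the live volume, span a 24-hour window and add up to the 8TB figure. It also assumes that the log replay after a rollback never exceeds one checkpoint interval. The function names are hypothetical.

```python
import math


def checkpoints_for_window(window_hours: int, interval_hours: int) -> int:
    # Assumption: the live volume is the newest point in time, so only the
    # older points in the window need a stored checkpoint.
    return math.ceil(window_hours / interval_hours) - 1


def full_copy_capacity_tb(volume_tb: float, checkpoints: int) -> float:
    # Physical checkpoints: the primary volume plus one full-size copy each.
    return volume_tb * (1 + checkpoints)


def worst_case_replay_hours(interval_hours: int) -> int:
    # Restoring the nearest earlier checkpoint leaves at most one interval
    # of journals and logs to apply.
    return interval_hours


WINDOW_HOURS = 24
physical = checkpoints_for_window(WINDOW_HOURS, interval_hours=3)
virtual = checkpoints_for_window(WINDOW_HOURS, interval_hours=1)
print(physical, full_copy_capacity_tb(1.0, physical))            # 7 8.0
print(virtual)                                                   # 23
print(worst_case_replay_hours(3) / worst_case_replay_hours(1))   # 3.0
```

The threefold speedup comes entirely from the shorter interval: the amount of log you might have to replay shrinks from three hours to one.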
What about open midrange disk? Protection and re-creation for open midrange disk are conceptually the same. You need a logical approach to protecting against failure and corruption, and to re-creating the data after either one. This is most frequently done through application software, and a variety of vendors provide these services.
The big question is how long it takes to replicate and to re-create, and what the financial alternatives are. Remember, budgets are generally flat or declining, and you need to grow storage roughly 50 percent per year. Therefore, you need to start thinking about this a little differently!
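To see why flat budgets force a different approach, here is a small Python sketch of the cost per GB you can afford relative to today's, assuming capacity compounds at 50 percent a year. The `budget_change` parameter is a hypothetical knob: 0.0 models a flat budget and a negative value models a declining one.

```python
def affordable_cost_ratio(years: int, growth: float = 0.5,
                          budget_change: float = 0.0) -> float:
    """Cost per GB you can afford after `years`, relative to today's.

    growth: annual capacity growth (0.5 = 50 percent per year).
    budget_change: annual budget change (0.0 = flat, negative = declining).
    """
    return ((1 + budget_change) / (1 + growth)) ** years


GROWTH = 0.5
for year in range(1, 4):
    capacity = (1 + GROWTH) ** year
    ratio = affordable_cost_ratio(year, GROWTH)
    print(f"year {year}: {capacity:.2f}x capacity, cost per GB at {ratio:.0%} of today's")
```

After three years of 50 percent growth you hold about 3.4 times the capacity, so with a flat budget each GB must cost roughly 30 percent of what it does today; a declining budget pushes that figure lower still.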
Take, for example, a typical midrange disk subsystem. Let’s say it has a backplane speed of 772MB per second and at least eight Fibre Channel interfaces. For ease of discussion, let’s assume the server is not doing anything else and you have dedicated storage. Let’s examine three examples; the first two are unlikely, but they illustrate a concept needed to understand the last one.
Imagine a volume of 50GB. If we wanted to replicate or re-create it and could drive the backplane at full speed, it would take 1.1 minutes. Not bad. If we decided that the data was critical and needed a third copy, we could move it from open midrange disk at about $40 per GB to automated tape at about $1.25 per GB, save some money, and still get the job done in about 27 minutes. This assumes one tape drive running at 30MB per second. Is 27 minutes fast enough for your application? Earlier, I discussed the business impact analysis. Look at it this way: if today is Tuesday and the application goes down, but it doesn’t need to run until Thursday, is 27 minutes fast enough? By the way, this figure doesn’t account for compression; with roughly 3:1 compression, you can divide the time by about three.
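The 50GB example can be checked with a few lines of Python. The sketch assumes decimal units (1GB = 1,000MB), a sustained transfer rate with no other load (as the text stipulates), and treats compression as multiplying the effective rate; the 3.0 compression ratio is an assumption inferred from "divide the time by about three." The function names are hypothetical.

```python
MB_PER_GB = 1000  # assumption: decimal units, as capacities are usually quoted


def transfer_minutes(size_gb: float, rate_mb_s: float, compression: float = 1.0) -> float:
    # Sustained transfer time; a compression ratio c lets the device absorb
    # c times as much source data per second, so the time falls by c.
    return size_gb * MB_PER_GB / (rate_mb_s * compression) / 60


def storage_cost(size_gb: float, dollars_per_gb: float) -> float:
    # Media cost of holding one copy of the volume.
    return size_gb * dollars_per_gb


VOLUME_GB = 50
print(f"backplane: {transfer_minutes(VOLUME_GB, 772):.1f} min")                 # 1.1 min
print(f"one tape drive: {transfer_minutes(VOLUME_GB, 30):.1f} min")             # 27.8 min
print(f"with 3:1 compression: {transfer_minutes(VOLUME_GB, 30, 3.0):.1f} min")  # 9.3 min
print(f"third copy on disk: ${storage_cost(VOLUME_GB, 40):,.2f}")               # $2,000.00
print(f"third copy on tape: ${storage_cost(VOLUME_GB, 1.25):,.2f}")             # $62.50
```

The tape figure comes out just under 28 minutes, in line with the estimate of about 27, and the third copy costs about $62.50 on tape versus $2,000 on midrange disk.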