
Global demand for continuous system uptime and availability is raising the bar for performance in corporate data centers. It’s also causing organizations to revisit their plans for system backup and Disaster Recovery (DR). Essentially, organizations are looking to improve three things:

• Uptime and availability
• Data backup and recoverability
• Anticipation of emerging problems so they can be preempted

As enterprises go about this task, some profound changes are occurring in how they think about DR and backup. These changes demand new strategic thinking about system uptime and availability and how resources and facilities can be deployed and exploited to increase system uptime.

DR and Backup: Onto the Front Burner

Organizations that operate 24x7 vary widely in their approaches to DR, depending on their industry sector. Financial services companies are the most aggressive. This is partly due to rigorous regulatory standards, but financial services companies also operate in an environment where downtime is counted in seconds and dollars. A parts manufacturer also feels the impact of downtime, but perhaps not in seconds. A retailer is sensitive to downtime, but usually has a distributed network of small servers in its retail outlets, so it can store transactions locally while central computing is offline and forward them when central computing is restored. In short, Recovery Time Objective (RTO) and Recovery Point Objective (RPO) standards vary by industry, but everyone is interested in recovering from disasters, ensuring recoverability of data, and being able to detect and resolve a problem before it disrupts service.
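For readers who want the two objectives pinned down: RTO limits how long service may stay down, and RPO limits how much recent data may be lost, measured as the gap between the last recoverable copy and the outage. The Python sketch below checks one outage against both. The names `Objectives` and `evaluate_recovery` are hypothetical, chosen only for illustration, and the thresholds a real organization sets will depend on its industry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


def evaluate_recovery(last_good_copy: datetime,
                      outage_start: datetime,
                      service_restored: datetime,
                      objectives: Objectives) -> tuple[bool, bool]:
    """Return (met_rto, met_rpo) for a single outage."""
    # Downtime runs from the start of the outage until service is back.
    downtime = service_restored - outage_start
    # Data written after the last recoverable copy is lost in the outage.
    data_loss_window = outage_start - last_good_copy
    return downtime <= objectives.rto, data_loss_window <= objectives.rpo
```

With an illustrative RTO of 15 minutes and RPO of 5 minutes, an outage that is resolved in 10 minutes using a copy taken 2 minutes before the failure meets both objectives. An hour-long outage recovered from a 30-minute-old copy meets neither.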

Virtually every organization also understands that being “open” for business non-stop is now the minimum expectation in a global marketplace. This means system uptime and availability must be next to infallible, which moves DR and backup from a “side” or “background” project to one of the top priorities on CIOs’ to-do lists.

The push for this change in perspective is coming from the business, which now considers any kind of IT downtime a threat to revenue capture, customer satisfaction, and customer retention. While every organization approaches DR and backup based on its own unique situation, virtually every organization wants system redundancy (and continued uptime) during planned actions such as system maintenance, and seamless, automated failover during an unexpected outage or disaster. Enterprises also want a minimal number of system components in the mission-critical path of business processing; they want to avoid single points of failure in their IT infrastructures.

Data Center Changes

The baseline practice has always been to go with a single data center and then contract for hot site or cold site services in the event of an outage. However, a growing number of enterprises now have two data centers. As continuous system availability and uptime have grown in importance, the trend in enterprises has also been to fully equip each data center so it can run all of production, and to insert technology that supports easy, transparent switchover of processing and other IT resources from one data center to the other when necessary. The second data center can operate in an active-standby mode, where it’s on constant “standby” and is activated to assume the full production load during an outage. Alternatively, it can operate in an active-active mode, where the second data center stays in sync with the primary data center and both process in parallel, so there’s no lapse in service during a failover or DR event.
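The difference between the two modes comes down to which sites carry production load at a given moment. The Python sketch below is a simplified illustration: the `Mode` enum and `serving_sites` function are hypothetical names, and health is reduced to a single true/false flag per site, which real failover tooling would not do.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE_STANDBY = "active-standby"
    ACTIVE_ACTIVE = "active-active"


def serving_sites(mode: Mode, primary_healthy: bool,
                  secondary_healthy: bool) -> list[str]:
    """Return the data centers that carry production load right now."""
    if mode is Mode.ACTIVE_ACTIVE:
        # Both sites process in parallel; a failed site simply drops out.
        sites = []
        if primary_healthy:
            sites.append("primary")
        if secondary_healthy:
            sites.append("secondary")
        return sites
    # Active-standby: the secondary takes the load only if the primary is down.
    if primary_healthy:
        return ["primary"]
    return ["secondary"] if secondary_healthy else []
```

In active-active mode with both sites healthy, the function returns both sites, so losing one requires no activation step. In active-standby mode the secondary appears in the result only after the primary is marked unhealthy.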

The two-data center concept operates especially effectively when both data centers are located in the same metropolitan area. Such proximity allows the use of communications topologies that can fail over quickly from one site to the other. Two data centers close to each other can also draw on a central pool of IT talent that can operate out of either site. Moreover, two data centers are insurance for an enterprise that it’s going to keep running in the event of an outage.

As business becomes increasingly global, a new line of data center thinking is rapidly gaining traction: a third data center that the enterprise operates in a remote geography. The risk enterprises want to address is a sizable disaster that brings down an entire geographical region, including all the data centers in that region. Some of this drive for the third data center is being fueled by industry regulators. In the financial services industry, for example, there’s growing regulatory pressure to at least keep data at locations that are significantly geographically removed from major data center sites. The most compelling pressure, however, appears to be the non-stop service expectations of a now global community of customers. They expect service even if your main operation is knocked out by a regional disaster. In that situation, the third data center in a distant geographical locale keeps an enterprise in business.

There’s also a trend toward fully equipping each data center to run the entire business. This level of redundancy requires enterprises to move away from cold site or “standby” thinking and into dynamic processing environments where the full production load can be toggled between data centers on demand. In this multi-production data center model, data center A might run all enterprise production for the first quarter of the year, with production moving over to data center B for the next quarter. By toggling production back and forth between data centers, an enterprise gains the peace of mind of total system resiliency and redundancy. Each switchover also serves as a live test that the DR, backup, and failover plan still works.
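The quarterly toggle described above can be expressed as a simple schedule. The Python sketch below assumes calendar quarters and two sites labeled "A" and "B"; the function name `production_site` is hypothetical, and an enterprise could just as well rotate on a different interval.

```python
from datetime import date


def production_site(day: date, sites: tuple[str, ...] = ("A", "B")) -> str:
    """Return the data center that runs production on the given day."""
    # Calendar quarter: Jan-Mar is 1, Apr-Jun is 2, and so on.
    quarter = (day.month - 1) // 3 + 1
    # Rotate through the sites, one quarter each, starting with the first.
    return sites[(quarter - 1) % len(sites)]
```

Under these assumptions, a date in February maps to data center A, a date in May to data center B, and a date in August back to A.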
