IT Management

The Aberdeen Group recently conducted an in-depth survey of the factors surrounding data center downtime. Aberdeen used the survey data to rank respondents on how well they reduce downtime, grouping them into three categories:

• Best in class: Top 20 percent
• Industry average: Middle 50 percent
• Laggards: Bottom 30 percent

Best in class performers recorded less than one business interruption over the preceding 12 months, averaged only six minutes of downtime per event, and took less than one hour to restore 90 percent of business operational functionality after the most recent interruption.

Industry average performers recorded 2.3 business interruptions over the preceding 12 months, averaged one hour of downtime per event, and took two hours to restore 90 percent of business operational functionality after the most recent interruption.

Laggards recorded 4.4 business interruptions over the preceding 12 months, averaged nine hours of downtime per event, and took 11 hours to restore 90 percent of business operational functionality after the most recent interruption.
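Taken together, those figures imply very different annual exposure across the three tiers. As a rough illustration only, the short Python sketch below turns the reported interruption counts and per-event durations into expected annual downtime hours; it assumes every interruption lasts the reported average duration and treats the best-in-class figure of "less than one" interruption as a single event, which slightly overstates that group's exposure.

```python
# Rough illustration: expected annual downtime per Aberdeen category,
# using the interruption counts and per-event durations cited above.
# Assumption: every interruption lasts the reported average duration.

CATEGORIES = {
    # name: (interruptions per year, average hours of downtime per event)
    "Best in class":    (1.0, 6 / 60),   # "<1" interruption treated as 1; 6 minutes per event
    "Industry average": (2.3, 1.0),
    "Laggard":          (4.4, 9.0),
}

for name, (events_per_year, hours_per_event) in CATEGORIES.items():
    annual_hours = events_per_year * hours_per_event
    print(f"{name:17s} ~{annual_hours:5.1f} hours of unplanned downtime per year")

# Approximate output:
#   Best in class     ~  0.1 hours of unplanned downtime per year
#   Industry average  ~  2.3 hours of unplanned downtime per year
#   Laggard           ~ 39.6 hours of unplanned downtime per year
```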

Aberdeen had examined the cost of an hour of downtime in two prior studies, performed in 2010 and 2012, and found that the average cost of an hour of downtime increased by 38 percent between the two studies (Aberdeen Group research brief, “Data Center Downtime: How Much Does It Really Cost?”). This is cause for concern, even more so given the findings of the 2013 Ponemon Institute study on the cost and causes of data center outages: 91 percent of data centers experienced an unplanned outage between 2011 and 2013.

The question is: Why? Three other smart organizations offer some insight:

• The IT Process Institute’s Visible Ops Handbook reports that 80 percent of unplanned outages are due to configuration change and management errors made by staff/administrators.
• Gartner: “Through 2015, 80 percent of outages impacting mission-critical services will be caused by people and process issues,” and more than 50 percent of those outages will be caused by change, configuration, integration and management issues.
• Enterprise Management Associates (EMA): “60 percent of availability and performance errors are the result of misconfigurations and management issues.”

In other words, misconfiguration and management difficulties are the key contributing factors to both the occurrence and the duration of unplanned data center outages. And when it comes to the networks used for disaster recovery/business continuity and continuous availability, guess what most end users in the enterprise space (System z end users) are running today? A kluge of management tools, management tool servers and network hardware from multiple vendors. These are often purchased as part of a multivendor strategy intended to drive down vendor costs, which is fine; those strategies work for some things. However…

The fact of the matter is that for some things, a single-vendor, a.k.a. “one throat to choke,” strategy works better. Mission-critical components of your environment, such as your replication networks, are among those things. And vendors are not ogres. A good relationship between end users and vendors is not adversarial (us vs. them); it’s a partnership, a trusted-advisor relationship in which you have access to the vendor’s best technical talent. You also get good pricing.

A “one throat to choke” approach to your replication network infrastructure eliminates much of the complexity in the management tools, so your personnel will be able to do their jobs better. It will help reduce the unplanned outages caused by misconfiguration and management mistakes, and it will put you on the path to becoming one of those Aberdeen best in class performers.

Nobody likes to be a laggard!