Operations

The article “Architecting for a Private Cloud: From Caching Framework to Elastic Caching Cloud Platform Service” (available at http://entsys.me/f2ivm) discussed an elastic caching service for private clouds. Interestingly enough, the majority of questions and feedback we received were actually related to the subject of private clouds rather than elastic caching. The spectrum of views and opinions was remarkably wide, including suggestions that private clouds don’t really exist; i.e., clouds can only be public. This article explores private clouds and the drivers and technologies behind them, and examines two architectural models for establishing a private cloud.

Workload Characteristics Define Your Cloud

Fundamentally, in the IT world, it’s all about the workload: its characteristics and requirements. A workload describes the work that an application or application component performs, that is, the load placed on the underlying system infrastructure. There are many types of IT workloads, each with different characteristics:

• Traditional relational database server workloads that store data on disk, thus requiring dedicated, high-performance I/O, reliable storage and data sharing for high availability. These workloads usually scale vertically. DB2 and Oracle are among the top relational database management systems (RDBMSes).
• Mission-critical, high-volume online transaction processing (OLTP) workloads with strict quality of service (QoS) characteristics such as predictable response time, high or continuous availability and ACID (atomicity, consistency, isolation and durability) properties. Typically used in conjunction with an RDBMS.
• Web and mobile workloads written using HTML5, JavaScript and server-side languages, such as Java or Ruby, that scale horizontally as the load changes over time.
• NoSQL workloads that store data without using a relational data model on shared-nothing, horizontally scalable system infrastructures. Here, data is distributed evenly across independent servers through a process called sharding, which allows large volumes of data to be stored and scaled up or down easily (a minimal sharding sketch follows this list). Data is often replicated automatically so that a failed node can be replaced quickly and transparently, with no application disruption. Examples of this workload are in-memory caching products such as IBM WebSphere eXtreme Scale and document stores such as MongoDB.
• Hadoop and the rest of its family (MapReduce, Hive, Pig, ZooKeeper): workloads based on a horizontally scalable, inexpensive, distributed file system architecture with built-in fault tolerance and fault-compensation capabilities.
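
To make the sharding idea concrete, here is a minimal Python sketch of hash-based key placement with a single replica. It illustrates the general technique only, with hypothetical node names and replication factor; it does not describe the internal partitioning scheme of eXtreme Scale, MongoDB or any other product.

```python
# Toy hash-based sharding: map a key to a primary node plus one replica.
# Node names and REPLICAS are hypothetical, for illustration only.
import hashlib

NODES = ["cache-node-1", "cache-node-2", "cache-node-3", "cache-node-4"]
REPLICAS = 2  # each key lives on a primary node and one backup node

def nodes_for_key(key: str) -> list:
    """Return the nodes that should hold the given key."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    primary = digest % len(NODES)
    # Replicas go to the next nodes in the ring, so a failed primary can be
    # replaced transparently from a copy without disrupting the application.
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICAS)]

print(nodes_for_key("customer:42"))  # e.g. ['cache-node-3', 'cache-node-4']
```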

Every type of workload has specific scalability, performance, reliability and security characteristics that the underlying infrastructure must accommodate, but we can identify two primary types:

• Workloads that rely on one or a few large nodes that scale vertically by adding resources to the nodes; i.e., scale-up architectures, where the underlying infrastructure is responsible for fault tolerance and high availability. These workloads are often stateful and frequently depend on a shared storage architecture.
• Distributed workloads that scale horizontally by adding nodes; i.e., scale-out architectures. These workloads are often stateless, and fault tolerance is built into the software. The strategy for providing scalability and fault tolerance is arguably the key differentiator between workload types, since it has a major impact on the architecture of the underlying system infrastructure.
  
In traditional data centers, reliability is typically achieved by implementing active-passive, active-active or N+1 redundancy and using enterprise-grade hardware that detects and mitigates hardware failures. The foundation of traditional data center high-availability engineering is redundant hardware: clustered servers, data replication to ensure consistency between redundant servers, redundant network cards and RAID disk arrays are all techniques for providing redundancy at possible points of failure in the system. When it comes to database high availability, particularly for transaction processing systems, enterprises favor shared-disk architectures; for example, Oracle Real Application Clusters (RAC) with its shared cache and shared disk architecture, Parallel Sysplex technology supporting data sharing for DB2 for z/OS, and IBM DB2 pureScale with its cluster-based, shared-disk architecture. To further mitigate the impact of hardware failures, virtualization platforms/hypervisors offer a range of mechanisms such as automated virtual machine restart and virtual machine relocation; these include VMware vMotion and Live Guest Relocation (LGR) in the z/VM world.

When it comes to public cloud providers such as Amazon Web Services (AWS), the picture changes. AWS, overwhelmingly the dominant vendor in the cloud Infrastructure as a Service (IaaS) market according to Gartner, approaches the resiliency problem quite differently. Werner Vogels, Amazon’s CTO, is often quoted as saying: “Everything fails all the time.” AWS infrastructures integrate the entire solution: hardware, software and data center designs that don’t provide traditional redundancy. An application running on AWS should expect hardware and storage failures. In an AWS architecture, software must be resilient and able to distribute load across loosely coupled compute, network and storage nodes. AWS SQL database services are also focused on a shared-nothing, scale-out architecture that leverages the principles of distributed computing to provide scale while maintaining compliance with ACID, SQL and all the properties of an RDBMS. For many people, this model of scale-out, ready-to-fail, distributed, partitioned logic and data applications running on system infrastructures built from commodity-grade hardware without traditional resiliency and redundancies constitutes a cloud architecture.
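
A minimal sketch of what “design for failure” implies at the application level: rather than relying on the infrastructure to mask faults, the caller treats node failure as routine and fails over to another instance. The endpoint addresses and helper function below are hypothetical illustrations, not an AWS API.

```python
# Illustrative fail-over sketch: assume any single node can be down and retry
# the call against other members of a loosely coupled fleet.
# The endpoint addresses are hypothetical.
import random
import urllib.request

ENDPOINTS = [
    "http://10.0.1.11:8080/orders",
    "http://10.0.2.12:8080/orders",
    "http://10.0.3.13:8080/orders",
]

def fetch_with_failover(resource: str, attempts: int = 3) -> bytes:
    """Try the request against randomly chosen nodes until one succeeds."""
    last_error = None
    for endpoint in random.sample(ENDPOINTS, k=min(attempts, len(ENDPOINTS))):
        try:
            with urllib.request.urlopen(endpoint + resource, timeout=2) as resp:
                return resp.read()
        except OSError as err:   # timeout, connection refused, node down, ...
            last_error = err     # treat the failure as normal and move on
    raise RuntimeError(f"all nodes failed, last error: {last_error}")

# Example usage: fetch_with_failover("/42")
```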

Of course, there are other public clouds as well. VMware vCHS (vCloud Hybrid Service), despite its name, is a public cloud service. VMware focused its public cloud on providing the same high-availability service-level agreements (SLAs) that exist in traditional data centers, so you don’t have to rewrite or rearchitect existing applications to ensure their availability. In other words, vCHS seems to focus on supporting traditional enterprise workloads using traditional resiliency patterns.

Enterprise private clouds are more likely to resemble the VMware vCHS offering than AWS. Typically, these clouds run on virtualized, enterprise-grade hardware, network-attached storage and redundant network devices. They may offer services on demand, enabled via a user portal; however, according to Forrester, only 25 percent of enterprises do (see “Four Common Approaches to Private Cloud” by Lauren Nelson under “Resources”). Clearly, enterprise private clouds don’t offer the promise of unlimited capacity. Not many private clouds can boast of abstracted storage, network, security, load balancing and full-stack automated provisioning. Most important of all, typical enterprise clouds focus on supporting enterprise workloads with scale-up characteristics that require high availability/disaster recovery (DR) from the underlying infrastructure.

Since we have two different architectural models for building system infrastructures that target different types of workloads, the opinion that “enterprises don’t have clouds” has some grounds. It points out the differences between the dominant public cloud provider’s model and an enterprise model based on data centers whose system infrastructures provide resiliency technologies along with a cloud-like, self-service delivery model. One of the root causes is that traditional enterprise workloads, particularly common in the financial industry, tend to require massive, I/O-bound OLTP processing with ACID qualities. Additionally, though it may not be visible from the outside, enterprise workloads also include third-party applications that are essential to support core business functions. The architecture of those third-party applications can’t be described as forward-thinking by any stretch of the imagination, but to support business requirements, private enterprise system infrastructures must nevertheless provide efficient hosting for these workloads.

Another aspect that often muddies the waters around private clouds is a phenomenon commonly known as “cloudwashing,” the conscious or unconscious practice of providing ambiguous descriptions of capabilities associated with cloud technologies. Avoiding cloudwashing and accurately describing the capabilities enabled in enterprise data centers is highly desirable. Announcing to business partners or potential customers that “we have cloud services,” while in fact delivering something quite different, hardly helps establish trusting communications or improve future relationships.
