In late 2011, IBM's premier virtualization product for System z debuted a new feature, z/VM Single System Image (SSI), which dramatically improves the horizontal scalability of z/VM workloads. Available in z/VM Release 6.2, SSI opens the door to easier, more flexible horizontal growth of large-scale virtual server environments by clustering up to four z/VM systems—each capable of running hundreds of virtual servers—in a rich, robust, shared-resource environment. Coordination of access to shared devices and network connections between members, as well as a common repository of virtual server definitions, allows workload to be spread seamlessly across the cluster. Capabilities such as multiple-system installation, a shared software service repository, and single-point-of-control administration and automation reduce systems management cost and effort. Capping all these features is Live Guest Relocation (LGR), the ability to move running Linux virtual servers (guests) from one member to another without disruption, to redistribute workload and provide continuous operation through planned system outages.
An SSI cluster comprises up to four z/VM member systems, running on the same or different System z machines, interconnected through shared networks and through access to shared disk storage (see Figure 1). Two types of connectivity are employed. A common Ethernet LAN segment supports traffic among the virtual servers and to the outside world. Channel-to-Channel (CTC) adapters carry system control information among the z/VM member hypervisors, driven by a significantly enhanced Inter-System Facility for Communication (ISFC) component in z/VM. The improved ISFC can group up to 16 CTCs into a logical link, providing high-bandwidth, reliable transport of system data for functions such as coordination of member states, negotiation of resource access, and LGR. Most disk storage is shared throughout the cluster to ensure that both virtual server data and system metadata are accessible from all member systems. Optionally, a common Fibre Channel Protocol (FCP) Storage Area Network (SAN) allows access to Small Computer System Interface (SCSI) devices from all hosts and guests throughout the cluster.
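The idea of grouping several CTC devices into one ordered, higher-bandwidth logical link can be sketched in a few lines of Python. This is an illustrative model only, not IBM's ISFC implementation; the class and device names are invented. It shows the two essentials: frames are tagged with sequence numbers so the receiver can reassemble them in order, and traffic is spread (here, round-robin) across the member CTC devices.

```python
# Hedged sketch of an ISFC-style logical link; names and policy are assumptions.
from dataclasses import dataclass, field
from itertools import cycle

@dataclass
class CTCDevice:
    device_number: str
    sent: list = field(default_factory=list)

    def write(self, frame):
        self.sent.append(frame)  # stand-in for a real channel write

class LogicalLink:
    """Groups 1-16 CTCs; sequence numbers let the receiver reorder frames."""
    MAX_CTCS = 16

    def __init__(self, ctcs):
        if not 1 <= len(ctcs) <= self.MAX_CTCS:
            raise ValueError("a logical link uses 1 to 16 CTC devices")
        self.ctcs = ctcs
        self._next = cycle(ctcs)   # simple round-robin spreading
        self._seq = 0

    def send(self, payload: bytes):
        frame = (self._seq, payload)  # tag with sequence number
        self._seq += 1
        next(self._next).write(frame)

link = LogicalLink([CTCDevice("0500"), CTCDevice("0501")])
for msg in (b"member-state", b"resource-negotiation", b"lgr-memory-page"):
    link.send(msg)
```

Adding a CTC device to the link raises aggregate bandwidth without changing the sender's interface, which is the property that matters for moving bulk LGR memory traffic alongside small control messages.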
Safe, Controlled Resource Sharing
The ability to distribute workload as desired across the SSI cluster depends on uniform access to numerous system resources:
• A common system configuration (SYSTEM CONFIG) file defining the SSI membership, CTC connections, system volumes, virtual network switches, and other attributes for each member in the cluster. Most of these attributes are specified once to apply to all cluster members. The syntax allows specification of system-specific characteristics where needed; for example, to define each member's paging and spool volumes.
• A common user directory defining all the virtual machines, their CPU and memory configurations, privileges, and the real and virtual I/O devices to which they have access. Changes to this directory are propagated throughout the cluster by a directory manager product such as the IBM Directory Maintenance Facility (DirMaint).
By default, virtual servers (guests) defined in the directory can be instantiated (logged on or autologged) on any one of the member systems; they will automatically gain access to the resources defined for them in the directory. To prevent a duplicate presence in the network or conflicting access to a guest's resources, the instantiating member confers with other members to ensure the guest isn’t already running elsewhere. Management and service virtual machines that need to operate on each member of the cluster can be exempt from this restriction, as we will explain.
• Disk volumes defined for z/VM system use (CP-owned volumes) are tagged with the name of the SSI cluster and, if appropriate, the owning member. This ensures that configuration errors, such as naming the same paging volume for multiple systems, won’t result in corruption of one member's data by another.
Volumes containing virtual server data (full packs or minidisks) are generally accessible across the cluster. The minidisk “link mode” semantics that z/VM uses to govern concurrent read or read-write access to these virtual disks are now enforced throughout the cluster. In addition, z/VM's Minidisk Cache (MDC) function no longer must be disabled for shared volumes; rather, each member disables and re-enables MDC automatically on each minidisk as write access on another member is established and removed. For shared minidisks that are seldom updated, this allows all members to benefit from caching safely when the contents aren’t changing.
• Spool files created on any member are accessible on all other members. Each member owns a separate set of spool volumes on which it allocates files. The replication of spool file metadata among members allows a guest on one member to access all files in its queues, even files residing on volumes owned by another member.
• Virtual network switches (VSWITCHes) can be defined in a single place—the system configuration file—and these definitions (name, backing physical network device, and attributes such as network type and VLAN options) apply across all the members. Traffic is routed transparently among the members to give the appearance of a single VSWITCH across the cluster. This allows guests connected to the same VSWITCH to interact seamlessly regardless of the members on which they’re running or to which they’re relocated.
Media Access Control (MAC) addresses assigned to virtual network devices are managed across the cluster. The system configuration defines a separate range of addresses from which each member will allocate. Guests carry their addresses with them when they’re relocated. Since the member that assigned the address may have been re-IPLed and “forgotten” prior assignments, an address to be allocated is first broadcast to the remaining members to ensure there’s no conflict.
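The cluster-wide duplicate-instantiation check described earlier—a member confers with its peers before logging a guest on, while per-member service machines are exempt—can be sketched as follows. This is an assumed model, not z/VM's actual protocol; the class names and the `TCPIP` example are invented for illustration.

```python
# Hedged sketch of the duplicate-logon check among SSI members (names assumed).
class Member:
    def __init__(self, name, exempt=()):
        self.name = name
        self.running = set()
        self.peers = []
        # Service/management machines defined to run on every member
        self.exempt = set(exempt)

    def is_running(self, guest):
        return guest in self.running

    def logon(self, guest):
        # Exempt per-member machines skip the cluster-wide check
        if guest not in self.exempt:
            for peer in self.peers:
                if peer.is_running(guest):
                    raise RuntimeError(
                        f"{guest} is already logged on at {peer.name}")
        self.running.add(guest)

a = Member("MEMBER1", exempt={"TCPIP"})
b = Member("MEMBER2", exempt={"TCPIP"})
a.peers, b.peers = [b], [a]

a.logon("LINUX01")   # succeeds: not running anywhere else
a.logon("TCPIP")     # exempt service machine...
b.logon("TCPIP")     # ...may run on every member at once
```

A second attempt to log `LINUX01` on at `MEMBER2` would be refused, which is what prevents a duplicate network presence or conflicting access to the guest's disks.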
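The automatic Minidisk Cache behavior for shared volumes—each member caches a minidisk only while no other member holds write access to it—reduces to a small rule that can be modeled in Python. The structure below is an assumption for illustration; z/VM's internal bookkeeping is certainly different.

```python
# Hedged sketch of per-minidisk MDC toggling across members (model assumed).
class SharedMinidisk:
    def __init__(self):
        self.write_links = {}    # member name -> count of write links
        self.cache_enabled = {}  # member name -> MDC on/off

    def _refresh(self, members):
        for m in members:
            # A member may cache only while no *other* member is writing
            others_write = any(count > 0
                               for name, count in self.write_links.items()
                               if name != m)
            self.cache_enabled[m] = not others_write

    def link_write(self, member, members):
        self.write_links[member] = self.write_links.get(member, 0) + 1
        self._refresh(members)

    def detach(self, member, members):
        self.write_links[member] -= 1
        self._refresh(members)

members = ["MEMBER1", "MEMBER2"]
md = SharedMinidisk()
md._refresh(members)               # read-only everywhere: MDC on for all
md.link_write("MEMBER1", members)  # MEMBER2 must stop caching
md.detach("MEMBER1", members)      # write link gone: caching resumes
```

This captures why seldom-updated shared minidisks get the full benefit of MDC: the cache is only sacrificed for the duration of a remote write link, not permanently.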
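The MAC management scheme just described—per-member allocation ranges plus a broadcast to the other members before an address is handed out—can be sketched as below. The API and the address values are invented; only the protocol shape (local range, peer query, skip on conflict) comes from the text.

```python
# Hedged sketch of cluster-wide MAC allocation (API and ranges assumed).
class MacAllocator:
    def __init__(self, name, first, last):
        self.name = name
        self.next_addr, self.last = first, last  # this member's range
        self.assigned = set()
        self.peers = []

    def in_use(self, addr):
        return addr in self.assigned

    def allocate(self):
        while self.next_addr <= self.last:
            addr = self.next_addr
            self.next_addr += 1
            # Broadcast first: a peer may hold this address for a relocated
            # guest, or remember it after this member was re-IPLed.
            if not any(peer.in_use(addr) for peer in self.peers):
                self.assigned.add(addr)
                return addr
        raise RuntimeError(f"{self.name}: MAC range exhausted")

m1 = MacAllocator("MEMBER1", 0x02_00_00_00_00_10, 0x02_00_00_00_00_1F)
m2 = MacAllocator("MEMBER2", 0x02_00_00_00_00_20, 0x02_00_00_00_00_2F)
m1.peers, m2.peers = [m2], [m1]

# A guest relocated to MEMBER2 carried its MEMBER1-range MAC with it:
m2.assigned.add(0x02_00_00_00_00_10)
addr = m1.allocate()  # skips the address MEMBER2 still holds
```

The peer query is what keeps a freshly re-IPLed member, which has forgotten its prior assignments, from reissuing an address that a relocated guest elsewhere in the cluster is still using.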