Operating Systems

Oct 12 ’12

In previous columns, I’ve referred to several changes coming up through the Fedora “experimental test” environment into Red Hat Enterprise Linux (RHEL). One of these changes how the initial RAMdisk used in the boot process is built, via a new tool called Dracut. RHEL 6.3 and its derivatives incorporate Dracut, and it changes how a RHEL-based system starts up in several ways.

To review a bit, at boot time Linux loads the kernel, a parameter line, and an initial RAMdisk (“initrd”) file: a memory image of a filesystem holding enough commands and configuration files for the kernel to find the root filesystem. This process can be time-consuming, and on non-System z platforms it tends to be the limiting factor in how quickly a system can be booted or rebooted, because the environment must be probed and the appropriate device drivers and configurations loaded (System z has so few device types that this usually isn’t a problem for us). It’s possible to customize this initial image, and most distributors do a fair amount of customization to get things going with the various device types and configurations of individual virtual and physical machines.
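On System z, those three pieces are tied together by zipl, which writes the boot record from a configuration file. As a rough illustration only (the kernel version string, device names and paths here are examples and will differ on your system), a stanza in /etc/zipl.conf looks something like this:

    # Illustrative /etc/zipl.conf stanza; versions and devices will vary
    [defaultboot]
    default = linux

    [linux]
        target = /boot
        image = /boot/vmlinuz-2.6.32-279.el6.s390x
        ramdisk = /boot/initramfs-2.6.32-279.el6.s390x.img
        parameters = "root=/dev/dasda1"

Running zipl after editing this file rewrites the boot record so the new kernel, parameter line and initrd are picked up at the next IPL.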

Unlike the existing process, which hard-codes a good deal into the initial RAM filesystem, Dracut aims to produce a completely generic initial RAMdisk. The initial RAM filesystem loaded from the initrd file has basically one purpose: locating and mounting the root filesystem so we can transition to it and continue start-up. Instead of scripts hard-coded to test for certain devices and act on them, Dracut relies on the userspace device manager (udev) to create and configure device nodes as devices are discovered, watching for the real root filesystem’s device node to appear. Once that node exists, the root filesystem is mounted and start-up carries on using the information on the real root filesystem. This event-driven approach limits the time spent in the initial RAM filesystem code, loading and executing only what the machine’s configuration needs instead of testing for every possible device whether or not it’s present. It makes boots of less than five seconds possible and helps simplify the use of non-Extended Count Key Data (ECKD) disks as boot volumes on System z.
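As a minimal sketch of working with such an image (paths assume the stock RHEL 6 layout; back up the existing image before experimenting), the dracut command regenerates the initramfs for a given kernel, and lsinitrd, which ships with dracut, shows what landed inside it:

    # Rebuild the initial RAM filesystem for the running kernel
    dracut --force /boot/initramfs-$(uname -r).img $(uname -r)

    # Inspect the result: udev rules, dracut hook scripts and only the
    # kernel modules the generator modules decided were needed
    lsinitrd /boot/initramfs-$(uname -r).img | less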

Most of the initial RAMdisk generation functionality in Dracut is provided by generator modules that the main Dracut script sources to install specific functionality into the initial RAMdisk. The modules live in the “modules.d” subdirectory and use helper functions provided by Dracut to do their work. A presentation by Harald Hoyer of Red Hat at www.harald-hoyer.de/personal/files/dracut-fosdem-2010.pdf describes Dracut in detail.
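To get a feel for what’s available, it’s enough to look in that directory; the path and module names below reflect the RHEL 6 packaging and may differ on other versions:

    # Generator modules shipped with dracut on RHEL 6 (abbreviated listing;
    # names and numbering vary between dracut releases)
    ls /usr/share/dracut/modules.d/
    # 40network  90crypt  95dasd_mod  95zfcp  95znet  99base  ...

The System z-specific pieces (DASD, zFCP and channel-attached networking) show up here as ordinary modules alongside the generic ones.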

So, how does this apply to us in the System z world? In general, it means a couple of things: fewer runs of zipl (since less information is hard-coded into the initial RAMdisk), and it’s now realistic to think much harder about IPLing Linux from a Named Saved System (NSS) instead of disk. The dynamic configuration capabilities of Dracut dramatically reduce the need to run zipl to capture changes to disk configurations and other start-up parameters. Most of the information stored in the initial RAMdisk image in the current environment can be determined by Dracut; perhaps in future releases, it will no longer be necessary to remember to run zipl after disk changes. Our testing still turned up a few bugs, but it’s definitely better than what we had before.
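On RHEL 6, for instance, the devices the initial RAM filesystem should bring online can ride on the kernel parameter line rather than being baked into the image. A hedged sketch (the device numbers are invented; in practice these fragments would all sit on one parameters line in zipl.conf):

    # Parameter-line fragments handled by dracut at boot on System z
    # (rd_DASD and rd_ZNET are the RHEL 6 spellings of these options)
    root=/dev/disk/by-path/ccw-0.0.0100-part1
    rd_DASD=0.0.0100 rd_DASD=0.0.0101
    rd_ZNET=qeth,0.0.0600,0.0.0601,0.0.0602,layer2=1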

Also, the idea of IPLing from an NSS is particularly tempting, as it allows the use of Internet Small Computer System Interface (iSCSI)-based disks as root filesystems. Converged storage and data networking have provided a significant push in data center and enterprise network design, and IBM’s current method of managing Fibre Channel Protocol (FCP) disk storage on z/VM still isn’t well-integrated into the z/VM environment (i.e., the major directory managers still don’t support it well). iSCSI-based storage allows a number of optimizations, removing some of the limits on the number of paths to a storage device, and in a Single System Image (SSI) cluster environment, the restrictions on device addresses and placement no longer apply. Most storage vendors support iSCSI, and its use is rapidly outpacing traditional FCP storage on distributed hosts. A little experimentation shows it’s possible to create a shared, IPLable Linux NSS that has no ECKD or FCP disk at all and can be migrated easily between SSI hosts in a z/VM 6.2 environment.
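As a rough sketch of what that could look like (the NSS name, addresses, device numbers and iSCSI target below are invented for illustration), the guest’s parameter line points dracut at the network root, and once the system has been defined and saved (DEFSYS/SAVESYS) it can be IPLed by name:

    # Parameter-line fragments for an iSCSI root filesystem under dracut
    # (target IQN, IP addresses and device numbers are made up)
    root=LABEL=lnxroot
    netroot=iscsi:192.0.2.10::::iqn.2012-10.com.example:lnxroot
    ip=192.0.2.50::192.0.2.1:255.255.255.0:lnx01:eth0:none
    rd_ZNET=qeth,0.0.0600,0.0.0601,0.0.0602,layer2=1

    # From CP, the saved system is then IPLed by name rather than by device:
    #   IPL LNXNSS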

All these positives aside, creating and managing a system based on Dracut is different. The presentation mentioned previously describes many of the differences; it’s worth reading to understand what’s coming next.