When Kdump was released, the IBM Linux on System z team considered adopting that dump method, but rejected it due to reliability concerns.
The IPL mechanism on System z performs a hardware reset on all attached devices, so the dump tools can work with fully initialized devices. An IPL that starts the dump process always works, even if CPUs are looping with disabled interrupts. Like Kdump, the System z dump tools are independent of the state of the first kernel. Unlike Kdump, however, they don’t share memory with the first kernel, so the first kernel cannot overwrite their code, which can happen with Kdump. Another advantage of the System z dump tools is that they don’t require reserved memory. This is especially important under z/VM with many guests, because with Kdump every guest has to set aside its own reserved memory area for the capture kernel.
The main disadvantage of the System z tools is that they differ from Kdump, which is used on most other platforms, so many users are unfamiliar with them. In addition, the Red Hat and SUSE Linux installers offer only limited dump support for System z. Kdump also provides filtering mechanisms that dump only the kernel pages that are important for dump analysis, which reduces dump size and dump time.
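To see why filtering matters, consider a simplified model. Memory is a list of pages, each tagged with a class, and the filter drops every page whose class is selected for exclusion. This is a minimal sketch, not Kdump’s implementation. The page classes, the function names filter_pages and dump_bytes, and the page mix are all hypothetical. In practice, Kdump setups usually perform this step with makedumpfile, which classifies pages by inspecting kernel data structures.

```python
PAGE_SIZE = 4096  # bytes; System z uses 4 KiB pages

# Hypothetical page classes; a real filter derives them from kernel data structures.
KERNEL, USER, CACHE, FREE, ZERO = "kernel", "user", "cache", "free", "zero"


def filter_pages(pages, exclude):
    """Keep only pages whose class is not in `exclude`.

    Kernel pages are never dropped, because they hold the data needed for analysis.
    """
    if KERNEL in exclude:
        raise ValueError("kernel pages must not be excluded")
    return [page for page in pages if page not in exclude]


def dump_bytes(pages):
    """Size of a dump containing `pages`, ignoring headers and compression."""
    return len(pages) * PAGE_SIZE


if __name__ == "__main__":
    # Hypothetical 1 GiB guest: 262,144 pages with an illustrative mix of classes.
    memory = ([KERNEL] * 30_000 + [USER] * 80_000 + [CACHE] * 100_000
              + [FREE] * 40_000 + [ZERO] * 12_144)
    kept = filter_pages(memory, {USER, CACHE, FREE, ZERO})
    print(f"full dump: {dump_bytes(memory) / 2**20:.0f} MiB, "
          f"filtered: {dump_bytes(kept) / 2**20:.0f} MiB")
```

With this made-up page mix, only the kernel pages remain, so the filtered dump shrinks from 1024 MiB to about 117 MiB. Real savings depend entirely on how the system’s memory is used at the time of the dump.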
Dump Analysis Tools
After a kernel dump has been created, it must be read by an analysis tool for problem determination. Two dump analysis tools are available for Linux: lcrash and crash. The lcrash tool is part of the LKCD project and isn’t being actively developed. The crash tool, originally developed by Mission Critical Linux and now maintained by Red Hat, will probably be the Linux dump analysis tool of the future.
The kernel dump analysis tools provide many commands, including commands to:
- Show memory contents
- Print kernel variables
- Show kernel log
- List Linux processes
- Show kernel function backtraces for processes
- Show disassembly of kernel code
A Simple Dump Analysis Scenario
Let’s consider how crash is used. In this example, the sleep program is started (it serves only as a sample process), a dump is created, Linux is rebooted, and the dump is opened with crash. Apart from the dump file, crash normally needs two additional files: vmlinux, which contains the kernel symbol addresses, and vmlinux.debug, which contains the data type descriptions. In some distributions, these two files are merged into one. For our example, the following steps were performed:
- Start sleep program: /bin/sleep 1000.
- Create DASD dump (/dev/dasdd1).
- Reboot Linux system.
- Copy dump: zgetdump /dev/dasdd1 > dump.s390.
- Start crash tool: crash /boot/vmlinux /usr/lib/debug/boot/vmlinux.debug dump.s390.
Figure 7 shows all processes in the dump, including the sleep process, whose Process Identifier (PID) is 26735. The parent of the sleep process is the bash shell process with PID 26617 (see Figure 8). The sleep process has executed the system call “nanosleep”. The topmost function on its stack is “schedule”, the Linux function in which processes normally sleep until the scheduler wakes them up again.
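An analysis like the one described above can also be scripted. The following Python sketch writes the crash commands ps, ps -p (parental hierarchy) and bt (backtrace) for one PID to a temporary file. It then passes that file to crash with the -i option, which executes the commands in a file before the prompt appears. A final quit ends the session. The sketch assumes that crash is in the PATH and that the file names match the example steps. The helper names crash_commands, crash_argv and run_crash are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# File locations taken from the example steps; adjust for your distribution.
VMLINUX = "/boot/vmlinux"
VMLINUX_DEBUG = "/usr/lib/debug/boot/vmlinux.debug"
DUMP = "dump.s390"


def crash_commands(pid):
    """Return crash commands that examine one task, ending with quit."""
    return [
        "ps",            # list all processes in the dump
        f"ps -p {pid}",  # show the parental hierarchy of the task
        f"bt {pid}",     # show the kernel function backtrace of the task
        "quit",          # leave crash so the batch run terminates
    ]


def crash_argv(command_file, vmlinux=VMLINUX, debug=VMLINUX_DEBUG, dump=DUMP):
    """Build the crash command line: -s suppresses the banner, -i reads commands from a file."""
    return ["crash", "-s", "-i", str(command_file), vmlinux, debug, dump]


def run_crash(pid):
    """Run crash non-interactively for one PID and return its output (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        command_file = Path(tmp) / "commands.crash"
        command_file.write_text("\n".join(crash_commands(pid)) + "\n")
        result = subprocess.run(
            crash_argv(command_file), capture_output=True, text=True, check=True
        )
        return result.stdout
```

For the example dump, run_crash(26735) would return crash’s output for the sleep process as text. Capturing the output this way makes it easy to store the results of a first-pass analysis alongside the dump.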
This article described the history of Linux dump methods. After Linus Torvalds rejected dump methods such as LKCD, the Kdump method was finally accepted into the mainline kernel. On System z, architecture-specific dump tools existed several years before Kdump and remain in use. These include standalone dump tools for DASD and channel-attached tapes, a dump tool for ZFCP SCSI disks, and the z/VM hypervisor dump method VMDUMP. The main advantage of these System z dump tools is reliability.