IBM has a dedicated kernel development team that codes new functions for Linux on System z. There are three main areas for these development activities. First, many new features for the upstream Linux kernel are added by the community, some of which require architecture-specific back-ends. Second, our team tries to support features of the new System z hardware as early as possible. Third, Linux on System z is improved so it works better in the special hypervisor environments z/VM and LPAR.
This article examines a sampling of new features for Linux on System z developed in the last year: Large Page support, CPU topology support, DASD HyperPAV, and the new z/VM unit record device driver (vmur).
Large Page Support
IBM System/370, announced June 30, 1970, was one of the first commercial computer architectures with full support for Dynamic Address Translation (DAT). DAT was introduced because real memory was an expensive resource that had to be shared among multiple processes. With DAT, the computer simulates real memory so that each process runs as if the full memory were at its disposal. To accomplish this, memory is virtualized: it is divided into small chunks called pages, normally 4KB each. Page tables map the virtual memory to real memory.
DAT isn’t free. The page tables themselves must be stored in memory and therefore consume some of it, and the translation process consumes CPU time. To speed up translation, Translation Look-aside Buffers (TLBs) were invented. A TLB has a fixed number of slots containing entries that map virtual addresses to physical addresses. Normally the most recently translated addresses are kept in the TLB, exploiting the locality of programs, which tend to access the same memory region several times.
Since 1970, the memory capacity of computers has increased dramatically. The first memory chips had a capacity of 1,024 bytes; today’s memory chips can store more than a million times as much. Interestingly, the page size for DAT and the TLB size have not grown at the same pace. Most computer systems still use 4KB chunks for their page tables. Increasing the page size would save many resources, including the memory needed for the page tables and the CPU time spent on address translation.
The new IBM z10 architecture addresses these problems by supporting large pages with a size of 1MB. The System z operating systems must exploit this feature. Linux is already prepared for large pages, so we needed to implement only the System z-specific back-end code. However, the Linux on System z memory management in the current distributions requires setting the large page size to 2MB, so two contiguous hardware large pages must be used for one Linux 2MB page.
Currently, two mechanisms are available to work with large pages in user space: a virtual file system named “hugetlbfs” and the System V Inter-Process Communication (IPC) shared memory segments. The hugetlbfs file system is a virtual file system that’s backed by large pages. Once it is mounted, you can allocate large page memory through mmap() system calls on files within the hugetlbfs file system. The System V shared memory system calls also can be used to allocate large page memory, without mounting the hugetlbfs file system.
On z10 hardware in a Logical Partition (LPAR), the Linux large page support will be backed by hardware large pages. On older hardware and in z/VM environments, large pages will be emulated in software via contiguous standard 4KB pages. The large page size will be 2MB in both cases. In software emulation mode, the page tables will be shared for processes accessing the same large page memory.
Using large pages instead of the normal 4KB pages has two advantages: