Jun 1 ’09
Compiling and Debugging Applications on Linux for System z
Today, many high-level programming languages are in use; all are tailored toward our human way of thinking and, to a varying degree, to the needs of specialized application areas. Any program written in such a language needs to be translated into a machine-executable format. Compilers perform this step and additionally offer many kinds of correctness checks and optimizations.
This article provides an overview of the GNU Compiler Collection (GCC) and some accompanying programs such as the GNU assembler and linker. We’ll begin with the overall information flow between all these programs and also:
• Introduce the Executable and Linking Format (ELF) used to represent machine code
• Look at the tools involved, namely GCC itself, the linker, and the assembler
• Consider the run-time performance of the generated machine code
• Give an overview of improvements achieved in GCC during past years and what options a developer can apply to achieve best performance results
• Review debugging programs.
Because most parts of Linux are written in C and C++, we focus on those languages, although the GCC-based tool chain also supports other important languages such as FORTRAN and Ada.
Overall Information Flow
The process of compiling a source code file comprises several steps (see Figure 1). These tools are involved:
• The preprocessor evaluates a specific set of directives starting with a #. The most important of these directives are used to include other code files (header files) or to define macros that can be used inside the code to avoid duplication. After this step, all # directives have been removed from the input, together with all comments. Unless options such as -E or --save-temps force GCC to keep the preprocessor output as a file, preprocessing occurs as an integral part of source code parsing, without ever creating the intermediate .i files.
• The compiler translates a source code file into the architecture-specific assembler language. This step involves parsing and optimizing the code.
• The assembler takes an assembler language file as input and translates it into an ELF object file. Although the GNU Binutils are built to be able to deal with a wide range of different object file formats, only ELF (the most flexible one) is supported on Linux for System z.
• The linker finally takes one or more object files and turns them into either an executable file or shared library. The linker must locate the libraries and ensure that every necessary symbol can be found in one of the involved object files.
The GCC package provides an executable named gcc (the so-called compile driver), the command the user usually calls to start compilation. This driver then calls all other tools, handles intermediate files, and passes all necessary options. The real compiler executables are named cc1 for C and cc1plus for C++. Both are part of the GCC package and are usually invoked by the driver. The GNU assembler (as) and the GNU linker (ld) are distributed as part of the GNU Binutils suite.
Appending the -v option to the gcc command causes the driver to print all the commands it executes. For more on using GCC, visit the GCC homepage: www.gnu.org/software/gcc/gcc.html. The GCC manual also is included in most Linux distributions.
The ELF Binary Object Format
The ELF is the most important object file format used in the UNIX and Linux worlds. First published in the System V Application Binary Interface specification, it later became part of the Tool Interface Standard (TIS). A major advantage over other formats is that it can be easily extended without breaking compatibility with older systems.
An ELF file provides two different views to cover different needs at compile time and run-time:
• The GNU linker uses the linking view while creating an executable or a shared library. To do this, the linker needs more detailed information about the file than is necessary for execution. So the linking view provides a finer granularity, down to single sections. All the sections of an ELF file are listed in the section header table. A section always has a name and a type that tell the linker what to do with the section during the link step. Standard section names start with a ‘.’ as in .text and .data.
• The dynamic loader (ld.so) uses the execution view to lay out the contents of the file into the process memory. The program header table describes which portion (segment in ELF terminology) of the file must be mapped to which process memory location. A segment might cover several sections simultaneously. Not all sections are needed for execution (e.g., sections with debugging information). Those simply aren’t covered by any segment in the program header table.
Although the ELF file format itself is platform-independent, the content of the sections isn’t, so usually some platform-dependent extensions exist. These extensions mostly concern the structure of the procedure linkage table and the global offset table, both used for dynamic linking.
More information about ELF and object file formats appears in the book Linkers & Loaders by John R. Levine (Morgan Kaufmann Publishers, San Diego, 2000, ISBN 1-55860-496-0).
Architecture-dependent extensions to ELF documentation appear on the Linux Foundation Website. Refer to:
• Linux for S/390 ELF Application Binary Interface Supplement, IBM Document Number LNUX-1107-00, 2001, http://oss.software.ibm.com/linux390/docu/l390abi0.pdf
• Linux for zSeries ELF Application Binary Interface Supplement, IBM Document Number LNUX-1107-00, 2001, http://oss.software.ibm.com/linux390/docu/lzsabi0.pdf.
The Compiler
The process of translating source code written in a higher-level programming language into machine-executable form occurs in three phases (see Figure 2). The first phase analyzes the source code and checks whether it conforms to the programming language specification. Most of the compiler’s error messages and warnings are issued in this phase. The result of the source code analysis is an internal program representation based on attributed syntax trees.
The second phase translates the internal program representation from tree format into a second internal format called Register Transfer Language (RTL). The RTL format is already close to real Assembler code but doesn’t yet contain any machine-specific information. While translating the highly recursive syntax tree representation into RTL, this phase performs several machine-independent code optimization passes.
The third phase is called the backend. It translates the RTL representation into Assembler code. The back-end has information on the target machine’s instruction set and everything needed to perform machine-specific code optimization.
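Both internal representations can be inspected via GCC’s dump options. A sketch (the pass numbers embedded in the dump file names vary between GCC versions, so the listing uses wildcards):

```shell
# A one-function source file to keep the dumps small.
echo 'int sq(int x){return x*x;}' > phases.c

# Dump the tree representation and the first RTL form of the function.
gcc -c phases.c -fdump-tree-original -fdump-rtl-expand

ls phases.c.*original*   # syntax-tree dump from the first phase
ls phases.c.*expand*     # RTL dump produced when trees are expanded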
The three phases of the compiler communicate via two data structures for internal program representation, which are (almost) independent of the programming language used as input, and of the Assembler source code to be output. This modular approach allows front-ends for many different programming languages and code generators for many different target architectures to be combined (see Figure 3). The approach is beneficial because:
• Adding a new language only requires writing a new front-end.
• The code optimization and all existing back-ends can be reused, making the new language immediately available on a large variety of machines.
For example, IBM’s Firmware implementation language, PL8, was implemented this way and the existing back-end for System z could be reused. (For details, see W. Gellerich, T. Hendel, R. Land, H. Lehmann, M. Mueller, P.H. Oden, H. Penner: “The GNU 64-Bit PL8 Compiler: Toward an Open Standard Environment for Firmware Development,” IBM Journal of Research and Development, May/July 2004, Volume 48, Number 3/4, pages 543-555, www.research.ibm.com/journal/rd/483/gellerich.pdf.)
Developing a new code generator makes all languages supported by GCC available on the new architecture. This reduces the effort for supporting a new platform and has contributed to the growth of Linux.
Details of the internal structures of GCC are provided in a separate manual available on the GCC homepage. The machine instructions of System z are described in the Principles of Operation document. Refer to:
• ESA/390 Principles of Operation (SA22-7201-07, 2000) http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/dz9ar007
• z/Architecture Principles of Operation (SA22-7832-01, 2000), http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/dz9zr001.
For more on System z-specific parts of GCC, refer to:
• Gellerich et al. “The GNU 64-Bit PL8 Compiler …” (cited earlier)
• Hartmut Penner, Ulrich Weigand: “Porting GCC to the IBM S/390 Platform,” Proceedings of the GCC Developer’s Summit available from www.gccsummit.org/2003/
• D. Edelsohn, W. Gellerich, M. Hagog, D. Naishlos, M. Namolaru, E. Pasch, H. Penner, U. Weigand, A. Zaks, “Contributions to the GNU Compiler Collection,” IBM Systems Journal, 2005, volume 44, number 2, pages 259-278, www.research.ibm.com/journal/sj/442/edelsohn.pdf.
The Assembler
The cc1 or cc1plus compiler generates a text file containing the program code in Assembler. Assembler is heavily machine-specific but still isn’t executable by a CPU. To make this file CPU-consumable, the mnemonics and operands must be encoded into machine opcodes. This is the purpose of the Assembler (see Figure 4). The resulting byte stream will usually reside inside the .text section of the ELF object file.
In the Linux environment, the Assembler provided by the GNU Binutils suite can deal with many different architectures and object file formats. For System z, the Binutils suite was built to produce code for the s390-elf target.
Note that the Assembler code generated isn’t the same as the High Level Assembler (HLASM) code mainframers are used to seeing. It’s GNU Assembler Syntax (GAS). The mnemonics are the same, but the specification of the operands isn’t.
The Linker
The linker’s responsibility is to merge several ELF object files to create an executable or library. In the simplest case, this involves appending the code and data sections to each other and adjusting the referenced addresses accordingly. This can occur at compile time (static linking) or at run-time (dynamic linking):
• Static linking: The object files are merged. The resulting executable or object file contains the code and data of all the input files. However, to keep the resulting binary as small as possible, the linker can skip input files that don’t contain symbols needed by one of the other files. A static library is an archive containing several object files (.o). If the static variant of a library (the .a file) is used, the resulting binary will contain all the code needed from that library and can run without the library on the target system. Although this may be an advantage in some cases, dynamic linking is usually preferred.
• Dynamic linking: To link dynamically, a dynamic shared object (shared library) is needed. These objects are built in a specific way that allows their code sections to be relocated to a different position in memory without being modified. Relocation normally requires modifying the absolute addresses in the code section so they remain correct; this is avoided by building the code with the -fPIC/-fpic options, which force GCC not to use absolute addresses in the generated code.
During the link step, the linker searches for the needed symbols in the shared library and records the required libraries in the resulting binary. But the linker doesn’t actually copy code from the shared library into the binary. The shared library must therefore be present at link time, and the same or a newer version must be available on the system where the resulting binary is executed.
Advantages of dynamic linking include:
• The resulting binary is small since it contains only the code of the non-library functions.
• The code section of a shared library can actually be shared between processes running in parallel. This is extremely important, considering standard libraries (such as the GNU C library), which almost every executable needs. Regardless of how many processes are running, the code section of this big library will occupy memory only once.
• If the binary is linked against a shared library but, for the given input values, never executes the library’s code, that code won’t be loaded into memory. Only the shared libraries actually needed will occupy memory.
• Bugfixes without relinking. Installing an updated or fixed version of a shared library will affect the next execution of the binary without requiring any other steps.
Compiler Options and Optimization
GCC provides many options to control its behavior. Besides several general options, there are also options specific to System z (see Figure 5). Some of these have an impact on the performance of the generated code. A summary of the performance improvements achieved between 1999 and 2007 (see Figure 6) suggests that it’s of particular value to exploit the System z-specific optimization options. All data given in Figure 6 was measured on the latest System z model available in the particular year and was normalized; overlapping measurements were used for scaling whenever measurements moved to a new System z model. See Edelsohn et al., “Contributions to the GNU Compiler Collection” (fully cited previously) to learn how the measurements were conducted.
A detailed description of all GCC options is found in the “Using GCC” manual. This manual provides a section exclusively devoted to System z-specific options; we won’t describe these options in detail here. Also, it’s likely that additional compiler options will be provided for new System z models and improvements might be offered for existing models. Check the most recent version of the manual for details.
Two performance-relevant options specify what processor type to generate code for:
• -march=cpu-type: Exploit all instructions provided by cpu-type, including instructions that aren’t available on older models. Code compiled with this option typically won’t execute on older CPUs. See the GCC homepage for a list of supported CPU types.
• -mtune=cpu-type: Schedule instructions to best exploit the internal structure of cpu-type. This option will never introduce any incompatibility with older CPUs, but code tuned for a different CPU might run slower.
When you specify a cpu-type using the -march option, GCC’s default behavior is to perform an -mtune optimization for the same cpu type. However, it’s possible to specify different values for -mtune and -march. If in doubt, consider this strategy:
• Decide on the oldest model you need to support. Specify this model as the argument of -march. This will cause the compiler to fully exploit the instruction set of that model. The code is then not executable on older models but will exploit at least a subset of the instructions provided by later models.
• Decide what your most important model will be (i.e., on what model the most workload will be run, or what model your largest customers use). Specify this model as the argument of the -mtune option. This will cause the compiler to order the instructions so the pipelines of the specified model are best exploited. This order may or may not be perfect for other models, but apart from a slight performance impact, the -mtune option will never introduce any incompatibility.
Further System z-specific code generation options include:
• -mzarch and -mesa: Generate code exploiting the instruction set of the z/Architecture or the ESA/390 architecture, respectively. See the GCC homepage for default values and interaction with other options.
• -m64 and -m31: Control whether the generated code complies with the Linux for System z ABI (-m64) or with the Linux for S/390 Application Binary Interface (ABI) (-m31). See the ABI resources previously mentioned for more details about ABIs.
Debugging and Reliability
Programs don’t always work as expected. Fortunately, the GCC-based tool chain and Linux provide a rich collection of debugging tools:
• The GNU debugger GDB (visit www.gnu.org/software/gdb/) is quite powerful.
• The Data Display Debugger (see www.gnu.org/software/ddd/) is a graphical front-end for GDB and can visualize pointer-linked data structures.
Several debugging tools focus on analyzing bugs related to memory references. For example, Electric Fence (http://directory.fsf.org/project/ElectricFence/) is a neat tool and available on many platforms, including System z. Its capability is, however, limited to dynamically allocated memory.
A powerful feature is available for Linux running under VM. VM’s TRACE command offers a convenient and powerful way to debug the whole Linux system (see z/VM CP Command and Utility Reference, SC24-5967). An introduction to debugging is available in Chapter 22 of Linux on the Mainframe (John Eilert, Maria Eisenhaendler, Dorothea Matthaeus, Ingolf Salm, Prentice Hall, 2003, ISBN 0-13-101415-3). A detailed description of debugging under Linux is provided in the file /usr/src/linux/Documentation/s390/Debugging390.txt.
For debugging more difficult problems, information about register usage, stack frame layout, and other conventions is found in the “ELF Application Binary Interface Supplement.”
If a bug is related to the translation process itself, GCC and its related tools provide several options for debugging. --save-temps causes all major intermediate files to be kept instead of being removed when the translation is complete. Unfortunately, heavy compiler optimization makes it difficult for the debugger to depict the correlation between the original source code and the machine code. Almost all optimization passes can be activated and deactivated separately; see the GCC manual for details.
GCC is highly reliable. To a large degree, this reflects its development process. Before a source code change is added to the official repository, the modified compiler must be able to translate itself, which is an excellent test. In addition, the new compiler must pass the GCC regression test. This is a collection of small test cases systematically covering all language constructs; it also contains source code that suffered from past compiler bugs, and everything else that compiler developers considered worthy of inclusion in the test suite. For the languages C and C++ and the related libraries, the number of test cases totals nearly 100,000; this makes GCC a compiler you can rely on.