Operating Systems

GCC’s Internal Structures

The basis for GCC’s advanced code optimization features are several internal program representations. Recall from the previous article that GCC is divided into three major phases that communicate through different data structures representing the program being compiled (see Figure 2). The front-end generates a data structure based on a tree-like structure that’s close to the nature of the source code. The tree data structure resulting from the parser step is then turned into a form more appropriate to code optimization— the Static Single Assignment (SSA) form. The optimization phase and the code generator finally communicate through another representation using the so-called Register Transfer Language (RTL).


SSA Form

The SSA data structure, introduced in the late ’80s, includes an efficient representation for all aspects of a program being compiled. It’s efficient with respect to memory consumption and, more important, for execution of algorithms to perform data flow analysis. This is essential for obtaining information useful for determining if and where you should apply a certain optimization to the program.

SSA also is well-suited to actually perform optimizing program transformations. A key property for achieving this is the way multiple assignments to the same variable are represented in SSA. For more on SSA, see The Compiler Design Handbook: Optimizations and Machine Code Generation (CRC Press, Florida 2002). Not surprisingly, an SSAbased program representation was integrated into GCC a few years ago; the initial design was presented at the 2003 GCC Developers’ Summit. The proceedings are available at www.gccsummit. org/2003/.


RTL is the last intermediate representation and has existed since the beginning of GCC. Like the tree and SSA representation, RTL is platform-independent. In contrast to the other intermediate representations, however, the RTL representation is already close to real machine code. The RTL optimization and code generation layer can be considered the platform-dependent optimization step interacting heavily with the so-called back-end.

A GCC back-end consists of the machine description, which is written in a LISP-like language describing the instructions available on the machine. The remaining part of a back-end, written in C, implements hooks that leverage the RTL optimization passes. These hooks describe several machine and Application Binary Interface (ABI)-specific things you must take into account when rearranging or generating code. Examples include:

• How parameters of a certain type are passed between functions

5 Pages