Operating Systems

To solve the data dependency problem, by default, GCC uses a modified version of a list scheduling algorithm. Part of the algorithm maintains a list of insns whose input values are already available and cleverly picks one of these to later add as many new insns as possible to that list.

For conflicting functional units, it’s essential for GCC to be able to quickly determine if choosing a specific insn might lead to such a conflict. So the back-end contains a detailed description of the functional units of the processor together with insn attributes indicating which instruction needs which functional unit—the pipeline description. This pipeline description is compiled by GCC into a finite state machine that can answer in linear time if scheduling an insn will lead to a conflict. Based on this, the instruction scheduler can search for an insn without conflicts.


A common way to make a processor appear faster is to enhance the instruction set with instructions covering the functionality of several others at once and leave the real work to the compiler. An obvious example is the multiply and add instruction doing the work of two instructions while using the time of just one of them. The combiner is GCC’s way of exploiting such macro instructions. To do this, the so-called “combine” step tries to merge insns and find an implementation for it in the backend. The merge step tries to merge up to three instructions at once. If the resulting insn can be found in the machine description, the combiner will take the new instruction as the base for even more merge operations. There are some instructions that provide a complex functionality. One example is the z10 instruction rotate and insert selected bits (risbg). The combiner is the only chance to exploit such instructions.

Register Allocation

Register allocation is an essential part of code generation. Until this step, GCC assumes there’s an unlimited number of registers available—called pseudo registers. The difficult task of the register allocation step is to assign a machine register to each of the pseudo registers. GCC uses an enhanced version of the sophisticated graph-coloring algorithm. If the graph-coloring fails to find such a mapping, pseudo registers will be replaced by stack slots and the register allocation will be restarted. This continues until either everything resides in memory (undesirable) or the 16 general- purpose registers of the System z CPU are sufficient to cover all the pseudo registers.

Security Features

Most technical security issues result from actual programming errors. Sloppy memory management especially opens the door to attackers trying to gain access to the machine where the code runs. As the tool knowing most about the programming language, the compiler is a natural choice when trying to add automated techniques to prevent a certain class of attacks. The following introduces a few of the mechanisms GCC provides:

Stack-smashing protector: With the stack-smashing protector, GCC provides a mechanism to protect code against a special class of stack overflow attacks. The simplest stack overflow attacks try to write data beyond the end of a stack variable to manipulate the return address of the function, which also resides on the stack. This way, an attacker can execute code with the rights of the attacked code on its own behalf. Jumping to code that opens a shell console is a technique widely used by attackers.

The GCC option -fstack-protector: This places a randomly chosen guard value on the stack between the local variables and return address. Before using the return address from the stack to return to the caller, the guard value is validated. If the guard value doesn’t match, the program is immediately terminated. Whenever an attacker writes beyond the local variable area, the guard value will be garbled on the stack and the program will exit before the attacker’s code can execute.

However, this doesn’t prevent the attacker from overwriting other variables in the local variable area. To minimize the risk of this, GCC moves potentially problematic variables such as char arrays as close as possible to the guard value. That way, the chances are high that even an overwrite not targeting the return address can be detected.

Mudflap: Mudflap is a debugging tool built into GCC. With mudflap enabled, GCC can instrument C code to track access to memory objects and its allocations (the C++ support currently is limited). So, it’s possible to detect especially annoying kinds of bugs such as invalid pointer dereferences and array out of bound accesses, which often lead to security holes. The memory objects are tracked in an external library, which is why it’s always necessary to link the mudflap library to your code.

Figure 3 shows how mudflap is able to detect a simple array out of bound access. A character array having five slots indexed from 0 through 4 is written to in a loop using indices from 0 through 5, resulting in an invalid memory access when writing to the last slot. Compiling this example and enabling mudflap, this error is detected. Mudflap prints the source code line of the invalid access together with information about the underlying memory object. Using this information, the programmer can easily locate the wrong code and fix it.

To prepare code to be built with mudflap, the -fmudflap and -lmudflap options must be specified as shown in Figure 3.


MUDFLAP_OPTIONS=-help can be used to get a list of the other analysis that can be provided. For more information about mudflap, please visit the GCC Wiki Page at http://gcc.gnu.org/ wiki/Mudflap_Pointer_Debugging or read the GCC 2003 Summit Paper “Mudflap: Pointer Use Checking for C/ C++” at: www.gccsummit.org/2003/.


GCC uses many state-of-the-art techniques to make your code run faster and harden it against attacks. Basing these techniques on machine-independent internal code representations allows GCC to provide them to a wide range of platforms and programming languages. Compiling code at the highest optimization level, GCC currently performs about 130 passes on SSA and about 60 passes on the RTL representation— and these numbers tend to grow. However, this article showed that there’s no single “one-fits-all” strategy when it comes to code optimization. Techniques might contradict each other or may not do anything helpful for a particular code piece. Code optimization is about heuristics and will continue to provide compiler developers like us with a lot of work in the future.

5 Pages