A good optimization example is a technique called constant folding, which evaluates operations on constant operands at compile time rather than at run time. A related technique is constant propagation: if the compiler learns that a variable holds a constant value during at least certain parts of program execution, it forwards this information through the program, often enabling further applications of constant folding and constant propagation.
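A minimal sketch of how the two techniques interact (the function name and values are invented for illustration): the compiler can fold the constant multiplication, then propagate the result into the return expression, which folds again.

```c
/* Hypothetical example of constant folding and propagation.
   The compiler folds 60 * 60 * 24 into 86400 at compile time;
   once seconds_per_day is known to be constant, that value is
   propagated into the return expression, which folds to 604800. */
int seconds_per_week(void) {
    int seconds_per_day = 60 * 60 * 24;   /* folded to 86400 */
    return seconds_per_day * 7;           /* propagated and folded to 604800 */
}
```

The generated code typically reduces to returning a single precomputed constant, with no multiplications left at run time.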
Be aware, however, that many optimization techniques come with side effects. Depending on the context, an “optimization” may backfire and make things worse.
Several optimization techniques involve trade-offs between conflicting goals. For example, a technique called common subexpression elimination avoids redundant calculations of an expression that occurs at several places in a program. However, the computed value must be saved somewhere: it either occupies a register that is no longer available for other purposes or requires an additional memory access.
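A hand-written before/after sketch of the transformation (function names invented for illustration); a compiler performs the same rewrite internally:

```c
/* Before elimination: a * b is computed twice. */
int before(int a, int b, int c) {
    return a * b + c + a * b;
}

/* After common subexpression elimination, as the compiler might
   transform it: the shared value lives in a temporary, which ties
   up a register (or stack slot) for its lifetime. */
int after(int a, int b, int c) {
    int t = a * b;      /* computed once */
    return t + c + t;
}
```

Both versions compute the same result; the second trades one multiplication for one live temporary.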
You can also fine-tune application performance with loop unrolling. Imagine the program has a counting loop. Counting occurs via some variable, say i, which is incremented after every iteration and tested to determine whether another iteration is needed. The loop body somehow reads the value of i. Unrolling such a loop means replacing the loop with several instances of the loop body, one for every loop iteration. In the duplicated loop bodies, all references to variable i are replaced by the value i would have during that particular iteration. Basically, there are three sources of possible performance improvement:
• The overhead of incrementing and testing variable i is removed.
• The register needed to hold i is now free.
• In the loop body instances, references to i are replaced by constant values, which may enable constant folding and constant propagation.
On the other hand, the code gets larger. Instead of just one instance of the loop body, we now have many. Depending on the machine’s cache structure and cache load situation, this increased code size may hamper performance. Longer code also may require less efficient branch instructions. A compromise is partial unrolling, where the loop is retained but each iteration executes several copies of the original body, eliminating some of the iteration overhead while limiting code growth. Unfortunately, this technique may require more complex loop exit conditions.
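A sketch of partial unrolling by a factor of two (the function, which sums the integers 0 through n-1, is invented for illustration). The more complex exit condition shows up as the cleanup step after the loop:

```c
/* Partially unrolled by a factor of 2: the increment/test overhead
   is paid once per two bodies instead of once per body. */
int sum_first(int n) {
    int s = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {  /* exit test covers two iterations */
        s += i;
        s += i + 1;
    }
    if (i < n)                   /* cleanup when n is odd */
        s += i;
    return s;
}
```

The cleanup code after the loop is exactly the extra exit-condition complexity the text warns about; unrolling by larger factors needs correspondingly more cleanup.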
The GCC manual states that loop unrolling “may or may not make code run faster.” In fact, we observed both cases: a significant decrease of run time for some programs and an increase of run time in others. The compiler provides several switches, such as -funroll-loops, and numeric parameters for enabling and fine-tuning loop unrolling.
Subprogram inlining is another optimization technique with similar trade-offs. Here, calls to a subprogram are replaced by the body of the subprogram, and formal parameters are replaced by references to the actual parameters provided in the call. The primary improvement is avoiding the overhead of performing the call: saving registers and possibly passing parameter values via the stack. Also, the inserted function body may offer opportunities for specialized optimizations that are only possible for the particular set of parameters, especially if some of the parameters are constant values. On the negative side are code enlargement and possibly increased register pressure.
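A minimal sketch (function names invented for illustration); `static inline` is a request the compiler is free to honor or ignore:

```c
/* Hypothetical helper the compiler may inline. */
static inline int scale(int x, int factor) {
    return x * factor;
}

/* Once scale(x, 2) is inlined, the call overhead disappears and the
   constant argument 2 enables specialized optimization of the body,
   e.g. turning the multiplication into a shift or an add. */
int double_it(int x) {
    return scale(x, 2);
}
```

This also illustrates the constant-parameter point: after inlining, constant propagation sees factor == 2 inside the copied body.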