09 November 2016

Local Optimization

Local Value Numbering

  • Record distinct values

  • Subexpression elimination

  • Static single assignement

Tree-Height Balancing

  • Maximize ILP

  • Basic block as a dependency graph

    • Root if

      1. Ti is reused: |USES(Ti)| > 1

      2. Inconsistent op: |USES(Ti)| == 1 \(\wedge\) OPUSES(T\(_i\)) != OPi

      3. LiveOut(Ti)

    • Rank

      1. Constants 0

      2. Upward-exposed names 1

      3. Defined names sum of Rank(leaves)

Regional Optimization

Superlocal Value Numbering

  • Per path as a block for extended LVN analysis

  • Branch to save the prefix states for reuse in other paths

  • Scoped state tables for traversal using the worklist algorithm

Loop Unrolling

  • Reduces looping overhead

  • Increases IR and compilation time

  • Increases instruction cache usage

  • Better instruction scheduling of independent unrolled operations

  • Better locality and facilitate multi-word operations

  • Exposes cross-iteration redundancies to reuse memory references

  • Potential register spills due to increase on register usage

Global/Intraprocedural Optimization

Finding Uninitialized Variables with Live Information

  1. Build UEVar and VarKill for each basic block

  2. Iteratively compute LiveOut for each basic block

    • fixed-point solution

    • faster if evaluting in reverse order

    • \(\forall v \in\) LiveOut(\(n_0\)), \(v\) might be uninitialized

Uses for Live Variables

  • Register allocation for live variables

  • Elimination of useless store operations

Global Code Placement

Better instruction cache usage.

  • Asymmetric fall-through branch for global code placement

  • A block should be followed by its most frequent successors

  • Move infrequently executed code to the end of the procedure

Analsysis

  • Estimates of each branch’s and CFG edge' execution frequency by profiling

    • Instrumented executables: compiler generates code to count and save statistics for later use

    • Timer interrupts: build a histogram of program counter locations

    • Performance counters: Taken branches from hardware counters

  • Branch execution frequency as CFG edge weights

  • Construct hot disjoint paths of chains

    • consider edges in some order of weights

    • merge chains if possible

    • infreqent cross-chain edges

Code Placement

  • Blocks in a chain are placed in order

  • Chains are selected in priorities

Interprocedural Optimization

Inline Substitution

  • Callee size < linkage code

  • Caller size constraint

  • Dynamic call count

  • Static call count

  • Parameter count

  • Constant-valued actual parameters for folding

  • Calls in the procedure for leaves

  • Loop nesting depth

  • Fraction of execution time

Procedure Placement

  1. Build call graph G

  2. Associate edges with execution frequency as weights

  3. Order edges by weights in a priority queue

  4. Process edges from the queue by collapsing procedure nodes into the placement list

Compiler Organization

  • A compilation unit can be a procedure, a class or a source file

  • Enlarge a compilation unit

  • Perform optimization at link time on statically linked code

  • Inline functions in the same compilation unit