Dan Bonachea's CS 265 Expert Topic Page
Array Access Loop Optimizations
At a recent Titanium project meeting, a need was expressed for
adding optimizations in the Titanium compiler to streamline accesses to Java arrays within
loops (currently, Titanium array accesses within foreach loops are highly optimized, but
Java array accesses and all accesses within Java for loops apparently are not). I was
encouraged by the group to pursue the addition of these optimizations as my 265 semester
project to help alleviate bottlenecks which have been demonstrated in legacy Java
benchmarks compiled under Titanium. This seems like a worthwhile and reasonable goal for
my semester research project, and I've chosen my "expert topic" to provide
support for this project.
Therefore, I'd like my topic to deal with the optimization of array accesses
within loops. If this topic seems overly broad, then I can confine my study to one or more
of the typical constituent optimizations such as invariant code motion, strength reduction
or induction variable elimination.
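To make these transformations concrete, here's a small C sketch of strength reduction (with linear test replacement noted in a comment). The function names are mine, purely illustrative:

```c
#include <assert.h>

/* Before: the loop recomputes i*12 with a multiply on every iteration. */
long weighted_sum_naive(int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += (long)i * 12;        /* multiply in the loop body */
    return sum;
}

/* After strength reduction: the product i*12 becomes the running value t,
 * advanced by the constant stride 12 each iteration, so the multiply is
 * gone. Induction variable elimination could then remove i entirely by
 * testing t against 12*n instead (linear test replacement). */
long weighted_sum_reduced(int n) {
    long sum = 0, t = 0;            /* t always holds i*12 */
    for (int i = 0; i < n; i++) {
        sum += t;
        t += 12;
    }
    return sum;
}
```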
I've concluded my investigation of my expert topic. My lecture on 4/6
covered various topics in loop optimization (invariant and induction variable analysis,
reassociation, strength reduction, linear test replacement) and bounds checking
elimination. The lecture focused on the material in ARZ Ch. 9 (for the loop
optimizations) and the Gupta 93 paper (for
the bounds checking elimination).
Here are the lecture notes (they were drawn from the sources noted below):
- Main notes - These are my high-level, narrative slides
- Motivating Example - from (Allen, Cocke, and Kennedy 81)
- All other examples I showed came directly from ARZ Ch. 9 and the Gupta 93 paper
Loop Optimization Resources (in rough order of importance):
- Peter Markstein, Victoria Markstein and Kenneth Zadeck. "Strength
Reduction", Ch. 9 from Optimization in Compilers, edited by Allen, Rosen and
Zadeck (unpublished textbook) 1992.
- This is the basis for the loop optimization algorithms I presented in class,
and uses the same motivating example. After hearing my lecture you should be able to read
it pretty smoothly and understand what's going on. This seems to be the only source that
uses SSA to perform all the loop optimizations, and they do so quite naturally and
successfully - the algorithms they present are simpler and more powerful than the
conventional (non-SSA) approaches described elsewhere in the literature. The text also
goes into more depth on some of the architecture-specific optimizations that we glossed over in class.
- Michael Gerlek, Eric Stoltz, Michael Wolfe, "Beyond Induction Variables:
Detecting and Classifying Sequences Using a Demand-Driven SSA Form", ACM Transactions
on Programming Languages and Systems, January 1995.
Linear induction variable detection is usually associated with the strength reduction
optimization. For restructuring compilers, effective data dependence analysis requires
that the compiler detect and accurately describe linear and nonlinear induction variables
as well as more general sequences. In this article we present a practical technique for
detecting a broader class of linear induction variables than is usually recognized, as
well as several other sequence forms, including periodic, polynomial, geometric,
monotonic, and wrap-around variables. Our method is based on Factored Use-Def (FUD)
chains, a demand-driven representation of the popular Static Single Assignment (SSA) form.
In this form, strongly connected components of the associated SSA graph correspond to
sequences in the source program: we describe a simple yet efficient algorithm for
detecting and classifying these sequences. We have implemented this algorithm in Nascent,
our restructuring Fortran 90+ compiler, and we present some results showing the
effectiveness of our approach.
- This paper presents a more complicated approach that classifies induction
variables into a richer taxonomy in an attempt to discover and exploit more complicated IV
relationships and enable more powerful strength reduction transformations.
Unfortunately, they find that most of the induction variables that arise in
practice are linear.
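A small C sketch of some of the sequence classes their taxonomy detects (the driver and variable names are mine, purely illustrative):

```c
#include <assert.h>

#define N 6

/* Records, for each iteration, the loop-entry value of three variables that
 * fall into different sequence classes from the paper's taxonomy. */
void classify_demo(long lin[N], long poly[N], long geom[N]) {
    long j = 0;   /* linear:     j += 3  ->  0, 3, 6, 9, 12, 15  */
    long p = 0;   /* polynomial: p += j  ->  0, 0, 3, 9, 18, 30  */
    long g = 1;   /* geometric:  g *= 2  ->  1, 2, 4, 8, 16, 32  */
    for (int i = 0; i < N; i++) {
        lin[i]  = j;
        poly[i] = p;
        geom[i] = g;
        p += j;   /* p's second difference is the constant 3 */
        j += 3;
        g *= 2;
    }
}
```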
- F. E. Allen, John Cocke, and Ken Kennedy, "Reduction of Operator
Strength", Ch. 3 from Program Flow Analysis, edited by Steven Muchnick and
Neil Jones, 1981.
- A frequently cited source for conventional strength reduction. My motivating
example in Fortran was based on an example in this chapter.
- Steven Muchnick, "Loop Optimizations", Ch. 14 from Advanced Compiler
Design & Implementation, 1997.
- Another conventional approach to strength reduction. 35 pages of the same kind
of confusing crap we've all come to expect from this book.
Bounds Checking Elimination Papers (in rough
order of importance):
- Rajiv Gupta, "Optimizing Array Bound Checks
Using Flow Analysis", ACM Letters on Programming Languages and Systems,
Vol.2, Nos.1-4, March-December 1993, Pages 135-150.
This paper describes techniques for optimizing range checks performed to detect array
bound violations. In addition to the elimination of range checks, the optimizations
discussed in this paper also reduce the overhead due to range checks that cannot be
eliminated by compile-time analysis. The optimizations reduce the program execution time
and the object code size through elimination of redundant checks, propagation of checks
out of loops, and combination of multiple checks into a single check. A minimal control
flow graph (MCFG) is constructed, using which the minimal amount of data flow information
required for range check optimizations is computed. The range check optimizations are
performed using the MCFG rather than the CFG for the entire program. This allows the global
range check optimizations to be performed efficiently since the MCFG is significantly
smaller than the CFG. Any array bound violation that is detected by a program with all
range checks included will also be detected by the program after range check
optimization, and vice versa. Even though the above optimizations may appear to be similar
to traditional code optimizations, a similar reduction in the number of range checks
executed cannot be achieved by a traditional code optimizer. Experimental results
indicate that the number of range checks performed in executing a program is greatly
reduced using the above techniques.
- This paper is an updated version of the 1990 paper below which also provides a
gentle introduction to the topic, and generalizes some of the techniques in the previous
paper to be more effective. The paper unfortunately contains a few poorly-placed
typos that obscure the presentation somewhat, but it's still a great paper. This paper was
the basis for the bounds checking elimination optimizations presented in class.
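A C sketch of the check propagation Gupta describes, with the range checks modeled as asserts (the function names are mine):

```c
#include <assert.h>
#include <stddef.h>

/* Before: every access is guarded by its own range check (modeled here
 * with assert). */
long sum_checked(const long *a, size_t len, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        assert(i < len);            /* per-iteration range check */
        s += a[i];
    }
    return s;
}

/* After propagation out of the loop: i only takes values in [0, n), so the
 * single check n <= len subsumes every per-iteration check. */
long sum_hoisted(const long *a, size_t len, size_t n) {
    assert(n <= len);               /* one check, hoisted above the loop */
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```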
- Rajiv Gupta, "A Fresh Look at Optimizing
Array Bound Checking", Proceedings of the ACM SIGPLAN '90 Conference
on Programming Language Design and Implementation, White Plains, New York, June 20-22, 1990.
Bound checks are introduced in programs for the run-time detection of array bound
violations. Compile-time optimizations are employed to reduce the execution-time overhead
due to bound checks. The optimizations reduce the program execution time through
elimination of checks and propagation of checks out of loops. An execution of the
optimized program terminates with an array bound violation if and only if the same outcome
would have resulted during the execution of the program containing all array bound checks.
However, the point at which the array bound violation occurs may not be the same.
Experimental results indicate that the number of bound checks performed during the
execution of a program is greatly reduced using these techniques.
- This is the earlier of two papers by the same author that provide a good survey
of the techniques involved in array bounds checking optimization (propagation and
elision). This paper is a good introduction to the topic, and provides an extensive
overview of the existing methods. The paper also justifies the need for optimization
techniques specialized to array bound-checking by empirically comparing the effectiveness
of the specialized optimizations to the improvements provided by traditional code optimizations.
- Priyadarshan Kolte and Michael Wolfe, "Elimination
of Redundant Array Subscript Range Checks", SIGPLAN '95, La Jolla, CA.
This paper presents a compiler optimization algorithm to reduce the run time overhead
of array subscript range checks in programs without compromising safety. The algorithm is
based on partial redundancy elimination and it incorporates previously developed
algorithms for range check optimization. We implemented the algorithm in our research
compiler, Nascent, and conducted experiments on a suite of 10 benchmark programs to obtain
four results: (1) the execution overhead of naive range checking is high enough to merit
optimization, (2) there are substantial differences between various optimizations, (3)
loop-based optimizations that hoist checks out of loops are effective in eliminating about
98% of the range checks, and (4) more sophisticated analysis and optimization
algorithms produce very marginal benefits.
- This paper builds on Gupta's work. They use SSA and partial redundancy
elimination to drive their analysis, although the approach seems somewhat forced. Their
algorithm turns out to be significantly slower at compile-time, but the runtime gains are
not very noticeable. They do a weak form of value analysis to eliminate check
redundancies where the variables involved are known to be related in a simple way - this
seems to rarely help much, but has a significant compile time cost. On the upside, they do
present a more rigorous empirical investigation into the results of bounds checking
elimination in a production compiler (Gupta's measurements are all "by hand").
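A C sketch of the kind of check subsumption/combination these papers exploit, with the checks again modeled as asserts (my example, not taken from either paper):

```c
#include <assert.h>
#include <stddef.h>

/* Before: the adjacent accesses a[i] and a[i+1] each carry their own check. */
long pair_sum_checked(const long *a, size_t len, size_t i) {
    assert(i < len);                /* check for a[i]   */
    assert(i + 1 < len);            /* check for a[i+1] */
    return a[i] + a[i + 1];
}

/* After combination: the check on the larger subscript implies the check on
 * the smaller one, so a single combined check suffices. */
long pair_sum_combined(const long *a, size_t len, size_t i) {
    assert(i + 1 < len);            /* covers both subscripts */
    return a[i] + a[i + 1];
}
```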
- Victoria Markstein, John Cocke, and Peter Markstein, "Optimization of Range
Checking", SIGPLAN '82 Symposium on Compiler Construction, June 1982.
- This is a frequently-referenced paper on this topic, one of the earlier looks
at bounds checking optimization. Realizes the importance of identifying bound-check
traps in the IR. Only deals with bounds checking on the upper bound - pushes trap checks
into the loop header AND the loop exit, a practice later frowned upon in the literature
because some traps happen after the violation, and this is unacceptable for
safety-critical applications. Incidentally, two of the authors are those who did ARZ Ch. 9 above.
- Couldn't find this one electronically - it's in the library or I'd be happy to
provide a photocopy on request
- William H. Harrison, "Compiler Analysis of the Value Ranges for
Variables", IEEE Transactions on Software Engineering, May 1977.
- This paper seems to be the first treatment of formal methods for test elision
and hoisting using compile-time value analysis. It provides a good investigation into
practical value range analysis using an iterative method that propagates knowledge from
assignments, manifest constants and conditional branches to approximate the range of
values possible for each variable in a program at every program point. It calculates these
ranges using 2 separate analyses, range propagation and range analysis, and intersects
their results to refine the analysis further. This information is then used to
perform test elision, test hoisting, and dead code elimination.
- Couldn't find this one electronically - it's in the library or I'd be happy to
provide a photocopy on request
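A toy C sketch of the interval idea behind this kind of analysis (my illustration of the flavor, not Harrison's actual formulation): ranges flow forward through assignments and are narrowed at conditional branches.

```c
#include <assert.h>

/* A toy interval domain: each variable's possible values are tracked as a
 * closed range [lo, hi]. */
typedef struct { int lo, hi; } Range;

/* Transfer function for the assignment "y = x + c". */
Range range_add_const(Range x, int c) {
    return (Range){ x.lo + c, x.hi + c };
}

/* Narrowing on the true arm of the branch "x < k": intersect with (-inf, k). */
Range range_assume_lt(Range r, int k) {
    if (r.hi > k - 1)
        r.hi = k - 1;
    return r;
}
```

If the narrowed range already lies inside an array's bounds, the corresponding test can be elided.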
- David Callahan, Steve Carr, and Ken Kennedy, "Improving Register Allocation for Subscripted Variables",
Proceedings of the ACM SIGPLAN '90 Conference on Programming Language Design and
Implementation, White Plains, New York, June 20-22, 1990.
Most conventional compilers fail to allocate array elements to registers because
standard data-flow analysis treats arrays like scalars, making it impossible to analyze
the definitions and uses of individual array elements. This deficiency is particularly
troublesome for floating-point registers, which are most often used as temporary
repositories for subscripted variables. In this paper, we present a source-to-source
transformation, called scalar replacement, that finds opportunities for reuse of
subscripted variables and replaces the references involved by references to temporary
scalar variables. The objective is to increase the likelihood that these elements will be
assigned to registers by the coloring-based register allocators found in most compilers.
In addition, we present transformations to improve the overall effectiveness of scalar
replacement and show how these transformations can be applied in a variety of loop nest
types. Finally, we present experimental results showing that these techniques are
extremely effective, capable of achieving integer-factor speedups over code generated by
good optimizing compilers of conventional design.
- This paper presents an interesting idea of a source-to-source preprocessor that
transforms array references to temporary scalar variables wherever possible, concentrating
their effort on loops. This pre-processing results in better final code quality after
running through traditional optimizers, which usually fail to fully optimize subscripted
variable accesses, especially in the presence of loop-carried dependencies. The
applicability of the research seems somewhat limited but the speedup seems to be
substantial when it works. It seems this type of optimization would best be done as
an integrated part of a modern optimizer where the necessary data-dependence analyses are
already available, in the interests of compilation efficiency.
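A C sketch of scalar replacement on a loop-carried reuse (my example, modeled on the style of transformation the paper describes):

```c
#include <assert.h>

/* Before: a[i-1] is reloaded from memory on every iteration even though the
 * previous iteration just stored it. */
void prefix_naive(double *a, const double *b, int n) {
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* After scalar replacement: the loop-carried value lives in the scalar t,
 * which a coloring-based register allocator can keep in a register. */
void prefix_scalar(double *a, const double *b, int n) {
    if (n < 1) return;
    double t = a[0];                /* replaces the a[i-1] reference */
    for (int i = 1; i < n; i++) {
        t = t + b[i];
        a[i] = t;
    }
}
```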
- Yunheung Paek, Jay Hoeflinger, and David Padua, "Simplification of Array Access Patterns for Compiler Optimizations",
SIGPLAN '96, Montreal, Canada.
Existing array region representation techniques are sensitive to the complexity of
array subscripts. In general, these techniques are very accurate and efficient for simple
subscript expressions, but lose accuracy or require potentially expensive algorithms for
complex subscripts. We found that in scientific applications, many access patterns are
simple even when the subscript expressions are complex. In this work, we present a new,
general array access representation and define operations for it. This allows us to
aggregate and simplify the representation enough that precise region operations may be
applied to enable compiler optimizations. Our experiments show that these techniques hold
promise for speeding up applications.
- Evelyn Duesterwald, Rajiv Gupta and Mary Lou Soffa, "A Practical Data Flow Framework for Array Reference Analysis
and its Use in Optimizations", ACM SIGPLAN PLDI 6/93, Albuquerque, N.M.
Data flow analysis techniques have traditionally been restricted to the analysis of
scalar variables. This restriction, however, imposes a limitation on the kinds of
optimizations that can be performed in loops containing array references. We present a
data flow framework for array reference analysis that provides the information needed in
various optimizations targeted at sequential or fine-grained parallel architectures. The
framework extends the traditional scalar framework by incorporating iteration distance
values into the analysis to qualify the computed data flow solution during the fixed point
iteration. Analyses phrased in this framework are capable of discovering recurrent access
patterns among array references that evolve during the execution of a loop. The framework
is practical in that the fixed point solution requires at most three passes over the body
of structured loops. Applications of our framework are discussed for register allocation,
load/store optimizations, and controlled loop unrolling.
- Hongwei Xi and Frank Pfenning, "Eliminating
Array Bound Checking Through Dependent Types", SIGPLAN '98.
We present a type-based approach to eliminating array bound checking and list tag
checking by conservatively extending Standard ML with a restricted form of dependent
types. This enables the programmer to capture more invariants through types while
type-checking remains decidable in theory and can still be performed efficiently in
practice. We illustrate our approach through concrete examples and present the results of
our preliminary experiments, which support the feasibility and effectiveness of our approach.
- Todd M. Austin, Scott E. Breach and Gurindar S. Sohi. "Efficient Detection of All Pointer and Array Access Errors",
SIGPLAN '94, 6/94, Orlando, Florida, USA.
We present a pointer and array access checking technique that provides complete error
coverage through a simple set of program transformations. Our technique, based on an
extended safe pointer representation, has a number of novel aspects. Foremost, it is the
first technique that detects all spatial and temporal access errors. Its use is not
limited by the expressiveness of the language; that is, it can be applied successfully to
compiled or interpreted languages with subscripted and mutable pointers, local references,
and explicit and typeless dynamic storage management, e.g., C. Because it is a source
level transformation, it is amenable to both compile- and run-time optimization. Finally,
its performance, even without compile-time optimization, is quite good. We implemented a
prototype translator for the C language and analyzed the checking overheads of six
non-trivial, pointer intensive programs. Execution overheads range from 130% to 540%, with
text and data size overheads typically below 100%.
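A C sketch of the extended-pointer flavor of this approach (the struct layout here is my illustration, not Austin et al.'s actual encoding, and only the spatial check is shown):

```c
#include <assert.h>
#include <stddef.h>

/* An extended "safe pointer" carries its referent's base and extent so every
 * dereference can be checked for spatial errors. */
typedef struct {
    int    *value;   /* the raw pointer being used       */
    int    *base;    /* start of the referenced object   */
    size_t  count;   /* number of elements in the object */
} SafePtr;

int safe_read(SafePtr p) {
    /* spatial check: value must lie inside [base, base + count) */
    assert(p.value >= p.base && p.value < p.base + p.count);
    return *p.value;
}
```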
- E. Christopher Lewis, Calvin Lin, and Lawrence Snyder, "The Implementation and Evaluation of Fusion and Contraction in Array
Languages", SIGPLAN 99 Montreal, Canada.
Array languages such as Fortran 90, HPF and ZPL have many benefits in simplifying
array-based computations and expressing data parallelism. However, they can suffer large
performance penalties because they introduce intermediate arrays, both at the source level
and during the compilation process, which increase memory usage and pollute the cache. Most
compilers address this problem by simply scalarizing the array language and relying on a
scalar language compiler to perform loop fusion and array contraction. We instead show
that there are advantages to performing a form of loop fusion and array contraction at the
array level. This paper describes this approach and explains its advantages. Experimental
results show that our scheme typically yields runtime improvements of greater than 20% and
sometimes up to 400%. In addition, it yields superior memory use when compared against
commercial compilers and exhibits comparable memory use when compared with scalar
languages. We also explore the interaction between these transformations and communication optimizations.
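A C sketch of loop fusion plus array contraction on scalarized array code (my example, not drawn from the paper):

```c
#include <assert.h>

/* Scalarized array code: "t = 2*a; c = t + b" becomes two loops with a full
 * intermediate array t. */
void combine_unfused(const double *a, const double *b, double *c,
                     double *t, int n) {
    for (int i = 0; i < n; i++)
        t[i] = 2.0 * a[i];          /* t = 2*a */
    for (int i = 0; i < n; i++)
        c[i] = t[i] + b[i];         /* c = t + b */
}

/* After loop fusion and array contraction: the two loops merge and the
 * intermediate array shrinks to a scalar, cutting memory traffic. */
void combine_fused(const double *a, const double *b, double *c, int n) {
    for (int i = 0; i < n; i++) {
        double t = 2.0 * a[i];      /* contracted from an array */
        c[i] = t + b[i];
    }
}
```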
Back to Dan's Home Page