Next: A formal model
Up: Automatic assembly of highly
Previous: Automatic assembly of highly
Software library designers know that building fast, general purpose
routines can be a difficult and time-consuming task that must often
be done by hand. For example, it is well-known that the speed of dense
linear algebra routines (e.g., LAPACK [ABB+92]) can be dramatically
improved over ``naive'' implementations by a variety of techniques:
explicit cache-blocking, loop unrolling, software pipelining, to name a
few. However, selecting the best combination of these options is machine-
and compiler-dependent. Furthermore, each option may be implemented in
several ways, and an optimal combination may also depend on the
algorithm's input.
In this report, we assume1 a more
modular approach that will likely be easier to automate:
- 1.
- Generate many different versions of an algorithm, which use a
combination of optimizations that tune for general classes
of inputs.
- 2.
- Construct a single routine that, based on the algorithm's
input, will select a version of the algorithm to execute at
run-time via fast and simple decision rules.
The first item is currently the subject of several research projects
[BACD97,WD] in the case of matrix multiply. Briefly,
these projects perform automatic code generation and benchmarking to
select the ``best'' combination of optimization options.
The second item solves a potential problem with the first: ``best''
can depend on the algorithm's input. In this report, that is
our focus: assuming we are given a set of highly optimized versions of
an algorithm, and their execution times on some sample of their possible
inputs, we would like to learn decision rules for selecting an algorithm.
Next: A formal model
Up: Automatic assembly of highly
Previous: Automatic assembly of highly
Richard Vuduc
1998-12-15