next up previous
Next: A formal model Up: Automatic assembly of highly Previous: Automatic assembly of highly

Introduction

Software library designers know that building fast, general purpose routines can be a difficult and time-consuming task that must often be done by hand. For example, it is well-known that the speed of dense linear algebra routines (e.g., LAPACK [ABB+92]) can be dramatically improved over ``naive'' implementations by a variety of techniques: explicit cache-blocking, loop unrolling, software pipelining, to name a few. However, selecting the best combination of these options is machine- and compiler-dependent. Furthermore, each option may be implemented in several ways, and an optimal combination may also depend on the algorithm's input.

In this report, we assume1 a more modular approach that will likely be easier to automate:

1.
Generate many different versions of an algorithm, which use a combination of optimizations that tune for general classes of inputs.

2.
Construct a single routine that, based on the algorithm's input, will select a version of the algorithm to execute at run-time via fast and simple decision rules.

The first item is currently the subject of several research projects [BACD97,WD] in the case of matrix multiply. Briefly, these projects perform automatic code generation and benchmarking to select the ``best'' combination of optimization options.

The second item solves a potential problem with the first: ``best'' can depend on the algorithm's input. In this report, that is our focus: assuming we are given a set of highly optimized versions of an algorithm, and their execution times on some sample of their possible inputs, we would like to learn decision rules for selecting an algorithm.


next up previous
Next: A formal model Up: Automatic assembly of highly Previous: Automatic assembly of highly
Richard Vuduc
1998-12-15