Next: Results: matrix multiply
Up: Automatic assembly of highly
Previous: A formal model
We now apply the methodology to a sample problem: dense matrix multiply.
We begin with three versions of matrix multiply which were generated and
automatically tuned for high performance by the PHiPAC
project.2 The exact routines are described in
Appendix A. Roughly, algorithm 1 (a1) has
been supposedly tuned for small matrix sizes, a2 for medium sized
matrices, and a3 for large matrices.
The ``input space'' in this problem is the set of triples specifying
matrix dimensions,
,
as shown in fig. 1
(right). Thus, each sample xi is a three-dimensional vector. We
measure execution times directly.
We will now specify an explicit parametric form for the boundary function
f. We use the well-known ``softmax'' function:
 |
(1) |
We associate one (or possibly more) parameters
with each algorithm
ak. The softmax function is a ``natural choice'' in this case, since
it gives us soft boundaries, and has a nice probablistic interpretation.
Finally, we specify a cost function which we would like to minimize:
 |
(2) |
This roughly measures the ``total execution time over all
input samples,'' where the times T are weighted by a contribution f from
each algorithm. Our choice of f makes differentiation of (2)
straightforward. We use Newton's method with line searching for the
optimization.
Next: Results: matrix multiply
Up: Automatic assembly of highly
Previous: A formal model
Richard Vuduc
1998-12-15