next up previous
Next: Results: matrix multiply Up: Automatic assembly of highly Previous: A formal model

Application: matrix multiply

We now apply the methodology to a sample problem: dense matrix multiply. We begin with three versions of matrix multiply which were generated and automatically tuned for high performance by the PHiPAC project.2 The exact routines are described in Appendix A. Roughly, algorithm 1 (a1) has been supposedly tuned for small matrix sizes, a2 for medium sized matrices, and a3 for large matrices. The ``input space'' in this problem is the set of triples specifying matrix dimensions, $\{(M, K, N)\}$, as shown in fig. 1 (right). Thus, each sample xi is a three-dimensional vector. We measure execution times directly.

We will now specify an explicit parametric form for the boundary function f. We use the well-known ``softmax'' function:


 \begin{displaymath}f_{\theta_k}(x) = \frac{e^{\theta_k^Tx + \theta_{k,0}}}
{\sum_{j}{e^{\theta_j^Tx + \theta_{j,0}}}}
\end{displaymath} (1)

We associate one (or possibly more) parameters $\theta_k$ with each algorithm ak. The softmax function is a ``natural choice'' in this case, since it gives us soft boundaries, and has a nice probablistic interpretation.

Finally, we specify a cost function which we would like to minimize:


 \begin{displaymath}C(\theta_1,\ldots,\theta_m) =
\sum_{i=1}^{n}{\sum_{k=1}^{m}{f_{\theta_k}(x_n)T(a_k,x_n)}}
\end{displaymath} (2)

This roughly measures the ``total execution time over all input samples,'' where the times T are weighted by a contribution f from each algorithm. Our choice of f makes differentiation of (2) straightforward. We use Newton's method with line searching for the optimization.


next up previous
Next: Results: matrix multiply Up: Automatic assembly of highly Previous: A formal model
Richard Vuduc
1998-12-15