next up previous
Next: Conclusions and future work Up: Automatic assembly of highly Previous: Application: matrix multiply

Results: matrix multiply

We collected data for the three matrix multiply variants discussed in the previous section. While we experimented with many forms for T, for simplicity we present data for the binary case ( T(a,x) = 0 if a is the fastest on x, 1 otherwise).

Figure 2 summarizes the results. The plots depict a 2-D cross-section of the 3-D input space. Specifically, two of the matrix dimensions were set equal (M = N). The colored points show the ``truth'' values: a1 (blue), a2 (green), and a3 (red). These were tuned approximately for small, medium, and large matrices, respectively. Note that the algorithm tuned for medium-sized matrices actually performed best on the smallest sizes (the cluster of green in the lower-left corner of the plots).

The lines on the plots indicate the decision boundaries.3 Figure 2 (left) shows the results of the optimization using 3 decision functions, one for each algorithm. The boundary lines (and arrows) indicate the regions assigned to the algorithm. As shown, the green region in the lower left was misclassified completely as blue. The misclassification rate, measured on the sample inputs, was roughly 19 percent. To be fair, however, our binary assignment of ``time'' makes our results pessimistic, since regions where algorithms appear to overlap most likely indicate that the execution times were very similar.

Figure 2 (right) shows the results when we use 6 decision functions, instead of just three. (Note also that the input samples are slightly different.) For the purpose of the optimization, we accomplish this by assigning $T(a_k,x) = T(a_{k \bmod m},x)$, where m is the true number of algorithms, and k is between 1 and 6 inclusive. As shown, this resulted in a much better classification. In fact, this reduced the misclassification rate to 12 percent.

In general, we will be able to capture more complicated or isolated regions by using more decision functions. However, the trade-off is that we will have to spend more time making those decisions at run-time, potentially sacrificing performance on small matrices.


  
Figure 2: ( Left) Fits based on 400 input samples and 3 boundaries. The region below the blue line was ``assigned'' to a1, the middle region to a2, and the region above the red line to a3. The misclassification rate is 19 percent. ( Right) Fits based on 6 boundaries. This enabled a better fit to both the green and blue regions. The misclassification rate was in turn reduced to 12 percent.
\includegraphics[width=2.5in,angle=90]{figs/400.eps} \includegraphics[width=2.5in,angle=90]{figs/400_6.eps}


next up previous
Next: Conclusions and future work Up: Automatic assembly of highly Previous: Application: matrix multiply
Richard Vuduc
1998-12-15