Previous CS267 Assignment 1 results: [ 2009 | 2008 | 2007 | 2004 | 2002 | 2000 | 1999 | 1997 ]
The following plot summarizes the performance achieved by the different teams. The numbers in the plot are team numbers; see the list of assigned teams to decode them. "GSI'09" is a simplified, less optimized version of the code shown in the Homework 1 notes. "R GSI" and "A GSI" are the codes that we (Razvan and Andrew) wrote in a week's time when we took the class in spring 2009. "Blocked" is the simple blocked implementation that was supplied.
The two peaks on the right of the graph belong to two teams that heavily optimized their code for specific matrix sizes. They achieved high maximum speeds but did not do as well on other matrix sizes, which resulted in worse median results.
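To make the difference between the two summary numbers concrete, here is a minimal sketch of how a maximum and a median could be computed from a list of per-matrix-size performance figures. The function names are hypothetical and this is not the script that produced the plot; it only illustrates why one very fast matrix size raises the maximum while leaving the median almost unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Largest value in v[0..n-1]; n must be at least 1. */
double max_of(const double *v, int n)
{
    double best = v[0];
    for (int i = 1; i < n; ++i)
        if (v[i] > best)
            best = v[i];
    return best;
}

/* Median of v[0..n-1], computed on a sorted copy so v is left untouched.
   Returns -1.0 if n < 1 or the temporary buffer cannot be allocated. */
double median_of(const double *v, int n)
{
    if (n < 1)
        return -1.0;
    double *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1.0;
    memcpy(tmp, v, (size_t)n * sizeof *tmp);
    qsort(tmp, (size_t)n, sizeof *tmp, cmp_double);
    double m = (n % 2) ? tmp[n / 2]
                       : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return m;
}
```

A code tuned for a single matrix size moves max_of a lot and median_of very little, which matches the two peaks described above.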
Note also that several teams did offline performance modelling and parameter tuning that is not included in their final submission but was an integral part of achieving the performance shown in the graph above. The most important of these was running the same code multiple times to determine optimal block sizes.
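The kind of offline block-size tuning described above might look roughly like the sketch below. It is an illustration only, not any team's actual tuning harness: the names dgemm_blocked and best_block_size are made up, the matrices are assumed to be square and stored column-major, and timing uses the standard clock() function, which is coarse for small problems.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* C += A * B for n-by-n column-major matrices, processed in bs-by-bs blocks. */
void dgemm_blocked(int n, int bs, const double *A, const double *B, double *C)
{
    assert(n > 0 && bs > 0);
    for (int j0 = 0; j0 < n; j0 += bs)
        for (int k0 = 0; k0 < n; k0 += bs)
            for (int i0 = 0; i0 < n; i0 += bs) {
                /* Clip the block at the matrix edge. */
                int jmax = (j0 + bs < n) ? j0 + bs : n;
                int kmax = (k0 + bs < n) ? k0 + bs : n;
                int imax = (i0 + bs < n) ? i0 + bs : n;
                for (int j = j0; j < jmax; ++j)
                    for (int k = k0; k < kmax; ++k) {
                        double b = B[k + j * n];
                        for (int i = i0; i < imax; ++i)
                            C[i + j * n] += A[i + k * n] * b;
                    }
            }
}

/* Run every candidate block size `trials` times on random n-by-n matrices
   and return the candidate with the smallest best-of-trials time.
   Returns -1 if count < 1 or the matrices cannot be allocated. */
int best_block_size(int n, const int *candidates, int count, int trials)
{
    if (count < 1)
        return -1;
    size_t elems = (size_t)n * (size_t)n;
    double *A = malloc(elems * sizeof *A);
    double *B = malloc(elems * sizeof *B);
    double *C = malloc(elems * sizeof *C);
    if (!A || !B || !C) {
        free(A); free(B); free(C);
        return -1;
    }
    for (size_t i = 0; i < elems; ++i) {
        A[i] = (double)rand() / RAND_MAX;
        B[i] = (double)rand() / RAND_MAX;
        C[i] = 0.0;
    }

    int best = candidates[0];
    double best_time = -1.0;
    for (int c = 0; c < count; ++c) {
        double fastest = -1.0;
        for (int t = 0; t < trials; ++t) {
            clock_t start = clock();
            dgemm_blocked(n, candidates[c], A, B, C);
            double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
            if (fastest < 0.0 || elapsed < fastest)
                fastest = elapsed;
        }
        if (best_time < 0.0 || fastest < best_time) {
            best_time = fastest;
            best = candidates[c];
        }
    }
    free(A); free(B); free(C);
    return best;
}
```

The winning block size would then be hard-coded in the submitted code, which is why this step does not show up in the final submissions. The result depends on the machine and on n, so the sweep is worth repeating for several matrix sizes.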
The following plot shows the correlation between the number of lines in a code and its performance. The two graphs show the median performance (top) and the maximum performance (bottom) achieved by these codes. Lines were counted by opening the files in Notepad++ and reading the number of the final line. Files that were not required to compile matrix multiply (e.g. benchmark.cpp) are not counted. The color code shows whether aligned intrinsics such as _mm_load_pd and _mm_store_pd are used in the code. The intrinsics _mm_loadu_pd, _mm_storeu_pd and _mm_load1_pd are considered unaligned.
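For readers unfamiliar with the distinction, the sketch below contrasts the two groups of SSE2 intrinsics on a simple y += a*x loop. It is an illustration written for this page, not code from any submission: the function names are made up, n is assumed to be even, and it needs an x86 compiler that provides emmintrin.h. The aligned variants require addresses that are multiples of 16 bytes; the unaligned variants accept any address.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* y[i] += a * x[i] using aligned loads and stores.
   x and y must be 16-byte aligned and n must be even. */
void daxpy_aligned(int n, double a, const double *x, double *y)
{
    assert(n % 2 == 0);
    assert((uintptr_t)x % 16 == 0 && (uintptr_t)y % 16 == 0);
    __m128d va = _mm_set1_pd(a);              /* broadcast a to both lanes */
    for (int i = 0; i < n; i += 2) {
        __m128d vx = _mm_load_pd(x + i);      /* aligned load of x[i], x[i+1] */
        __m128d vy = _mm_load_pd(y + i);      /* aligned load of y[i], y[i+1] */
        _mm_store_pd(y + i, _mm_add_pd(vy, _mm_mul_pd(va, vx)));
    }
}

/* Same computation with the intrinsics counted as unaligned above.
   x and y may start at any address; n must be even. */
void daxpy_unaligned(int n, double a, const double *x, double *y)
{
    assert(n % 2 == 0);
    __m128d va = _mm_load1_pd(&a);            /* load one double, broadcast it */
    for (int i = 0; i < n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);     /* unaligned load */
        __m128d vy = _mm_loadu_pd(y + i);     /* unaligned load */
        _mm_storeu_pd(y + i, _mm_add_pd(vy, _mm_mul_pd(va, vx)));
    }
}
```

Passing a misaligned pointer to _mm_load_pd or _mm_store_pd typically faults at run time, so using the aligned variants usually means the code also controls how its buffers are allocated and padded.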
Note that almost all codes that broke 50% of peak used some aligned intrinsics. The graph also shows that some teams used intrinsics but missed other optimizations that were needed to achieve high performance (L1 blocking, local contiguous copies of either A or B, padding, etc.).
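As a rough illustration of two of the optimizations listed above (a local contiguous copy and padding), the sketch below copies a sub-block of a column-major matrix into a small buffer whose column length is rounded up and zero-filled. The names round_up and pack_block are hypothetical and the layout is an assumption; submissions may have packed their blocks quite differently.

```c
#include <assert.h>
#include <string.h>

/* Round n up to the next multiple of m (m > 0). */
int round_up(int n, int m)
{
    return (n + m - 1) / m * m;
}

/* Copy a rows-by-cols block of a column-major matrix with leading
   dimension lda into buf, stored column-major with leading dimension
   padded_rows. Rows rows..padded_rows-1 of each column are set to zero,
   so a kernel can process whole vectors without edge cases.
   buf must hold at least padded_rows * cols doubles. */
void pack_block(int lda, int rows, int cols, int padded_rows,
                const double *A, double *buf)
{
    assert(rows <= lda && rows <= padded_rows);
    for (int j = 0; j < cols; ++j) {
        double *dst = buf + (size_t)j * padded_rows;
        memcpy(dst, A + (size_t)j * lda, (size_t)rows * sizeof(double));
        for (int i = rows; i < padded_rows; ++i)
            dst[i] = 0.0;                     /* zero padding */
    }
}
```

Because the zero padding contributes nothing to the products, a kernel can run over the padded block and only the valid rows need to be written back. If buf itself is 16-byte aligned and padded_rows is even, every column of the packed block starts on a 16-byte boundary, which is what makes the aligned intrinsics usable on it.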
If you find that the performance of your code is misrepresented, please send the GSIs a note.