CS267 Assignment 1: Optimize Matrix Multiplication

Due Date: Tuesday February 14, 2012 at 11:59PM

Problem statement

Your task is to optimize matrix multiplication (matmul) code to run fast on a single processor core of NERSC's Franklin cluster.

We consider a special case of matmul:

C := C + A*B

where A, B, and C are n x n matrices. This can be performed using 2n^3 floating point operations (n^3 adds, n^3 multiplies), as in the following pseudocode:

  for i = 1 to n
    for j = 1 to n
      for k = 1 to n
        C(i,j) = C(i,j) + A(i,k) * B(k,j)
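
For reference, the pseudocode above could be written in C roughly as follows. This is only a sketch: the function name naive_dgemm is hypothetical, indices are 0-based rather than 1-based, and column-major storage (element (i,j) at index i + j*n) is an assumption that follows the usual BLAS convention.

```c
#include <stddef.h>

/* Computes C := C + A*B for n x n matrices.
 * Assumption: column-major storage, so M(i,j) is M[i + j*n].
 * The name naive_dgemm is illustrative only. */
void naive_dgemm(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            /* Accumulate the dot product of row i of A and column j of B. */
            double cij = C[i + (size_t)j * n];
            for (int k = 0; k < n; ++k) {
                cij += A[i + (size_t)k * n] * B[k + (size_t)j * n];
            }
            C[i + (size_t)j * n] = cij;
        }
    }
}
```

The innermost loop performs one multiply and one add per iteration, which is where the 2n^3 operation count comes from.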




These parts are not graded. You should be satisfied with your square_dgemm results and write-up before beginning an optional part.

Source files

We provide two simple implementations for you to start with: a naive three-loop implementation similar to the pseudocode above, and a more cache-efficient blocked implementation.
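
To illustrate what "blocked" means here, the sketch below tiles the three loops so that each small sub-problem touches only a few tiles of A, B, and C at a time. It is not the provided blocked implementation: the names blocked_dgemm and do_block, the block-size parameter bs, and the column-major layout are all assumptions made for the example.

```c
#include <stddef.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Multiply an M x K tile of A by a K x N tile of B and add the result
 * into an M x N tile of C. All tiles live inside n x n column-major
 * matrices, so the leading dimension is n. */
static void do_block(int n, int M, int N, int K,
                     const double *A, const double *B, double *C)
{
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            double cij = C[i + (size_t)j * n];
            for (int k = 0; k < K; ++k) {
                cij += A[i + (size_t)k * n] * B[k + (size_t)j * n];
            }
            C[i + (size_t)j * n] = cij;
        }
    }
}

/* C := C + A*B using square tiles of side bs (hypothetical parameter).
 * Tiles on the right and bottom edges may be smaller than bs. */
void blocked_dgemm(int n, int bs, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i += bs) {
        for (int j = 0; j < n; j += bs) {
            for (int k = 0; k < n; k += bs) {
                int M = min_int(bs, n - i);
                int N = min_int(bs, n - j);
                int K = min_int(bs, n - k);
                do_block(n, M, N, K,
                         A + i + (size_t)k * n,
                         B + k + (size_t)j * n,
                         C + i + (size_t)j * n);
            }
        }
    }
}
```

A good value of bs depends on the cache sizes of the target machine, so it is something to tune experimentally rather than fix in advance.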

The necessary files are in cs267_hw1.tgz. Included are the following:

A naive implementation of matrix multiply using three nested loops,
A simple blocked implementation of matrix multiply,
A wrapper for the vendor's optimized BLAS implementation of matrix multiply (default: Cray LibSci),
The driver program that measures the runtime and verifies the correctness by comparing with the vendor's implementation,
A simple makefile to build the executables,
job-blas, job-blocked, job-naive: scripts to run the executables on Franklin compute nodes. For example, type "qsub job-blas" to benchmark the BLAS version.
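
As a rough illustration of the two things the driver reports, the helpers below convert a measured runtime into a Gflop/s rate using the 2n^3 operation count, and compute the largest entrywise difference between two results. This is a sketch only, not the provided driver's code: the names gflops_rate and max_abs_diff are hypothetical, and the choice of an acceptable error tolerance is left open.

```c
#include <stddef.h>

/* Rate in Gflop/s for one n x n matmul that took `seconds` to run,
 * counting 2*n^3 floating point operations. */
double gflops_rate(int n, double seconds)
{
    return 2.0 * n * n * n / seconds / 1e9;
}

/* Largest absolute entrywise difference between two n x n matrices,
 * e.g. your result and the result of a reference implementation. */
double max_abs_diff(int n, const double *X, const double *Y)
{
    double worst = 0.0;
    for (size_t idx = 0; idx < (size_t)n * n; ++idx) {
        double d = X[idx] - Y[idx];
        if (d < 0.0) {
            d = -d;
        }
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

Because floating point addition is not associative, an optimized matmul will generally not match a reference bit for bit, so a comparison of this kind needs a tolerance that grows with n rather than an exact equality test.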
The documentation for Franklin's programming environment can be found below.



You are also welcome to learn from the source code of state-of-the-art BLAS implementations such as GotoBLAS and ATLAS. However, you should not reuse their code in your submission.


Below are results recorded on Franklin using the provided benchmark. Performance was reproducible to within 5%, so if you feel your performance is misrepresented, please re-run your submitted code to make sure, and then contact the GSIs (cs267.sp12@gmail.com) with this data.

Note that
