Mark Hoemmen, Armando Solar-Lezama, Fabrizio Bisetti, Guang Yang, Ben Schwarz, Christian Rojas
The NAS CG benchmark uses an inverse power method to find the largest eigenvalue of a random sparse matrix. It is intended to exercise the random gathers in sparse matrix-vector products as well as reduction operations. The full CG specification is given at the NAS site.
Figure 1: NAS CG uses a block-cyclic layout for A.
You are to develop and tune that benchmark in Titanium and UPC, with one alteration: we want a slight variation on the matrix layout of the standard NAS CG benchmark. The NAS benchmark spreads the sparse matrix across processors with a block-cyclic layout, as shown in Figure 1. Four threads, T0 through T3, could be arranged into a 2-by-2 block replicated across the sparse matrix A. The vectors p and q need to be communicated.
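To make the block-cyclic idea concrete, here is a minimal sketch of how a block of A might be mapped to a thread. It assumes the threads are numbered row by row within the thread grid, which Figure 1 may or may not match; the function name `block_cyclic_owner` and its parameters are illustrative and do not come from the NAS code.

```c
#include <assert.h>

/* Return the thread that owns block (bi, bj) of the matrix when a
   prow-by-pcol grid of threads is tiled repeatedly across the blocks.
   ASSUMPTION: threads are numbered row-major within the grid, so a
   2-by-2 grid holds T0 T1 in its first row and T2 T3 in its second. */
int block_cyclic_owner(int bi, int bj, int prow, int pcol)
{
    assert(bi >= 0 && bj >= 0 && prow > 0 && pcol > 0);
    return (bi % prow) * pcol + (bj % pcol);
}
```

With a 2-by-2 grid, every block row touches two different threads, which is why a product with this layout needs pieces of both p and q to move between threads.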
Figure 2: Most codes keep the sparse rows intact.
The NAS benchmark uses this arrangement to stress communication, but most real codes (like Aztec and PETSc) allocate contiguous blocks of rows to single processors, as shown in Figure 2. This arrangement can require less communication and is amenable to optimization through partitioning.
Because this arrangement is more common, you need to implement the row-based scheme. The matrix is generated such that each row has about the same number of explicit entries, so you can simply spread the rows evenly across the processors. Each local piece is itself a matrix that can be stored in compressed sparse row (CSR) format.
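The row-based scheme can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the reference implementation: the type `csr_t` and the functions `first_row` and `csr_matvec` are hypothetical names, and the product assumes the full vector p is already available to the thread.

```c
#include <assert.h>

/* Compressed sparse row (CSR) storage for one thread's block of rows.
   All names here are illustrative, not taken from the NAS code. */
typedef struct {
    int nrows;          /* number of rows owned by this thread */
    const int *rowptr;  /* length nrows + 1; row i is [rowptr[i], rowptr[i+1]) */
    const int *colidx;  /* global column index of each stored entry */
    const double *val;  /* value of each stored entry */
} csr_t;

/* First global row owned by thread `rank` when n rows are spread as
   evenly as possible over nprocs threads. The first n % nprocs threads
   get one extra row. Passing rank == nprocs returns n, so thread r owns
   rows [first_row(n, P, r), first_row(n, P, r + 1)). */
int first_row(int n, int nprocs, int rank)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank <= nprocs);
    int base = n / nprocs;
    int extra = n % nprocs;
    return rank * base + (rank < extra ? rank : extra);
}

/* q = A * p for the locally owned rows. p is indexed by global column,
   q by local row. */
void csr_matvec(const csr_t *A, const double *p, double *q)
{
    for (int i = 0; i < A->nrows; ++i) {
        double sum = 0.0;
        for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; ++k)
            sum += A->val[k] * p[A->colidx[k]];
        q[i] = sum;
    }
}
```

Because each row has about the same number of entries, splitting by row count alone should also balance the work in `csr_matvec` reasonably well.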
Figure 3: Diagonal blocks are "local"; computing their product could be overlapped with gathering the non-local p entries.
One interesting consequence of the row partitioning is that you can separate the matrix into a local component and an external component. You can calculate q = A_local * p_local while gathering external entries of p, then add q += A_ext * p_ext. This is a significant optimization in MPI-based codes, and you may want to emulate it with relaxed UPC pointers.
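The two-phase product can be sketched serially in C. This is a rough illustration only: the names `matvec_part` and `split_matvec` are hypothetical, the thread is assumed to own the entries of p with global indices in [lo, hi), and the gather of remote entries is represented by a comment rather than by real communication.

```c
#include <assert.h>

/* Add to q the contribution of the stored entries whose global column
   lies inside [lo, hi) when want_local is nonzero, or outside that
   range when want_local is zero. */
void matvec_part(int nrows, const int *rowptr, const int *colidx,
                 const double *val, const double *p, double *q,
                 int lo, int hi, int want_local)
{
    assert(lo <= hi);
    for (int i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k) {
            int is_local = (colidx[k] >= lo && colidx[k] < hi);
            if (is_local == (want_local != 0))
                sum += val[k] * p[colidx[k]];
        }
        q[i] += sum;
    }
}

/* q = A_local * p_local, then q += A_ext * p_ext. */
void split_matvec(int nrows, const int *rowptr, const int *colidx,
                  const double *val, const double *p, double *q,
                  int lo, int hi)
{
    for (int i = 0; i < nrows; ++i)
        q[i] = 0.0;

    /* Phase 1: touches only the entries of p this thread already owns. */
    matvec_part(nrows, rowptr, colidx, val, p, q, lo, hi, 1);

    /* In a parallel code, the gather of remote p entries would have been
       started before phase 1 and would be completed here. */

    /* Phase 2: the remaining entries, which need the gathered values. */
    matvec_part(nrows, rowptr, colidx, val, p, q, lo, hi, 0);
}
```

A tuned code would more likely split A into two separate CSR matrices once, up front, rather than testing every column index on each product; the branch is kept here only to keep the sketch short.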
Interesting items to time:
These codes are available to get you started. You can no doubt find others on-line.
E. Jason Riedy