CS267 Assignment 3 Professor: Kathy Yelick TA: David Bindel
Emil Ong
Conjugate Gradient Project in Titanium
1 Titanium Sequential Baseline Code Evaluation
To understand how the Titanium code performed, the first step was to evaluate the sequential baseline code. The evaluation was done for both the unpreconditioned 1-D Poisson mesh and the preconditioned 1-D Poisson mesh, on both the millennium and mcurie (Cray T3E) machines. The first metric that we can observe is the number of iterations each method took to converge to within the specified limit. The number of iterations is a consequence of the code, not of the machine the code is running on. Here are the results:

So it appears that the number of iterations is a linear function of the number of mesh points, but the constant for the preconditioned code is substantially smaller than the constant for the code with no preconditioning. This is misleading, however, as we shall see shortly.
The first machine that we tested the sequential code on was the millennium. Here are the results:

So, even though the preconditioned code took fewer iterations, it takes significantly longer to execute. Thus, the block Jacobi preconditioner does not seem to be worthwhile: in the sequential case it is actually better to just run the conjugate gradient code on the unpreconditioned matrix. However, the preconditioner should reap benefits in the parallel code, since applying it is embarrassingly parallel.
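To make the trade-off above concrete, here is a minimal Python sketch of conjugate gradient on the 1-D Poisson matrix with an optional block Jacobi preconditioner. It is not the Titanium code from the assignment. The function names, the problem size of 64, the block size of 8, and the all-ones right-hand side are assumptions chosen only for illustration. The point to notice is that each block in block_jacobi_apply is solved independently of the others, which is why the preconditioner is embarrassingly parallel, and that each preconditioned iteration does extra work on top of the matrix-vector product.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poisson_matvec(x):
    """Return A x for the 1-D Poisson matrix A = tridiag(-1, 2, -1)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def solve_poisson_block(r):
    """Solve tridiag(-1, 2, -1) z = r for one block (Thomas algorithm)."""
    m = len(r)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = r[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    z = [0.0] * m
    z[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z


def block_jacobi_apply(r, block_size):
    """Apply M^-1, where M keeps only the diagonal blocks of A.

    Each block solve touches only its own slice of r, so the loop
    iterations are independent and could run on separate processors.
    """
    z = []
    for start in range(0, len(r), block_size):
        z.extend(solve_poisson_block(r[start:start + block_size]))
    return z


def pcg(b, precond, tol=1e-10, max_iter=10000):
    """Preconditioned CG for A x = b; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        Ap = poisson_matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    n = 64                      # illustrative size, not from the assignment
    b = [1.0] * n               # illustrative right-hand side
    _, it_plain = pcg(b, lambda r: list(r))
    _, it_block = pcg(b, lambda r: block_jacobi_apply(r, 8))
    print("iterations without preconditioner:", it_plain)
    print("iterations with block Jacobi:     ", it_block)
```

Passing the identity (lambda r: list(r)) as the preconditioner gives plain conjugate gradient, so the same routine can be used to compare iteration counts for the two cases.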
In addition, from the graph, it appears that:
t = C n^p where:
t = time (in ms)
n = number of mesh points
C, p = constants
Taking the log of both sides yields:
log t = p log n + log C
So, if we plot the graph above on log-log axes, the slope of the resulting line gives the exponent p, that is, the power of n that t is proportional to. The log-log graph appears like this:

So for a sufficiently large number of mesh points (> 1000), the slope of the above graph is about 2 for both cases. Thus the running time is approximately proportional to the square of the number of mesh points.
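The slope estimate described above can also be computed directly with a least-squares line fit through the points (log n, log t). The Python sketch below shows one way to do that. The function name fit_power_law is hypothetical, and the sample timings are synthetic values generated from t = 0.003 n^2 purely to exercise the code; they are not the measurements from this report.

```python
import math


def fit_power_law(ns, ts):
    """Fit t = C * n**p by linear least squares on (log n, log t).

    Returns (C, p), where p is the slope of the log-log line and
    log C is its intercept.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    p = sxy / sxx
    C = math.exp(y_mean - p * x_mean)
    return C, p


if __name__ == "__main__":
    # Synthetic data only: generated from t = 0.003 * n**2, not measured.
    ns = [1000, 2000, 4000, 8000]
    ts = [0.003 * n ** 2 for n in ns]
    C, p = fit_power_law(ns, ts)
    print("fitted exponent p:", p)
    print("fitted constant C:", C)
```

To follow the approach in the text, only the points with more than 1000 mesh points would be passed in, since the smaller runs do not lie on the same line.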
The sequential code was also run and analyzed for mcurie. Here is the plot of timings:

as well as the log-log plot of the above graph:

Here, the slope of the graph for a sufficiently large number of mesh points (> 1000) is about 1.5, so the time on mcurie looks to be roughly proportional to the number of mesh points raised to the power 1.5. However, please note that this is just an approximation.
2 Attempt to Add Block Jacobi Preconditioner
After doing timings for all the sequential code, I attempted to implement the block Jacobi preconditioner for the parallel code. While the project specification claims that this "should be quite easy to add", I spent many, many hours trying to figure out why it didn't work. It is true that the block preconditioner is embarrassingly parallel, but that wasn't the problem. My code would not compile, and the lines the compiler complained about seemed to be fine. The problem was probably due to my failed conversion of the preconditioner code from sequential to parallel, but unfortunately I know of no debugger for Titanium, so I wasn't able to track it down. I also doubt that the Titanium compiler is completely bug-free, so that may have played a part as well. In any case, this is my failed optimization, and unfortunately I have no new timings to display because of it. Sorry.
3 Partners' Websites
Here are the links to Emil's and Dave's websites (both of whom worked with UPC code):
Emil: http://www.cs.berkeley.edu/~emilong/classes/cs267/cg/
Dave: http://www.cs.berkeley.edu/~strive/cs267/hw3/