My name is Dan Adkins. This is my third year as a graduate student in computer science at Berkeley. My research interests used to be networking and peer-to-peer systems, but they're rapidly shifting to unknown. I'm taking this class to finish off my breadth requirements (the other options were AI classes, which I couldn't bear).
The problem being addressed is simple: chess. There are many chess-playing programs, but the best ones are fast. Their strength correlates almost directly with how deep they can search the game tree, which in turn depends on raw search speed, or in search speak, "nodes per second."
Searching a game tree is an interesting problem to parallelize because the performance of the alpha-beta algorithm and its variants is highly dependent on the order in which the search tree is examined. One obvious way to parallelize the search is to search each subtree in parallel. However, keep in mind that alpha-beta spends most of its time in the first branch establishing good bounds, which speed up the search of the remaining branches. So naively searching all branches in parallel isn't much faster than sequential alpha-beta.
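To make the ordering effect concrete, here is a minimal sketch in Python. It is an illustration only, not code from Cilkchess: the nested-list tree, the leaf scores, and the node counter are all made up for the example. It runs the same negamax alpha-beta search over the same set of moves in two different orders and counts how many nodes each search visits.

```python
# Hypothetical toy game tree: a list is an internal node, an int is a leaf
# score from the point of view of the player to move at that leaf.

def alphabeta(node, alpha, beta, counter):
    """Negamax alpha-beta. counter[0] is incremented once per visited node."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    best = float("-inf")
    for child in node:
        # The child's score is from the opponent's view, so negate it
        # and flip the window.
        score = -alphabeta(child, -beta, -alpha, counter)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the remaining siblings cannot change the result
    return best

def search(tree):
    """Return (value, nodes_visited) for a full-window search of tree."""
    counter = [0]
    value = alphabeta(tree, float("-inf"), float("inf"), counter)
    return value, counter[0]

# Same moves, two orderings. In good_order the best move comes first.
good_order = [[5, 6, 7], [4, 9, 9], [3, 9, 9]]
bad_order = [[9, 9, 3], [9, 9, 4], [7, 6, 5]]

good_value, good_nodes = search(good_order)
bad_value, bad_nodes = search(bad_order)
print(good_value, good_nodes)
print(bad_value, bad_nodes)
```

Both searches return the same value, but with the best move first the search visits 9 of the 13 nodes, while the worst ordering visits all 13. On a tree this small the saving is modest; the gap grows quickly with depth and branching factor.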
Cilkchess uses the Jamboree version of MTD(f), a fast, simple minimax search algorithm. The basic idea is to search the first branch entirely (up to the cut-off depth) BEFORE searching the other branches in parallel. If the first branch yields good bounds, then the remaining branches can be searched very efficiently in parallel. The key to getting a good bound on the first branch is to select a likely good move based upon heuristics.
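The "first branch, then the rest in parallel" idea can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the Cilkchess implementation: it applies the idea only at the root, uses a hypothetical nested-list tree with integer leaf scores, and uses Python's standard thread pool purely to show the structure (Python threads will not give a real speedup here). The function name first_then_parallel is made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def negamax(node, alpha, beta):
    """Sequential negamax alpha-beta over a nested-list tree of int leaves."""
    if isinstance(node, int):
        return node
    best = float("-inf")
    for child in node:
        score = -negamax(child, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best

def first_then_parallel(root, workers=4):
    """Search the first child fully, then test the other children in parallel."""
    first, rest = root[0], root[1:]

    # Step 1: full-window sequential search of the first (hopefully best) move.
    bound = -negamax(first, float("-inf"), float("inf"))

    # Step 2: in parallel, ask of each remaining move only "can it beat
    # bound?" using a minimal window (bound, bound + 1). Scores are integers.
    def test(child):
        return -negamax(child, -(bound + 1), -bound)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        tests = list(pool.map(test, rest))

    # Step 3: any move whose test exceeded bound is re-searched sequentially
    # with a wider window to get its exact value.
    best = bound
    for child, score in zip(rest, tests):
        if score > bound:
            best = max(best, -negamax(child, float("-inf"), -best))
    return best

well_ordered = [[5, 6, 7], [4, 9, 9], [3, 9, 9]]
badly_ordered = [[3, 9, 9], [9, 9, 4], [7, 6, 5]]
print(first_then_parallel(well_ordered))
print(first_then_parallel(badly_ordered))
```

When the first move really is the best, every parallel test fails quickly and step 3 does no work. When the ordering heuristic guesses wrong, the sketch falls back to sequential re-searches, which is the price paid for a bad first move.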
The other interesting technology behind Cilkchess is the language Cilk and its runtime environment. Cilk is a multi-threaded C variant that allows nondeterministic parallel algorithms, like the one Cilkchess uses, to be cleanly expressed. Cilk programs use a fork-join model of parallelism and communicate via shared memory. Nothing in a Cilk program optimizes for any particular number of processors. Cilk programs express threads which MAY run in parallel, and the Cilk runtime system efficiently assigns the threads to available processors. Programs written in Cilk can easily be ported from 4-processor machines to 1000-processor clusters.
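For readers who have not seen the fork-join model, here is a rough analogy in Python rather than in Cilk itself. It is only a sketch: it uses heavyweight OS threads from the standard threading module, whereas Cilk's runtime schedules lightweight threads onto processors for you. The comments point out which steps correspond loosely to Cilk's spawn and sync keywords.

```python
import threading

def fib(n):
    """Fork-join Fibonacci: fork one subproblem, compute the other, join."""
    if n < 2:
        return n
    result = {}

    def child():
        result["x"] = fib(n - 1)

    t = threading.Thread(target=child)  # loosely analogous to Cilk's `spawn`
    t.start()
    y = fib(n - 2)                      # the parent keeps working meanwhile
    t.join()                            # loosely analogous to Cilk's `sync`
    return result["x"] + y

print(fib(10))
```

Note that nothing in fib mentions how many processors exist; the code only says which pieces of work MAY run at the same time, which is the property described above.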
Cilkchess was developed on 4- and 8-processor SMPs. But for the 1999 World Computer Chess Championships, Cilkchess was run on a 256-processor SGI Origin 2000. Such a machine would have been on the TOP500 list in 1999, but today it would be too slow to qualify. Running on that platform, Cilkchess was typically looking 15 ply deep (almost as far as Deep Blue at the time) and performing 5-11 make-move operations per second.
The chess problem parallelizes nicely since each node of a search tree can be assigned to a processor and run independently. If the metric is just make-move operations or nodes searched, then very high performance can be achieved. But this is fool's gold. The real goal is to search the tree deeply, and pruning the tree is more effective than simply throwing massive computing resources at it. In fact, as suggested earlier, Cilkchess sacrifices parallelism when searching the first branch in order to prune the tree. It turns out that searching deep on one branch is a much better way to prune the tree than searching shallow on many branches in parallel.
So, on the face of it, Cilkchess doesn't use the resources of the machine very effectively. But if you examine most chess programs at work, they quickly find the best move and then spend a disproportionate amount of time proving that the move is indeed best. I think that the first part of the search is inherently sequential, but the proof part of the search can be parallelized. This is basically how Cilkchess approaches the problem.
Reference: Cilkchess homepage