| CS267 Homework 0. Guillermo Diez-Canas. Email. |
| Bio |
I studied telecommunications engineering (a degree that is equivalent to EECS in France, Germany and Spain) in Madrid, Spain. During my third year of undergrad I worked at REM Infografica (in Spain), where I developed a computer graphics modeling package that, surprisingly, remained the industry standard for implicit modeling for over 5 years. After completing the engineering degree I started the PhD program at Berkeley under Fulbright and "la Caixa" fellowships. I work on surface remeshing, surface simplification and implicit modeling in the computer graphics group. I have used template meta-programming, a bit of expression templates and some memory optimization (clustering and tiling for spatial data structures). I'm looking forward to learning about parallel programming, since computer graphics is insatiable in terms of performance.
| Kilauea parallel global illumination renderer |
Kilauea was an effort by Square USA to produce a scalable, parallel global illumination renderer
for use in their production environment (after Final Fantasy). It runs on a cluster of
SMP nodes with conventional Intel processors (mainly Pentium III), connected by 100 Mb/s Ethernet.
Pthreads were used within each node and MPI for communication between nodes. They report that the
MPI implementation was buggy and not thread-safe, so they ended up implementing a subset of MPI themselves
(as well as a thread-safe memory allocator).
The problem to be solved, global illumination using photon mapping, requires a ray tracing stage and a
final gather stage. Both stages are inherently parallel at a fine granularity. The first stage
takes a scene as input and produces a photon map; the second takes both the scene and the photon map
and produces an image.
For scenes small enough to fit in memory at each node, they distribute a copy of
the scene to every node, and the speedup obtained is nearly linear. For larger
scenes, the geometry is split among nodes randomly: instead of using a hierarchical
partition of space, they send geometry primitives to nodes in random order.

Entire copies of the scene are stored at groups of nodes. Each group can trace only one ray at a time, since it needs the entire scene to be sure it finds the correct answer. The amount of parallelism is therefore limited to the number of times that the scene fits in the combined memory: if the scene fits only once in the combined memory, there is no gain in performance over a uniprocessor implementation. Final gathering is implemented similarly. Every time a ray-cast query arrives, it is assigned to a group of nodes holding a copy of the scene, and the query is forwarded to all nodes in the group at the same time. Their results are then combined to determine the closest intersection point (which requires the results from all nodes in the group).
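To make the group query concrete, here is a minimal single-process sketch of the idea described above. It is not Kilauea's code: the names `Sphere`, `split_randomly`, `node_closest_hit` and `group_closest_hit` are hypothetical, spheres stand in for arbitrary geometry primitives, and the "nodes" are plain Python lists rather than machines exchanging messages.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Sphere:
    """Stand-in primitive (assumption: any primitive with a ray test would do)."""
    center: Vec
    radius: float


def hit_distance(origin: Vec, direction: Vec, s: Sphere) -> Optional[float]:
    """Distance along a unit-length ray to the sphere's front face, or None.

    Rays that start inside a sphere are treated as misses to keep this short.
    """
    oc = tuple(o - c for o, c in zip(origin, s.center))
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - s.radius ** 2
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None


def split_randomly(scene: List[Sphere], num_nodes: int, seed: int = 0) -> List[List[Sphere]]:
    """Deal primitives to the nodes of one group in random order (no spatial partition)."""
    rng = random.Random(seed)
    shuffled = list(scene)
    rng.shuffle(shuffled)
    return [shuffled[i::num_nodes] for i in range(num_nodes)]


def node_closest_hit(origin: Vec, direction: Vec, local: List[Sphere]) -> Optional[float]:
    """What a single node can answer: the closest hit among its own primitives."""
    hits = [hit_distance(origin, direction, p) for p in local]
    return min((t for t in hits if t is not None), default=None)


def group_closest_hit(origin: Vec, direction: Vec, group: List[List[Sphere]]) -> Optional[float]:
    """Forward the ray to every node in the group, then keep the smallest distance."""
    partial = [node_closest_hit(origin, direction, local) for local in group]
    return min((t for t in partial if t is not None), default=None)


if __name__ == "__main__":
    scene = [Sphere((0.0, 0.0, z), 1.0) for z in (15.0, 5.0, 10.0)]
    group = split_randomly(scene, num_nodes=2)
    print(group_closest_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), group))  # 4.0
```

Because the primitives are dealt out at random, no node can rule itself out for a given ray, so every node in the group has to answer before the minimum is known. That is why a group handles one ray at a time.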
As benefits of their node grouping technique, they report consistent and predictable latency and good
load balancing. In practice, scenes can be split among nodes in a coherent way
(all geometry in a node is spatially nearby), with ray queries forwarded between nodes
(ray query data is extremely small), greatly reducing the number of ray tracing
queries executed at each machine.
Performance scales nearly linearly with the number of groups.
They report the network as the bottleneck, and propose using faster Gigabit Ethernet or implementing
data compression to reduce the communication overhead.
The system is simple, and therefore easier to implement and debug, but it is greatly limited by the size of the input.
By limiting, by design, the amount of parallelism to the number of complete scene copies that the combined
memory can hold, much of the benefit of using a large number of processors is lost for large scenes (which are far from
rare in production environments). Perhaps better schemes for dividing the scene geometry among nodes, together with deferred
shading to maximize the number of outstanding ray-cast queries in the queue, could increase the level of
parallelism for large scenes that don't fit at every node.
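The limit on parallelism can be sketched with a back-of-the-envelope model. This is my own simplification, not something from the Kilauea paper: it assumes every node has the same memory, that a scene copy occupies a whole number of nodes, and that the photon map and other working data take no space. The function name `num_groups` and the cluster sizes below are made up for illustration.

```python
import math


def num_groups(num_nodes: int, node_memory_gb: float, scene_size_gb: float) -> int:
    """Number of complete scene copies (groups) the cluster can hold.

    Under the assumptions above this is also the number of rays that can be
    traced concurrently, since each group handles one ray at a time.
    """
    nodes_per_group = math.ceil(scene_size_gb / node_memory_gb)
    return num_nodes // nodes_per_group


if __name__ == "__main__":
    # Hypothetical cluster: 16 nodes with 1 GB each.
    for scene_gb in (0.5, 4.0, 16.0):
        print(scene_gb, "GB scene ->", num_groups(16, 1.0, scene_gb), "group(s)")
```

In this model a scene that fits on one node gives 16 groups, a 4 GB scene gives 4, and a 16 GB scene gives a single group, which is the case with no gain over one processor.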
Some sample images are:

Reference: "Kilauea": parallel global illumination renderer