Assignment 0

Alexandra Meliou

Brief bio:
I am a second-year graduate student. My interests lie in the area of database systems, and my advisor is Professor Joe Hellerstein. I am taking CS267 to become familiar with parallel systems and parallel programming.

Protein Folding

Proteins, sequences of amino acids, are fundamental to how biology works. We encounter them as enzymes participating in biochemical reactions, as structural elements forming body tissues, and as antibodies supporting the immune system. Understanding proteins is therefore central to understanding how basic functions in living organisms work. The sequencing of the human genome provides a blueprint for the proteins, but knowing the amino acid sequence is not enough to derive what a protein does and how it functions. To perform their functions, proteins assemble themselves into particular shapes. This process of self-assembly is called folding.
The problem of protein folding is to understand how proteins assemble so fast and so reliably. Solving it would permit the construction of synthetic polymers with the same properties. It would also bring us a step closer to understanding the causes of, and even finding cures for, diseases such as Alzheimer's disease, mad cow disease, and some forms of cancer, which are believed to result from protein misfolding.

The simulation challenge

The ability to self-assemble is not the only amazing thing about proteins. Equally impressive is the fact that they can do so extremely fast, some on the order of a millionth of a second. Current technology requires about a day to simulate one nanosecond. Proteins fold on a timescale of 10,000 nanoseconds, which means we would need roughly 30 CPU years to simulate one folding! To speed up the simulation we can make simplifying assumptions, for example by not considering the interactions between all pairs of atoms. Of course, this introduces inaccuracies into the simulation.
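The estimate above is simple arithmetic, and a short sketch makes the rounding explicit. It uses only the two figures quoted in the text (one day of computation per simulated nanosecond, 10,000 nanoseconds per fold); the function and constant names are illustrative.

```python
# Back-of-the-envelope estimate using the figures quoted in the text.
DAYS_PER_SIMULATED_NS = 1.0   # "a day to simulate a nanosecond"
FOLDING_TIME_NS = 10_000      # folding timescale quoted in the text
DAYS_PER_YEAR = 365.25

def cpu_years(folding_time_ns, days_per_ns=DAYS_PER_SIMULATED_NS):
    """Sequential CPU time, in years, needed to simulate one folding event."""
    return folding_time_ns * days_per_ns / DAYS_PER_YEAR

estimate = cpu_years(FOLDING_TIME_NS)
print(f"{estimate:.1f} CPU years")  # about 27, i.e. on the order of 30
```

The result is a little over 27 years on a single CPU, which the text rounds to 30.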
The Folding@home project addresses the problem of simulating protein folding by using a distributed computing environment. The project team has created a client application which can be downloaded and run on any personal computer. The result is a very powerful cluster of machines distributed all around the world, very similar to the environment of the SETI@home project. Folding@home does not need to run on a supercomputer because the algorithms it uses do not require fast networking.

Clients download work units from the F@H servers. The client application takes advantage of the CPU's idle time to run the simulation algorithms on the current work unit. When the simulation work is done, the results are sent back to the assigning server. The following table shows the statistics of the cluster as of today, running at ~200 TFLOPS in total.

OS Type     Current TFLOPS   Active CPUs   Total CPUs
Windows            183.604        153004       987022
Mac OS X             7.742          9678        68407
Linux               13.549         15976       112198
Other                0.0               0            0
Total              204.895        178658      1167627
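The work-unit cycle described above the table (download a unit, simulate it, send the result back) can be sketched as follows. This is a minimal single-process illustration, not the actual Folding@home client or protocol: the `Server` class, the `simulate` placeholder, and the work-unit layout are all hypothetical, and the clients simply take turns instead of running on separate machines.

```python
from collections import deque

def simulate(work_unit):
    """Placeholder for the real simulation; here it just sums the inputs."""
    return sum(work_unit["data"])

class Server:
    """Hypothetical work-unit server: hands out units and collects results."""
    def __init__(self, units):
        self.pending = deque(units)
        self.results = {}

    def assign(self):
        # Return the next unassigned work unit, or None when none are left.
        return self.pending.popleft() if self.pending else None

    def submit(self, unit_id, result):
        self.results[unit_id] = result

def client_step(server):
    """One client cycle: fetch a unit, simulate it, send the result back."""
    unit = server.assign()
    if unit is None:
        return False
    server.submit(unit["id"], simulate(unit))
    return True

units = [{"id": i, "data": [i, i + 1, i + 2]} for i in range(6)]
server = Server(units)
clients = ["A", "B", "C"]
work_done = {name: 0 for name in clients}

# Clients take turns until the server has no work left.
active = True
while active:
    active = False
    for name in clients:
        if client_step(server):
            work_done[name] += 1
            active = True
```

In this sketch a client only ever talks to the server, never to another client, which is consistent with the point that the approach does not depend on fast networking.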

Since its start, Folding@home has folded several small, fast-folding proteins, and the team has also validated its methods experimentally. The project is now targeting more complex protein structures, as well as some disease-relevant proteins. In general the project is considered a success.
Concerning the scalability of the simulation, the project faces an obstacle inherent to the problem that it is trying to address. The obstacle lies in the model itself: every atom interacts with every other atom, which makes the computation difficult to divide and parallelize.
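To illustrate why all-pairs interactions are costly, the sketch below counts how many atom pairs must be evaluated, with and without a distance cutoff (the kind of simplifying assumption mentioned earlier). The 1/r pair energy is a toy stand-in chosen for illustration, not the force field used by Folding@home, and the atom positions are made up.

```python
import itertools
import math

def pair_count(n_atoms):
    """Number of distinct atom pairs: n(n-1)/2, which grows quadratically."""
    return n_atoms * (n_atoms - 1) // 2

def total_energy(positions, cutoff=None):
    """Sum a toy 1/r pair energy; returns (energy, number of pairs evaluated)."""
    energy = 0.0
    evaluated = 0
    for a, b in itertools.combinations(positions, 2):
        r = math.dist(a, b)
        if cutoff is not None and r > cutoff:
            continue  # simplifying assumption: ignore distant pairs
        energy += 1.0 / r
        evaluated += 1
    return energy, evaluated

# Ten atoms on a line, one unit apart (made-up geometry).
atoms = [(float(i), 0.0, 0.0) for i in range(10)]
_, all_pairs = total_energy(atoms)
_, near_pairs = total_energy(atoms, cutoff=2.0)
print(all_pairs, near_pairs)
```

Without a cutoff, each atom needs the current position of every other atom at every time step, so splitting the atoms across machines would force them to exchange positions constantly. The cutoff reduces the number of pairs but, as noted earlier, at the price of accuracy.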

Conclusion

Protein folding is a very interesting problem which, if solved, will benefit several areas of research in biology and medicine. It will offer a better understanding of how organisms work, and it will provide useful insight into many diseases. As noted, the problem is very hard to simulate: a sequential approach is out of the question, and parallelism is the only feasible solution.
