CS252 Project
Simple Symmetric Multithreading in Xilinx FPGAs

Yury Markovskiy and Yatish Patel
with Nick Weaver
{yurym, yatish, nweaver}@cs.berkeley.edu

Project Report

Slides: ppt, pdf

Summary

FPGAs offer abundant resources for pipelining a design to improve performance. Yet for a microprocessor or similar design, the overhead of pipelining for a higher clock rate (notably the much larger bypassing logic, the additional instruction latency, and the more complicated control logic) becomes significant. We propose a compromise between high overhead and high performance: Simple Symmetric Multithreading. Here, the register file and the TLB are enlarged, and each stage of the pipeline and of the control logic is replaced by several stages, which are then moved to balance the delays. The bypassing does not become more complex, and the complexity of the control logic does not change significantly. Instead, separate threads run through the same pipeline in round-robin fashion, making the single processor core (now running at a higher clock rate) behave like an SMP. Applying this transformation to a synthetic pipeline showed very promising results. To give a concrete demonstration of the benefits and costs of Simple Symmetric Multithreading in a complete, realistically sized system, we propose to implement a MIPS-compatible, integer-only processor core in a Xilinx FPGA using these techniques.
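The round-robin interleaving described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the project's hardware design: it assumes an idealized pipeline with no stalls, one instruction issued per cycle, and each original stage split into exactly as many sub-stages as there are threads. The function names (round_robin_issue, occupancy, per_base_stage) are hypothetical and chosen for illustration.

```python
def round_robin_issue(num_threads, num_cycles):
    """Return the thread ID that issues an instruction in each cycle."""
    return [cycle % num_threads for cycle in range(num_cycles)]


def occupancy(base_stages, num_threads, cycle):
    """Return the thread ID held in each physical sub-stage at a given cycle.

    Assumption: every original stage is split into num_threads sub-stages,
    so the physical depth is base_stages * num_threads. A slot that has not
    been reached yet (pipeline still filling) is reported as None.
    """
    depth = base_stages * num_threads
    slots = []
    for sub_stage in range(depth):
        # The instruction in this sub-stage was issued sub_stage cycles ago.
        issue_cycle = cycle - sub_stage
        slots.append(issue_cycle % num_threads if issue_cycle >= 0 else None)
    return slots


def per_base_stage(base_stages, num_threads, cycle):
    """Group the sub-stage occupancy by original (pre-split) pipeline stage."""
    slots = occupancy(base_stages, num_threads, cycle)
    return [
        slots[stage * num_threads:(stage + 1) * num_threads]
        for stage in range(base_stages)
    ]
```

In this model, once the pipeline is full, every original stage holds exactly one instruction from each thread. Consecutive instructions of the same thread are therefore always one original stage apart, which is why each thread sees the original pipeline depth and the bypassing seen by a thread need not grow.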

Progress Status

References

  1. Robert Alverson, David Callahan, Daniel Cummings, Brian Koblenz, Allan Porterfield, and Burton Smith. The Tera Computer System. In Proceedings of the 1990 International Conference on Supercomputing, pages 1-6, 1990.
  2. Intel Corporation. Intel IXP 1200 Network Processor Family. http://www.intel.com/design/network/products/npfamily/ixp1200.htm.
  3. Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, and Dean M. Tullsen. Simultaneous Multithreading: A Platform for Next-Generation Processors. IEEE Micro, 17(5):12-19, 1997.
  4. Charles E. Leiserson and James B. Saxe. Optimizing Synchronous Circuitry by Retiming. In CTVLSI: Proceedings of the 3rd Caltech Conference on Very Large Scale Integration, 1983.
  5. B. Smith. Architecture and applications of the HEP multiprocessor computer system, 1981.
  6. Nicholas Weaver. The Effects of Datapath Placement and C-slow Retiming on Three Computational Benchmarks. http://www.cs.berkeley.edu/~nweaver/sfra/3benchmarks.pdf.