CS252 Project
Simple Symmetric Multithreading in Xilinx FPGAs
Yury Markovskiy and Yatish Patel
with Nick Weaver
{yurym, yatish, nweaver}@cs.berkeley.edu
Slides: ppt, pdf
Summary
When implementing designs in an FPGA, there are numerous resources
available to pipeline a design to improve performance. Yet when
implementing a microprocessor or similar design, the overhead involved
in pipelining for a higher clock rate (notably due to the much
increased bypassing logic, additional instruction latency, and more
complicated control logic) becomes highly significant. We propose a
compromise between high overhead and high performance: Simple
Symmetric Multithreading. Here, the register file size and TLB size
are increased, and each pipeline stage and stage in the control logic
is replaced by several stages, which are then moved to balance the
delays. However, the complexity of the bypassing is not increased, nor
are there significant changes in the control logic
complexity. Instead, separate threads are run through the identical
pipeline in round-robin fashion, making the single processor core (now
running at a higher clock rate) behave like an SMP. This
transformation was performed on a synthetic pipeline and showed very
promising results. To provide a concrete demonstration of
the benefits and costs of Simple Symmetric Multithreading at a realistic
scale in a complete system, we propose to implement a MIPS-compatible
integer-only processor core in a Xilinx FPGA using these techniques.
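The round-robin interleaving described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the hardware design itself: the function names `simulate` and `same_thread_spacing`, the 5-stage base pipeline, and the choice of 3 threads are all hypothetical and chosen for illustration. Each original stage is split into `c` sub-stages, and one instruction is issued per cycle from thread `cycle % c`.

```python
from collections import deque


def simulate(num_stages, c, cycles):
    """Model a pipeline whose num_stages stages are each split into c sub-stages.

    One instruction is issued per cycle, taken round-robin from c threads.
    Returns one snapshot per cycle; each snapshot lists the occupant of every
    sub-stage as (thread id, per-thread instruction index), or None for a bubble.
    """
    depth = num_stages * c
    pipe = deque([None] * depth, maxlen=depth)
    history = []
    for cycle in range(cycles):
        thread = cycle % c                      # round-robin thread select
        pipe.appendleft((thread, cycle // c))   # oldest entry falls off the end
        history.append(list(pipe))
    return history


def same_thread_spacing(snapshot):
    """Return the set of sub-stage distances between consecutive
    instructions of the same thread in one pipeline snapshot."""
    last_seen = {}
    gaps = set()
    for pos, slot in enumerate(snapshot):
        if slot is None:
            continue
        thread, _ = slot
        if thread in last_seen:
            gaps.add(pos - last_seen[thread])
        last_seen[thread] = pos
    return gaps


if __name__ == "__main__":
    history = simulate(num_stages=5, c=3, cycles=20)
    print(same_thread_spacing(history[-1]))
```

In this model, two consecutive instructions of the same thread are always exactly `c` sub-stages apart, which is one original stage. From the point of view of a single thread, the pipeline therefore looks like the original, unsplit one, which is why the bypassing logic does not need to grow.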
Progress Status
- Obtained and verified a working toolflow for the Xilinx XC800V part, including a trial license for Synplify to perform fast synthesis.
- Initial modifications to triple the number of pipeline registers, control registers, the register file, and the cache are in progress.
Apr 7 Meeting Notes
- Jason got us an account on the BSAC computers so that we can try running an ASIC toolflow on the processor. We need to find layout models for the LSI parts provided. We may have to remove the caches, since implementing them in logic is too expensive (SRAM cells?).
References
- Robert Alverson, David Callahan, Daniel Cummings, Brian Koblenz, Allan Porterfield, and Burton Smith. The Tera Computer System. In Proceedings of the 1990 International Conference on Supercomputing, pages 1-6, 1990.
- Intel Corporation. Intel IXP 1200 Network Processor Family. http://www.intel.com/design/network/products/npfamily/ixp1200.htm.
- Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, and Dean M. Tullsen. Simultaneous Multithreading: A Platform for Next-Generation Processors. IEEE Micro, 17(5):12-19, 1997.
- Leiserson and Saxe. Optimizing Synchronous Circuitry by Retiming. In CTVLSI: Proceedings of the 3rd Caltech Conference on Very Large Scale Integration, 1983.
- B. Smith. Architecture and applications of the HEP multiprocessor computer system, 1981.
- Nicholas Weaver. The Effects of Datapath Placement and C-slow Retiming on Three Computational Benchmarks. http://www.cs.berkeley.edu/~nweaver/sfra/3benchmarks.pdf.