For two decades architecture research has been focused on desktop or server machines. As a result of that attention, today's microprocessors are 1000 times faster than the original Berkeley RISC and Stanford MIPS chips. Given the looming consolidation of desktop microprocessor architectures, it may be time to declare victory and look for new research challenges.
One candidate is personal mobile computing, where portable devices are used for visual computing and personal communications tasks. Such a device integrates all the functions provided today by a portable computer, a cellular phone, a digital camera, and a video game. In addition, we believe speech I/O will be the cornerstone of such devices.
This new challenge brings new demands for architects. These applications care much more about real-time performance than about the performance targets of today's out-of-order microprocessors (average-case performance or SPEC performance). The programs typically operate on vectors of 8-bit or 16-bit samples of audio and visual data and on 32-bit floating-point data, not the 64-bit data of today's machines. In addition to high performance for multimedia and DSP functions, requirements include energy efficiency and area-efficient, scalable designs.
As a starting point, we propose reviving vector architectures. Vector architectures match the narrower widths and real-time demands of multimedia. They also scale well with the increasing number of transistors and the wire-delay challenges of future integrated circuits. Unlike conventional DSPs, they have a foundation of compiler research which allows them to be programmed in high-level languages. And unlike the MMX-style instruction set extensions, vector architectures have an elegant and fast interface to memory and scale well with vector length.
A vector machine benefits from a low-latency, high-bandwidth memory. Intelligent RAM, or IRAM, merges processing and memory into a single chip to lower memory latency, increase memory bandwidth, improve energy efficiency, and reduce size. Hence IRAM appears to be an excellent technology for mobile computing. Surprisingly, the integration of the processor/cache/memory of IRAM with high-speed serial I/O lines may also lead to very good I/O performance.
I conclude by describing the design of VIRAM-1, a microprocessor designed by graduate students that may well have more transistors than the contemporary Intel microprocessor. The goal is that in 2-3 years VIRAM-1 will consume less than 2 watts of power, contain 16-32 MBytes of memory, have about 1 GByte/sec of I/O, and crunch at the rate of 1-2 GFLOPS (64-bit floating point) and 4-16 GOPS (16-bit fixed point). It may also challenge DSP performance even though it is programmed in high-level programming languages.
The repercussions of success extend beyond the architecture research community. Today's semiconductor industry is sharply divided into processor and memory camps. If IRAM proves successful, unification may come to the semiconductor industry. In such a future, it's unclear which company will ship the most processors.
To follow the IRAM Project, see http://iram.cs.berkeley.edu/. A paper on this topic is available.