University of California Dept. of Electrical Engineering and Computer Sciences
David E. Culler TA: Jason Hill
A collection of projects was outlined at the end of Jason's Wireless Network
Sensor Lecture, and some appeared in Kurt's reconfigurable hardware lecture.
Hardening the IP stack into silicon. There have been many efforts over
the years to offload substantial portions of the TCP/IP stack into co-processors,
network interfaces, etc. However, most of these projects focused
on support for general purpose machines. Now we are seeing all sorts
of small, embedded networked devices. Printers, cameras, and all sorts
of other things sit on the end of an Ethernet connection. Given that these
function in very limited modes, it may be possible to cast the limited
slice of the stack required for these devices into hardware, or at least
cast major portions into hardware. Is this true? How do their
networking characteristics differ from those of general-purpose PCs? What are
the opportunities for simplification? What hardware structures would
be appropriate? What do you gain?
Phil Buonadonna has proposed a compromise between Infiniband and TCP/IP
that takes the simplified queue-pair abstraction from Infiniband and the
lower layers of the stack from TCP/IP in this simplified context.
He's done a nice implementation comparison in firmware on the Myricom
Lanai network interface. How would you implement this in hardware?
How would its complexity compare to a full-blown Infiniband implementation?
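For readers new to the queue-pair abstraction, the following is a toy software model of the idea: each endpoint posts work to a send queue and a receive queue, and finished work shows up on a completion queue. It is only an illustration. The class and method names (QueuePair, post_send, post_recv, process) are made up for this sketch and are neither the Infiniband verbs interface nor Phil's design; the process() method stands in for whatever the network interface would do in firmware or hardware.

```python
from collections import deque


class QueuePair:
    """Toy model of one endpoint: a send queue, a receive queue, completions."""

    def __init__(self, name):
        self.name = name
        self.send_queue = deque()    # payloads waiting to be transmitted
        self.recv_queue = deque()    # receive buffers posted by the application
        self.completions = deque()   # (kind, value) records of finished work
        self.peer = None

    def connect(self, other):
        # Pair the two endpoints (a connection-oriented mode is assumed here).
        self.peer, other.peer = other, self

    def post_recv(self, buffer):
        # The application hands the interface a buffer to fill later.
        self.recv_queue.append(buffer)

    def post_send(self, payload):
        # The application queues a message; nothing moves until process() runs.
        self.send_queue.append(payload)

    def process(self):
        """Stand-in for the network interface: move queued sends to the peer."""
        if self.peer is None:
            raise RuntimeError("queue pair is not connected")
        while self.send_queue and self.peer.recv_queue:
            payload = self.send_queue.popleft()
            buf = self.peer.recv_queue.popleft()
            n = len(payload)
            if n > len(buf):
                # Posted buffer too small: report an error on both sides.
                self.completions.append(("send_error", n))
                self.peer.completions.append(("recv_error", n))
                continue
            buf[:n] = payload
            self.completions.append(("send", n))
            self.peer.completions.append(("recv", bytes(buf[:n])))


if __name__ == "__main__":
    a, b = QueuePair("a"), QueuePair("b")
    a.connect(b)
    b.post_recv(bytearray(16))
    a.post_send(b"hello")
    a.process()
    print(b.completions.popleft())
    print(a.completions.popleft())
```

Note that a send stays on the send queue until the peer has posted a receive buffer. How much of this bookkeeping can be moved into hardware, and at what cost, is exactly the question posed above.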
Infiniband is proposed for Storage Area Networks; however, I've never seen
a concrete proposal as to how to implement storage over it. There
are concrete proposals for NASD and now some storage over IP proposals.
Propose and evaluate a strategy for providing storage access over Infiniband.
Which mode would you use: connection vs. datagram, reliable vs. unreliable?
Many of the performance aspects of multithreaded and SMT architectures
have been analyzed and some decent power models exist for pipelined processors.
What are the energy implications of multithreading? Are there energy
optimizations? Is it possible to use the multithreading structure
to get cheap wake-up?
Many of you have probably noticed that network access to and from campus
is the pits these days. The campus limited its external bandwidth
to 70 Mb/s as a cost-cutting measure and now all the traffic is backed
up behind the bottleneck. Many complain that their interactive connections
are dismal in the face of all that streaming Napster traffic. We
have all sorts of new network processor techniques - classifying packets
on the fly and scheduling various flows. These have usually been
employed to provide QoS for streams. How could you utilize network
processors to improve interactive performance?
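As a starting point for thinking about this, here is a minimal sketch of classify-then-schedule in software: packets are sorted into an "interactive" or "bulk" class as they arrive, and the interactive queue is always served first. The classification rule (destination ports 22 and 23, or packets of 128 bytes or less) and all names in the code are assumptions made for illustration, not a recommended policy and not any particular network processor's interface.

```python
from collections import deque
from dataclasses import dataclass

# Assumed classification rule, for illustration only.
INTERACTIVE_PORTS = {22, 23}   # ssh and telnet
SMALL_PACKET_BYTES = 128       # treat small packets as likely interactive


@dataclass
class Packet:
    flow_id: str
    dst_port: int
    size_bytes: int


def classify(pkt):
    """Return the traffic class for a packet under the assumed rule."""
    if pkt.dst_port in INTERACTIVE_PORTS or pkt.size_bytes <= SMALL_PACKET_BYTES:
        return "interactive"
    return "bulk"


class PriorityScheduler:
    """Strict priority: bulk packets are sent only when no interactive ones wait."""

    def __init__(self):
        self.queues = {"interactive": deque(), "bulk": deque()}

    def enqueue(self, pkt):
        self.queues[classify(pkt)].append(pkt)

    def dequeue(self):
        for cls in ("interactive", "bulk"):
            if self.queues[cls]:
                return self.queues[cls].popleft()
        return None


def drain_order(packets):
    """Enqueue all packets, then return flow ids in the order they would be sent."""
    sched = PriorityScheduler()
    for pkt in packets:
        sched.enqueue(pkt)
    order = []
    while (pkt := sched.dequeue()) is not None:
        order.append(pkt.flow_id)
    return order


if __name__ == "__main__":
    arrivals = [
        Packet("stream", 6699, 1500),
        Packet("ssh", 22, 64),
        Packet("stream", 6699, 1500),
    ]
    print(drain_order(arrivals))
```

Strict priority is the simplest choice, but it can starve bulk flows whenever interactive traffic is continuous. A project would likely want to compare it against weighted or rate-limited alternatives and against smarter classifiers.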
In the coming year you will see two on-chip techniques for exploiting thread-level
parallelism: simultaneous multithreading (SMT) and chip multi-processors
(CMP). What are the trade-offs that favor one or the other?
Is there a continuum of designs between? Can you develop a quantitative
framework for identifying optimal design points?
There have been numerous studies of cache design, but multithreading introduces
new behavior. How does MT impact cache design? What should
you do differently?
Several researchers have proposed reconfigurable hardware as a means of
application specific optimization and late binding of functionality.
However, there are many ways to introduce reconfigurable hardware into
a design: functional units with definable function, new instructions, co-processors.
Survey the space of approaches. What are the trade-offs? Does
this lead to some new techniques for integrating reconfigurable logic into
processors and/or memories?
Virtual machine monitors have become extremely popular. Many of you
probably use VMWare. It relies on a mix of fast traps, instruction
emulation, and dynamic translation. What is the performance impact
of virtual machine monitors? Where does the time go? How does
it change cache, TLB, and VM behavior?
What about running many virtual machines per physical machine, as is done in many
web hosting applications (cf. ensim.com)?
What are the power trade-offs implied by virtualization?
IA64 provides a whole host of performance counters. Use these to
verify various classical studies on REAL workloads.