TITAN: A Next-Generation Infrastructure for Integrating Computing and Communication

Grant Number: CDA-9401156

Computer Science Division
University of California, Berkeley

Technical Progress Report (08/01/98 - 07/31/99)


Contributions

The Titan/NOW project is demonstrating a fundamental change in approach to the design and construction of large-scale
computing systems. This was motivated by the desire to deploy powerful systems very rapidly and to scale them incrementally, as is required to fully
utilize commercial technologies that are advancing at a high rate, to meet new service demands that are increasing 'on internet time', to address
emergency situations, and to reach beyond current limits in computational science.

The key enabling technology for the project was the emergence of scalable, low-latency, high-bandwidth VLSI switches, pioneered in massively parallel processors and transferred into system area network (SAN) configurations. With SAN technology, it became
feasible to construct powerful, integrated systems by literally plugging together many state-of-the-art commercial workstations or PCs to form a
high-performance cluster. However, to realize the potential of SAN-based cluster design, critical limitations of existing systems had to be overcome.
In particular, the costly communication stack forming the software interface between the user application and the network had to be essentially
eliminated, while preserving the ability to share communication resources effectively. In addition, the operating system functionality associated with an
individual node had to be made available cluster-wide. Overcoming these challenges enabled many novel design concepts, such as file and virtual
memory systems that utilize remote memories in preference over disks, schedulers that coordinate actions implicitly, massively scalable I/O, and
scalable active services.

The project demonstrated the design approach, the solution to core challenges, and novel design opportunities by building and utilizing a cluster of over one hundred UltraSPARC workstations interconnected by a multigigabyte Myricom network. This prototype cluster enabled
many further investigations; it was the basis for Inktomi (now the world's largest search engine), set the world records for throughput and response time on
the disk-to-disk database sort benchmark and held them for two years, broke the RSA key challenge, put clusters on the Linpack Top-500 list, hosted
the transcoding proxy service (BARWAN Transcend) and the media gateway service (MASH), provided the simulation engine for
processor-in-memory (IRAM) and configurable architectures (BRASS), enabled real-time video effects processing and allowed massive image
processing in digital libraries. Technology from the project has transferred to industry in many forms, including the recent Intel/Microsoft/Compaq
Virtual Interface Architecture (VIA).

The first major application breakthrough on Titan/NOW was Inktomi, which debuted on a segment of the NOW cluster in summer 1995 as the first search
engine to offer fast response time and a large search set. It utilized the capacity and parallel transfer of cluster I/O to support a very large database,
while using the fast communication layer to fully parallelize each query for fast response. The approach and the underlying communication technology
transferred to the company, which powers many of the world's leading search sites, including HotBot, NBC's Snap!, Yahoo! and the Disney Internet
Guide (DIG) for children and families. Additional advances came through the development of active infrastructure services, in collaboration with the
MASH and BARWAN projects, which place services into the infrastructure that adapt content and access to the needs of a large number of potentially
limited clients. For example, the Transcend proxy provides not only caching, but actively renders pages and reprocesses images into a form where they
can be transferred quickly over a low-speed link and presented on a small Palm Pilot. The Media Gateway participates in an mbone session while
transcoding the video stream to match the limited connection of its client. Each of these resides permanently in a portion of NOW, but grows out to utilize
cluster resources as demand increases. In addition, NOW has been made available to the national computational science and computer science
community as an experimental resource with the National Partnership for Advanced Computational Infrastructure (NPACI).


Activities

Physical Infrastructure

The physical infrastructure of Titan consists of 100 Ultra 170 workstations connected by a high-speed Myrinet, 35 Intel Pentium Pro PCs and roughly 400 IBM disks used in a massive storage cluster, four 8-way Sun Enterprise 5000 SMPs, and the mediastation component consisting of roughly 150 HP/715s and 100 donated Intel Pentium Pro PCs and monitors from Sony and Samsung. This year we replaced the entire Myricom network within NOW to overcome hardware problems with the switches and to enhance the storage capacity of the network interface card. The hardware problems have not prevented us from moving forward with the research; in fact, they have driven it in interesting ways. NOW is a substantially larger cluster than the vendor can test and, with our fast communication layers, we drive the network harder than the vendor layers can. As a result, we have revealed a series of hardware problems with the switches. This caused us to incorporate fast error detection and retry into the Active Message layer. The upgrade allows us to move forward to better switch hardware and a more useful configuration.

A rich set of demanding applications is now heavily using the infrastructure and stressing the system, as forecast in the proposal.  For example, the vision group has recently produced techniques for photorealistic rendering.  They routinely run on as many as a hundred nodes, because they need fast turnaround in the rendering process to determine how to adjust scene, lighting, etc. to reveal the impact of their techniques.  They also routinely consume thousands of node hours to produce movies once these decisions are made.  There is a great deal of machinery involved in the state management for this sort of application. This problem, together with that of integrating these two modes of use with interactive parallel scientific use and the systems experiments, presents a family of challenges of the kind that we hoped would arise from the proposed cycle of innovation. Several important technological components have been developed within the project, such as a self-scheduled parallel I/O system, called River, and the ability to run IP traffic within a virtual network over Myrinet Active Messages.

The multimedia component of the project has been strongly influenced by advances in multicast communication, which is still in transition and needs to be explored further to fully complete the proposed work. One of the exciting driving applications is real-time software video effects. Multicast is often used for delivering multimedia content, and with the arrival of Steven McCanne on the faculty midway through the Titan project, this became an important aspect of the multimedia work. An unexpected result was that extensions of the multicast session management techniques would be instrumental in managing parallel, real-time video effects processing within the TITAN/NOW infrastructure. However, multicast is still in transition as the commercial world shifts from DVMRP to PIM and as new protocols and transports are developed internally. We are continuing to work closely with Nortel Networks in upgrading the IP network in Titan to better support multicast for the multimedia component driving the core Titan infrastructure.

We have replaced the ATM cloud and 10 Mb/s external network of NOW with a switched gigabit core and 100 Mb/s links to the NOW nodes.


Core: Architecture and Operating Systems (D. Culler, D. Patterson)

Communication

We continued to advance the performance and reliability of our fast general purpose communication layer over scalable low-latency networks, Active Messages II.  This includes underlying driver support for virtual networks, which has been fully stress tested with large numbers of simultaneous parallel applications. We have done a great deal of study and optimization of the protocols.  We have been able to gain the full performance of dedicated communication layers that provide no capability of multiprogramming parallel programs or dealing with errors or reconfiguration of the network.  We have developed extensive performance analysis tools and have been able to demonstrate full communication performance on individual parallel programs and graceful sharing of communication resources under heavy load.  In our stress tests, roughly 80% of stand-alone performance is delivered while the physical communication resources are overcommitted by an order of magnitude [MaCu99].

[MaCu99] Design Challenges of Virtual Networks: Fast, General-Purpose Communication.
     SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP), Atlanta, Georgia, May 4-6, 1999.
     Alan M. Mainwaring and David E. Culler.

We have conducted an extensive performance evaluation study of the sensitivity to communication performance of a range of workloads on NOW by building an in situ measurement facility for conventional distributed applications and by integrating the measurement capability into our communication layers.  Extending the tools we developed to isolate the performance factors, including instruction breakdowns, scaling of computational work, MPI send, receive, and wait time, and cache traces for parallel applications, we conducted extensive studies of the sensitivity of traditional distributed system workloads to network latency, overhead and bandwidth.  While the sensitivity is quite different from that of parallel programs, overhead remains the dominant factor [MarCu99, Mar99].

[MarCu99] NFS Sensitivity to High Performance Networks,
     SIGMETRICS'99, Atlanta, May 1999
     Richard Martin and David Culler
 
[Mar99] Application Sensitivity to Network Performance
     PhD Thesis, U.C. Berkeley, 1999
     Richard P. Martin

We have studied in detail how to minimize the latency of a message through a network that consists of a number of store-and-forward stages, especially for the page-size chunks transported within cluster file systems. This research is especially important for today's low-overhead communication subsystems that employ dedicated processing elements for protocol processing. We have developed an abstract pipeline model that reveals a crucial performance tradeoff. We exploit this tradeoff in fragmentation algorithms designed to minimize message latency. By applying a formal methodology to the Myrinet-GAM system, we have improved its latency by up to 51% [Wan*98].

[Wan*98]   Modeling and Optimizing Pipeline Latency.
     1998 SIGMETRICS Conference on the Measurement and Modeling of Computer Systems, Madison, Wisconsin, June 24-26, 1998.
     Randolph Y. Wang, Arvind Krishnamurthy, Richard P. Martin, Thomas E. Anderson, David E. Culler.
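
The tradeoff behind fragmentation can be illustrated with a small model. The sketch below is not the abstract pipeline model of [Wan*98]; it assumes, for illustration only, that every store-and-forward stage charges a fixed per-fragment overhead plus a per-byte cost, that fragments are equal in size, and that stages buffer freely. The stage parameters and the names pipeline_latency and best_fragment_count are hypothetical.

```python
def pipeline_latency(message_bytes, fragments, stages):
    """Modeled latency of one message sent as equal-sized fragments.

    stages is a list of (fixed_overhead, per_byte_cost) pairs, one per
    store-and-forward stage. All parameter values are illustrative assumptions.
    """
    fragment_bytes = message_bytes / fragments
    stage_times = [fixed + per_byte * fragment_bytes for fixed, per_byte in stages]
    # The first fragment traverses every stage; each later fragment then
    # completes one bottleneck-stage time after the one before it.
    return sum(stage_times) + (fragments - 1) * max(stage_times)


def best_fragment_count(message_bytes, stages, max_fragments=64):
    """Fragment count in 1..max_fragments with the lowest modeled latency."""
    return min(range(1, max_fragments + 1),
               key=lambda k: pipeline_latency(message_bytes, k, stages))


# Hypothetical three-stage path and an 8 KB page; times are in microseconds.
STAGES = [(5.0, 0.01), (2.0, 0.02), (5.0, 0.01)]
PAGE_BYTES = 8192

if __name__ == "__main__":
    k = best_fragment_count(PAGE_BYTES, STAGES)
    print(k, pipeline_latency(PAGE_BYTES, 1, STAGES), pipeline_latency(PAGE_BYTES, k, STAGES))
```

In this model, a few large fragments leave later stages idle while the first stage is still busy, whereas many small fragments pay the fixed overhead repeatedly, so the lowest latency lies between the two extremes.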
 

Global Operating System Layer


We demonstrated a global layer operating system, Glunix, that scales beyond 100 workstations connected by low latency custom network interfaces
via a high bandwidth switch. It provides a parallel network process abstraction and is able to run both parallel programs and unmodified sequential
programs. This was a cornerstone of all research conducted on NOW and has executed many millions of jobs.  A heartbeat monitor and automatic restart
facility, GluGuard, was developed to maintain the collection of Glunix components across failures and incremental reconfigurations.  The development
of Glunix revealed several fundamental limitations in the classic Unix interface that compromise perfect virtualization.  We developed a sophisticated
interpositioning strategy to overcome these difficulties.  Details are provided in [Gho98, Gho*98].

[Gho98]     User Level Operating System Services,
     PhD Thesis, U.C. Berkeley, 1998
     Douglas P. Ghormley

[Gho*98]    GLUnix: A Global Layer Unix for a Network of Workstations.
     Software-Practice and Experience, Vol. 28, No. 9, July 25, 1998.
     Douglas P. Ghormley, David Petrou, Steven H. Rodrigues, Amin M. Vahdat, Thomas E. Anderson.

One of the critical issues revealed in our technology evaluation phase was coordinated scheduling of parallel programs.  Most programs are written with
the assumption that the constituent processes actually run at the same time.  Local operating system schedulers do not obey this discipline, and the lack
of coscheduling can result in slowdowns of one or even two orders of magnitude relative to dedicated use. Although Glunix provided an explicit gang
scheduling capability, that solution was unsatisfactory for several reasons.  We wanted to be able to mix sequential and parallel jobs and allow
interactive use.  Gang scheduling is inefficient in the presence of load imbalance or when constituent processes are operating independently.  Its
implementation is complex and introduces another set of potential failures and performance bottlenecks.  The deep question was whether it was
possible to design mechanisms where parallel programs could get themselves coscheduled over local schedulers when they require it, using only the
communication inherent to the program.  We developed an elegant and simple adaptive technique, where the communication runtime observes how
long it waits for responses and reacts by either continuing to wait or by blocking.  A complete development and evaluation of the effectiveness of this
approach over a wide range of applications and scenarios is provided by [Arp*98, Arp98].
 
[Arp*98]     Scheduling with Implicit Information in Distributed Systems.
     1998 SIGMETRICS Conference on the Measurement and Modeling of Computer Systems, pages 233-243, Madison, Wisconsin, June
     24-26, 1998.
     Andrea C. Arpaci-Dusseau, David E. Culler, Alan Mainwaring.
 
[Arp98]     Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems
     Ph.D. Dissertation, University of California, Berkeley, December 1998.
     Andrea C. Arpaci-Dusseau
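
The adaptive waiting decision described above can be sketched as a two-phase wait: spin for a bounded time, then block. The simulation below is a simplified illustration, not the implementation evaluated in [Arp*98, Arp98]. The baseline spin time, the rule that extends spinning when other messages arrive (taken here as a hint that the communicating peers are currently scheduled), and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class WaitOutcome:
    action: str      # "spun" if the reply arrived while spinning, else "blocked"
    spin_us: float   # time spent spinning before the reply or before blocking


def two_phase_wait(reply_delay_us, baseline_spin_us, arrivals_us=(), extension_us=0.0):
    """Simulate one wait for a reply that arrives after reply_delay_us.

    arrivals_us lists the times at which unrelated messages arrive; each one
    that lands while the process is still spinning extends the spin deadline.
    """
    deadline = baseline_spin_us
    for arrival in sorted(arrivals_us):
        if arrival > deadline:
            break  # the process has already blocked and released the CPU
        deadline = max(deadline, arrival + extension_us)
    if reply_delay_us <= deadline:
        return WaitOutcome("spun", reply_delay_us)
    return WaitOutcome("blocked", deadline)
```

A prompt reply keeps the process on the CPU, which keeps communicating processes scheduled together; a late reply suggests the peer is not running, so the process blocks and lets the local scheduler run other work.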

A fundamental strength of cluster architectures proved to be the ability to drive massive I/O bandwidth across a large number of independently
attached disks.  To explore this capability, we built a very high-performance parallel I/O facility and used it to set (and hold for two years) the world
records on the disk-to-disk sorting benchmarks (both the response-oriented Datamation benchmark and the bandwidth-oriented Minute Sort).  Details are in the
following.
 
[Arp*98b]   Searching for the Sorting Record: Experiences in Tuning NOW-Sort.
    The 1998 Symposium on Parallel and Distributed Tools (SPDT '98), Welches, Oregon, August 3-4, 1998.
    Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, David E. Culler, Joseph M. Hellerstein, David A. Patterson.

This investigation revealed that disks are a fundamental source of performance variation, due to mechanical characteristics, variations in transfer rates,
and numerous other factors, a problem that has largely been overlooked.  Thus, we investigated the fundamental question of designing parallel I/O systems with
robust performance and developed a novel, adaptive scheme where applications are constructed as a composition of flows and computation is
scheduled based on availability of data.  This dataflow scheme generalizes techniques employed within DBMS systems.  Replication provides a means
of obtaining 'performance availability', not just functional availability.  Detail is provided in the following.
 
[Arp*99]    Cluster I/O with River: Making the Fast Case Common.
     IOPADS '99.
     Remzi H. Arpaci-Dusseau, Eric Anderson, Noah Treuhaft, David E. Culler, Joseph M. Hellerstein, David A. Patterson, Katherine Yelick.

[Kee*98]    A Case for Intelligent Disks (IDISKs).
     SIGMOD Record, Vol. 27, No. 3, August 1998.
     Kimberly K. Keeton, David A. Patterson and Joe Hellerstein.

One major dimension of this study was the basic node granularity and the design issues of multiple processors per node.  We constructed a cluster of
four 8-processor Enterprise 5000 SMPs, each with multiple network interfaces.  Upon this system, we extended our Active Message system to utilize
multiple network interfaces simultaneously and developed a multiprotocol version of Active Messages that transparently utilized shared memory within a
node and the fast network between nodes.  The hardware configuration was carefully chosen to permit a very controlled comparison of cluster and
cluster-of-SMP architectures.

[Lum98]    Design and Evaluation of Multi-Protocol Communication on a Cluster of SMP's,
PhD thesis, University of California at Berkeley, November 1998.
Steven S. Lumetta.
 

Scalable, Available Internet Services


A very large number of demanding applications were developed and evaluated on NOW, spanning a rich spectrum of areas, including architectural
simulation, machine learning, protocol verification, dynamic solids modeling, numerical linear algebra, adaptive mesh refinement, finite element modeling,
maximum likelihood genetics, parallel rendering, image processing, compression, real-time video effects, content distillation, latent semantic indexing, and
several others.  However, the most massive impact was realized in the form of scalable internet services, which exploit the availability, I/O capacity and
bandwidth, and fast communication of the cluster.  The most dramatic of these was the Inktomi search engine, originally prototyped on NOW and
rapidly transitioned into industry.  Today the Inktomi clusters serve roughly 50% of the searches on the web, servicing 20 million distinct users through
numerous popular search interfaces and portals.  It is still running a version of Active Messages derived directly from the Technology Exploration phase
of the NOW project. Detail of the design is given in the following.

     Delivering High Availability for Inktomi Search Engines
     Eric A. Brewer
     SIGMOD Record ACM Special Interest Group on  Management of Data, vol 27. no 2, 1998
 

We investigated the larger research question of extending core aspects of cluster technology out to the wide area.  One of the key discoveries was the
value of moving functionality traditionally associated with 'front-end' machines, such as load-balancing and fail-over, automatically into the client.  A
second was the extension of the cache consistent network file system and OS interpositioning for service replication to remote clusters.  Detail is in the
following.

 
[Vah*98]    "WebOS: Operating System Services For Wide Area Applications", Amin Vahdat, Tom Anderson, Mike Dahlin, Eshwar
     Belani, David Culler, Paul Eastham, and Chad Yoshikawa. Seventh Symposium on High Performance Distributed
     Computing July 1998.

[Vah98]     "Operating System Services for Wide-Area Applications," Amin Vahdat. PhD Dissertation, Department of Computer
     Science, University of California, Berkeley, December 1998

A second fundamental area of development was the use of scalable transcoding services in the cluster to support constrained, poorly connected, or
mobile devices.  This investigation was done jointly with two other DARPA projects. The first prototype was the Transcend proxy, developed as part
of the BARWAN project, to provide on-the-fly content distillation to deliver content to small clients over bandwidth constrained links.

[Fox98]     Experience with TopGun Wingman: A Proxy-Based Web Browser for the 3Com PalmPilot
     A. Fox, I. Goldberg, S. Gribble, D. C. Lee, A. Polito, and E. A. Brewer
     Proceedings of Middleware '98, Lake District, England, September 1998.

The second was the Media Gateway proxy developed as part of the MASH project.  In this case, the proxy participates as a well-connected part of a
multicast-based video session, but it down samples the multimedia stream to match the limited bandwidth and functionality of a client, which may not
even be able to participate in multicast.

[Ami98]     An Active Service Framework and its Application to Real-time Multimedia Transcoding.
     Elan Amir, Steven McCanne, and Randy Katz.
     Proceedings of ACM SIGCOMM '98, Vancouver, British Columbia, September 1998.

For both of these services, a portion of the NOW cluster was set aside for baseline capability and, as the load increased, the proxy service would
negotiate with the Glunix layer to obtain additional transient resources.


Shell: Multimedia Component

Several accomplishments were achieved during the past year that were partially funded by the Titan grant. Some accomplishments are general services and facilities used by other Berkeley researchers and some are specific research projects completed by Professor Rowe's multimedia research group. The following paragraphs describe these accomplishments.

The general facilities include a multimedia authoring laboratory, studio classrooms, and infrastructure to support the use of digital video in distributed collaboration and distance learning which is part of the Berkeley Internet Broadcasting System. These facilities are briefly described and specific projects completed in the past year are highlighted.

Multimedia Authoring Laboratory (514 Soda Hall)

The multimedia authoring laboratory includes equipment for capturing and editing images, audio, and video media. The laboratory also has media compression facilities (e.g., JPEG, MPEG, etc.) and a variety of peripherals (e.g., CD-ROM burners, color printers, removable media devices, etc.). The laboratory is used to produce videotape demonstrations of various research projects (e.g., "The Campanile Movie," "Landay CSCW Video Clips," etc.) and on-line multimedia titles (e.g., "The ACM SIGCHI Video Gallery," "Conversations with History," "California Sheet Music Project,"  "BMRC Retreats," etc.).

A studio classroom has audio/video equipment that supports distance learning and distributed collaboration.  This classroom is used to produce the weekly Berkeley Multimedia, Interfaces, and Graphics (MIG) Seminar that is broadcast world-wide on the Internet Mbone. Beginning in the fall 1998 semester, selected other classes have been broadcast live to the Berkeley campus and recorded for on-demand replay as part of the Berkeley Internet Broadcasting System described below.
A significant upgrade to the classroom was completed during the past year. Some of the changes made were:
1. Sound proofing the room for improved audio.
2. Installation of permanently mounted microphones in the ceiling for audience questions.
3. Installation of a VCR and ELMO Camera Stand at the speaker position and a scan converter for the presentation computer to simplify speaker operation and improve capture of presentation material for remote participants.
4. Installation of computer-controlled cameras for the speaker and the audience.
5. Installation of an audio mixer.
6. Installation of two capture machines so remote participants can view two video streams from the room (e.g., speaker and presentation material or audience and speaker).
7. Installation of improved lighting of the speaker to improve the visual images.
8. Installation of a point-to-point audio/video link to the broadcast center in 530 Soda Hall. This link allows us to produce a third audio/video stream from the classroom or to simulcast the program using a different technology (e.g., Real Networks).
These infrastructure improvements have 1) improved the quality of the captured audio and video, 2) improved the quality of the programs produced from the room, and 3) reduced the labor and effort required to produce a program. We continue to improve the classroom as we discover new problems and opportunities.  Currently we are working on the quality of captured audio, support for remote operation of all equipment, and improved interactivity for remote participants.

The weekly Berkeley MIG Seminar broadcast has been produced for more than four years. This seminar has been extremely successful both as a production and as a vehicle for research.  We use both Internet Mbone and Real Networks technology to broadcast the seminars. A typical broadcast is composed of three simulcast sessions.
1. A low bit-rate Mbone broadcast (200 Kbs aggregate bandwidth) that is transmitted on the public Mbone.
2. A medium bit-rate Mbone broadcast (1 Mbs aggregate bandwidth) that is transmitted on the vBNS and Calren2 experimental networks.
3. A Real Networks broadcast composed of a 50 Kbs and a 250 Kbs stream that is transmitted over the Internet.
Several research projects are described below that relate to the software needed to produce and manage these broadcasts. We have continued to improve the campus infrastructure for producing on-demand and live streaming video programs. During the past year we have installed machines, called video gateways, at different locations on campus. We installed one video gateway in the Office of Media Services (OMS), the campus audio/video support organization; it allows us to produce programs from approximately 20 other rooms on campus. We also installed a video gateway connected to a satellite dish so we can capture live material as a source for video processing and compression algorithm development.
During the next year we plan to install more video gateways in OMS and classrooms so we can increase the number of concurrent broadcasts and satisfy the demand by other researchers for distributed collaboration experiments. In addition, we will continue to enhance the equipment in 405 Soda Hall.

Many research accomplishments relate to development of the Berkeley Internet Broadcasting System (BIBS). The goal of this system is to support live and on-demand interactive television using streaming audio/video on the Internet. Distance learning is the primary application. To make this system practical, we must reduce the cost and effort required to produce a broadcast and improve the quality of the broadcasts (i.e., audio/video media and interactivity). The remainder of this section describes research in four areas: broadcast production, parallel video processing, layered multicast, and end-user digital video editing.

The key to reducing cost is to incorporate software automation. For example, the Berkeley MIG Seminar requires ten processes to be started on seven different machines. These processes are configured to work together by setting parameters to the different programs. For example, the rtpgw process that transcodes the medium bit-rate video stream produced by the machine in the classroom to a low bit-rate video stream that will be broadcast to the Public Mbone must be configured with the appropriate session addresses and coding parameters. Starting the broadcast required 30-45 minutes and was error prone because the broadcast engineer made mistakes starting the processes and entering the parameters. The broadcast engineer was typically a graduate student with at least one month of training.

We developed a tool, called the Broadcast Manager (bmgr), which stores broadcast configurations in a file and provides a GUI for starting and monitoring programs and entering and editing broadcast configurations [Wu99]. After deploying bmgr, we are able to start the broadcast in less than five minutes with essentially no errors. We now use undergraduates with approximately thirty minutes of training as broadcast engineers. This automation tool was so successful that we were able to broadcast six additional classes in the fall 1998 semester with essentially no failures due to configuration errors.

Another tool we started to develop is a Director's Console (dc) which allows the broadcast engineer to control all audio/video equipment in the studio classroom.  For example, the broadcast engineer can switch audio/video streams to the different capture machines, adjust the bit-rates allocated to the different streams, and move the cameras (e.g., pan and tilt). A typical broadcast is composed of two streams: the speaker and the presentation material. During a lecture the speaker might use a presentation PC to show slides or demonstrate programs, an ELMO camera stand to show transparencies, or a VCR to play a videotape. Using dc, the appropriate stream can be switched into the broadcast and the bit-rate allocated to the stream adjusted. For example, we typically allocate 100 Kbs to the speaker video, 64 Kbs to the speaker audio, and 35 Kbs to the presentation material for the Public Mbone broadcast, which must be less than 200 Kbs for the session. But, when the speaker switches to the VCR or plays an animation or video on the PC, we reduce the bit-rate allocated to the speaker and raise the bit-rate allocated to the presentation material because that stream is more important. Another example of using dc is to change the broadcast when a person asks a question, that is, switch the presentation stream to the audience camera and move the camera to the left or right side of the room depending on where the person is seated.
The current dc tool is a manual interface. In other words, it mechanizes existing control interfaces. We are working on improvements that will automate the decisions made by a human operator. For example, the bit-rate allocation mentioned above can be automated by recognizing when the speaker switches the projected presentation material from one source to another and by dynamically adjusting the allocations using the SCUBA protocol [Amir97]. Other examples of automation are recognizing an audience question from the speaker microphones, switching the video streams and pointing the camera using sound location on the three audience microphones.
Another example is accepting a question from a remote participant. The changes required for this event include: 1) turn off room microphones to reduce feedback, 2) switch the remote audio to the speakers in the room and adjust the sound levels, and 3) switch the remote speaker video stream, if available, to the projection screen. We also plan to use the video effects processor discussed in the next section to add a title to the person asking the remote question.
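
The bit-rate reallocation described above for the Public Mbone session can be sketched as a small function. The three rates and the 200 Kbs session limit are the figures given in this report; the report does not state the rates used once the presentation material becomes motion video, so simply swapping the two video rates is an assumption made for illustration, as is the name allocate_bit_rates. This is not the dc implementation.

```python
SESSION_LIMIT_KBPS = 200   # the Public Mbone session must stay below this
SPEAKER_AUDIO_KBPS = 64    # speaker audio rate from the report
HIGH_VIDEO_KBPS = 100      # speaker video rate while slides are shown
LOW_VIDEO_KBPS = 35        # presentation material rate while slides are shown


def allocate_bit_rates(presentation_has_motion):
    """Return the Kbs allocated to each stream of the session.

    Assumption: when the presentation source shows motion video (VCR,
    animation), the two video rates are swapped so that the presentation
    stream receives the larger share.
    """
    if presentation_has_motion:
        speaker_video, presentation = LOW_VIDEO_KBPS, HIGH_VIDEO_KBPS
    else:
        speaker_video, presentation = HIGH_VIDEO_KBPS, LOW_VIDEO_KBPS
    allocation = {
        "speaker_video": speaker_video,
        "speaker_audio": SPEAKER_AUDIO_KBPS,
        "presentation": presentation,
    }
    if sum(allocation.values()) >= SESSION_LIMIT_KBPS:
        raise ValueError("allocation exceeds the session limit")
    return allocation
```

An automated console would call such a function whenever it recognizes a change of presentation source and then announce the new rates to the senders.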

Lastly, we anticipate that this automation system should include local preference setting by remote participants.

Parallel Video Processing

Another technique to improve visual quality is to incorporate special effects (e.g., composition, titling, chromakey, etc.). Given the dramatic improvement of off-the-shelf processors, we decided to develop a software-only video effects system using network-of-workstations (NOW) technology. The system is now in its second year of development and continues to be a research vehicle for multimedia issues. During the past year we published papers describing a user interface and model for software-only video effects [Wong 98], the use of temporal parallelism [Mayer-Patel98], and the use of spatial parallelism [Mayer-Patel99a]. Mechanisms to support functional, temporal, and spatial parallelism have been implemented and demonstrated on several video effects (e.g., titling and fades).
We have also designed and developed control mechanisms for the system that use IP-Multicast and submitted a paper describing these mechanisms [Mayer-Patel99b]. The system was demonstrated as part of the Berkeley MIG Seminar in February 1999 when video effects were generated in real-time and incorporated into the seminar broadcast. All experiments and measurements have been conducted using the Sparc NOW acquired with Titan funds. In addition, we use Intel-donated PCs as part of our development environment.
We are continuing to develop video effects for the system and mechanisms to incorporate the system into dc so it can be used on a regular basis. We are also exploring the development of an effect scripting language which will reduce the coding required to specify video effects and the possibility of distributed effects processing to other nodes on the Internet with customization by end-users.

Directory Services for Layered Multicast Sessions

The long-term vision for Internet streaming video includes the use of source-channel and layered coding to deliver the best possible quality to every end station given the bandwidth and latency constraints of its network connection. For example, a participant connected to a high-speed wireline network should get a better image than someone connected by a wireless network. Rather than simulcasting several different streams, as we now do, we want to incorporate layered multicasting, in which different layers are transmitted on different multicast addresses [McCanne96]. One problem with this technology is selecting session addresses and distributing them to participants. An example will illustrate the problem. Suppose several people at Berkeley are communicating with several people at another university. Suppose further that the connection between the two universities is a slow link. The system ought to allocate a base layer that will be shared by all participants and enhancement layers that are available only to participants on each campus. But, given that multicast addresses are allocated by administrative scopes, the addresses allocated for the enhancement layers at Berkeley will be different from the addresses allocated at the other university. The current IETF protocols and software tools for Mbone session announcements (i.e., SAP and sdr) do not support layered multicast addressing. A solution to this problem, including caching of announcements, was designed and implemented during the past year, and a paper describing the system was published [Swan98]. The paper won the Best Student Paper Award at ACM Multimedia '98.
This work used computer systems and networking facilities acquired with Titan funds.
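
The two-campus example above can be sketched as a toy directory that hands out one shared base-layer address and separate enhancement-layer addresses per administrative scope. This is a minimal illustration, not the published design: the class LayeredSessionDirectory, its methods, and the address prefixes are all hypothetical, and no SAP or sdr message format is modeled.

```python
from itertools import count


class LayeredSessionDirectory:
    """Toy directory: one shared base-layer address, per-scope enhancement addresses."""

    def __init__(self):
        self._next_host = count(1)
        self.sessions = {}

    def _allocate(self, prefix):
        # Hypothetical allocator: hand out the next unused address under a prefix.
        return f"{prefix}.{next(self._next_host)}"

    def announce(self, name, scopes, enhancement_layers):
        """Register a session; `scopes` maps a scope name to its address prefix."""
        base = self._allocate("224.2.0")  # illustrative globally visible prefix
        per_scope = {
            scope: [self._allocate(prefix) for _ in range(enhancement_layers)]
            for scope, prefix in scopes.items()
        }
        self.sessions[name] = {"base": base, "enhancement": per_scope}

    def lookup(self, name, scope):
        """Return the addresses a participant in `scope` should join, base layer first."""
        s = self.sessions[name]
        return [s["base"]] + s["enhancement"].get(scope, [])
```

A participant at either campus receives the same base-layer address but a different set of enhancement-layer addresses, and a participant in an unknown scope receives the base layer only.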

End-User Direct Manipulation Non-Linear Editor

Digital audio and video are not ubiquitous, and easy-to-use tools that will run on any desktop are not available. One aspect of the problem is that conventional non-linear editing systems use a complicated, but very functional, user interface. During this past year we developed an end-user direct manipulation interface for a non-linear audio/video editor. The idea was to simplify the user interface by constraining the functions available. A prototype system was implemented, and a paper has been submitted for publication that describes the system [Steele99].
This work used computer systems acquired with Titan funds.

References

[Amir97] E. Amir, S. McCanne, and R. Katz, "Receiver-driven Bandwidth Adaptation for Light-weight Sessions," Proc. ACM Multimedia 97, Seattle WA, Nov 1997.
[Mayer-Patel97] K. Mayer-Patel and L.A. Rowe, "Design and Performance of the Berkeley Continuous Media Toolkit," Multimedia Computing and Networking 1997, Proc. IS&T/SPIE Symposium on Electronic Imaging: Science & Technology, pp. 194-206, San Jose CA, Jan 1997.
[Mayer-Patel98] K. Mayer-Patel and L.A. Rowe, "Exploiting Temporal Parallelism for Software-only Video Effects Processing," Proc. ACM Multimedia 98, Bristol UK, Sep 1998.
[Mayer-Patel99a] K. Mayer-Patel and L.A. Rowe, "Exploiting Spatial Parallelism for Software-only Video Effects Processing," Multimedia Computing and Networking 1999, Proc. IS&T/SPIE Symposium on Electronic Imaging: Science & Technology, pp. 252-263, San Jose CA, Jan 1999.
[McCanne96] S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-driven Layered Multicast," Proc. ACM SIGCOMM 96, Stanford CA, Aug 1996, pp. 117-130.
[Steele99] M. Steele, M. Hearst, and L.A. Rowe, "The Video Workbench: A Direct Manipulation Interface for Digital Media Editing by Amateur Videographers," Submitted for publication, January 1999.
[Swan98] A. Swan, S. McCanne and L.A. Rowe, "Layered Transmission and Caching for the Multicast Session Directory Service," Proc. ACM Multimedia 98, pp. 119-128, Bristol UK, Sep 1998.
[Wong 98] Tina H. Wong, Ketan Mayer-Patel, David Simpson and L.A. Rowe, "A Software-Only Video Production Switcher for the Internet MBone," Multimedia Computing and Networking 1998, Proc. IS&T/SPIE Symposium on Electronic Imaging: Science & Technology, San Jose CA, Jan 1998.
[Wu99] D. Wu, A. Swan, and L.A. Rowe, "An Internet Mbone Broadcast Management System," Multimedia Computing and Networking 1999, Proc. IS&T/SPIE Symposium on Electronic Imaging: Science & Technology, San Jose, CA, Jan 1999.
 

Driving Applications


CONTROL: Continuous Output and Navigation Technology with Refinement On-Line

The CONTROL project has been developing technologies to give users control over long-running, data-intensive operations.  Prior to this work, long-running operations resulted in batch processing, in which users had to wait while the system completed all the steps of a particular task.  CONTROL remedies this problem with three basic techniques.  First, it delivers data in a non-blocking fashion, so that users receive continuous feedback during processing.   Second, it provides estimation and abstraction techniques so that users get meaningful information out of the data that has been processed at any given time; these techniques provide progressive refinement to an eventually correct and complete result.  Third, it allows users to control the system’s processing on-line, changing the data delivery as they observe the output. CONTROL-based applications shield users from frustrating wait times, encouraging interactive experimentation with large data sets.  CONTROL technology also has the power to allow low-cost equipment to be useful for large tasks, by providing useful estimations with a minimal investment of computing resources.  We have applied this technology to a variety of applications, including relational query processing, data mining, data visualization, and spreadsheets.
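
The first two techniques, non-blocking delivery and progressive refinement, can be illustrated with a running aggregate that reports an estimate and an error bound while the data is still being read. The sketch below is not the CONTROL implementation. The function online_mean is hypothetical, the interval uses a simple normal approximation, and that interval is meaningful only under the assumption that rows arrive in random order.

```python
import math


def online_mean(values, report_every=1000, z=1.96):
    """Yield (rows_seen, running_mean, half_width) while rows stream in."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's incremental sum of squared deviations
        if n > 1 and n % report_every == 0:
            std_err = math.sqrt(m2 / (n - 1) / n)
            # Assumption: rows arrive in random order, so a normal
            # approximation gives a rough confidence interval.
            yield n, mean, z * std_err
```

Because the function is a generator, the caller sees each refined estimate as soon as it is available and can stop consuming rows once the interval is narrow enough. This is a simple form of the on-line control described above.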

"Interactive Data Analysis with CONTROL" Joseph Hellerstein, Ron Avnur, Andy Chou, Chris Olston, Vijayshankar Raman, Tali Roth, Christian Hidber, Peter J. Haas. To appear, IEEE Computer.
 

GiST: Generalized Search Trees

This work is based on a data structure called the Generalized Search Tree, or GiST.  The GiST is a template index structure that can be used to support arbitrary queries over arbitrary data types.  GiSTs are a key component of efficient extensible and object-relational database systems, since they allow users to easily develop custom indexing support for their applications.  We have developed concurrency control and recovery protocols for GiSTs; this work allows GiSTs to be used in industrial-strength, high-performance transactional databases, and in fact a commercial database vendor is aggressively working on integrating GiSTs into their products.  In joint work with computer vision researchers, GiSTs are being used to support the image database queries in Berkeley’s Blobworld project (http://elib.cs.berkeley.edu/photos/blobworld/).  We are also pursuing the application of GiST to genome matching problems in bioinformatics.

GiSTs allow indexes to be built for any queries over any data, but such indexes cannot always be efficient due to inherent mathematical limitations.  Within the GiST project we have worked to characterize this problem by developing a theory of indexability, akin to theories of tractability, to measure the hardness of an indexing task.  We have also developed a practical analysis toolkit called amdb (the “access method debugger”) to help evaluate, tune and debug custom indexes for specific applications.
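
To make the idea of a template index structure concrete, the sketch below separates generic tree code from a small extension class that supplies the data-type-specific behavior. The method names consistent, union, penalty, and pick_split follow the GiST literature. Everything else is an illustrative assumption: the interval key type, the GiST and Node classes, and the simplified insertion logic are not the project's code, and key compression, deletion, concurrency control, and recovery are omitted.

```python
class IntervalExt:
    """Hypothetical extension: keys are closed intervals (lo, hi) over numbers."""

    @staticmethod
    def consistent(key, query):
        # True if anything under `key` could overlap `query`.
        return key[0] <= query[1] and query[0] <= key[1]

    @staticmethod
    def union(keys):
        return (min(k[0] for k in keys), max(k[1] for k in keys))

    @staticmethod
    def penalty(key, new):
        # How much `key` must grow to cover `new`.
        u = IntervalExt.union([key, new])
        return (u[1] - u[0]) - (key[1] - key[0])

    @staticmethod
    def pick_split(entries):
        entries = sorted(entries, key=lambda e: e[0])
        mid = len(entries) // 2
        return entries[:mid], entries[mid:]


class Node:
    def __init__(self, leaf, entries=None):
        self.leaf = leaf
        self.entries = entries or []  # leaf: (key, value); inner: (key, child Node)


class GiST:
    """Generic tree code: knows nothing about the key type except via `ext`."""

    def __init__(self, ext, max_entries=4):
        self.ext, self.max_entries = ext, max_entries
        self.root = Node(leaf=True)

    def search(self, query, node=None):
        node = node or self.root
        for key, item in node.entries:
            if self.ext.consistent(key, query):
                if node.leaf:
                    yield item
                else:
                    yield from self.search(query, item)

    def insert(self, key, value):
        split = self._insert(self.root, key, value)
        if split:  # root overflowed: grow the tree by one level
            self.root = Node(False, [(self._key(n), n) for n in split])

    def _key(self, node):
        return self.ext.union([k for k, _ in node.entries])

    def _insert(self, node, key, value):
        if node.leaf:
            node.entries.append((key, value))
        else:
            # Descend into the child whose key needs the least enlargement.
            i = min(range(len(node.entries)),
                    key=lambda j: self.ext.penalty(node.entries[j][0], key))
            child = node.entries[i][1]
            split = self._insert(child, key, value)
            if split:
                node.entries[i:i + 1] = [(self._key(n), n) for n in split]
            else:
                node.entries[i] = (self._key(child), child)
        if len(node.entries) > self.max_entries:
            left, right = self.ext.pick_split(node.entries)
            return Node(node.leaf, left), Node(node.leaf, right)
        return None
```

Supporting a different data type or query means writing a new extension class with the same four methods. The tree code is unchanged, which is the sense in which the structure acts as a template.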

Blobworld: A System for Region-Based Image Indexing and Retrieval (with Chad Carson, Serge J. Belongie, Megan C. Thomas and Jitendra Malik). To appear, Visual '99.