Active Disks - Remote Execution for Network-Attached Storage

Today's commodity disk drives are effectively small computers: each has a general-purpose processor (high-end drives use control processors from the Motorola 68000 family), memory (one to four megabytes today, and growing), a network connection (SCSI over short cables today, but moving to Fibre Channel), and spinning magnetic media to store and retrieve the data. As processors and memory continue to get faster and cheaper, more and more intelligence will move from host CPUs into peripherals. Storage system designers already use this "excess" compute power to perform more complex processing and optimizations inside storage devices, but to date such optimizations have been confined to relatively low levels of the storage protocol. At the same time, trends in storage density, mechanics, and electronics are eliminating the bottleneck in moving data off the storage media, putting pressure on interconnects and host processors to move data more efficiently. We propose a system called Active Disks that uses the processing power on individual disk drives to run application-level code. Executing portions of an application directly at the disk drives can dramatically reduce data traffic and exploit the storage parallelism already present in large systems. The focus of this work is to identify the characteristics that make applications suitable for execution at storage devices and to quantify the benefits of Active Disks for individual application performance and for overall system efficiency and scalability. In this talk, I will focus on the opportunities opened up by current trends in storage devices, discuss a model of expected speedup for applications on Active Disks, and present results from a prototype system running several large-scale applications.
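
As a rough illustration of the data-traffic and parallelism argument above, the following back-of-the-envelope sketch compares a filter-style scan run at the host with the same scan run at the drives. It is not the speedup model presented in the talk; the function names, the simple min-based bottleneck assumption, and every parameter are hypothetical and chosen only for illustration.

```python
# Toy bottleneck model (an assumption for illustration, not the talk's model).
# All rates are in MB/s; all names and parameters are hypothetical.

def host_side_throughput(n_disks, disk_mb_s, host_mb_s):
    """Scan rate when every byte is shipped to the host for processing.

    The drives can deliver n_disks * disk_mb_s in aggregate, but the host
    can only process host_mb_s, so the slower of the two limits the scan.
    """
    return min(n_disks * disk_mb_s, host_mb_s)


def active_disk_throughput(n_disks, disk_mb_s, drive_cpu_mb_s):
    """Scan rate when each drive processes its own data locally.

    Each drive is limited by the slower of its media rate and its on-drive
    processor, and the aggregate rate grows with the number of drives.
    """
    return n_disks * min(disk_mb_s, drive_cpu_mb_s)


def interconnect_traffic(total_mb, selectivity, on_disk):
    """Data crossing the interconnect for a scan over total_mb of data.

    selectivity is the fraction of data that survives the filter. With
    on-disk filtering only the survivors are sent to the host.
    """
    return total_mb * selectivity if on_disk else total_mb
```

Under these assumed numbers, 32 drives at 10 MB/s feeding a host that processes 100 MB/s are capped at 100 MB/s, while the same drives with on-drive processors that each handle only 5 MB/s reach 160 MB/s in aggregate, and a filter that keeps 1% of the data sends 1% as much over the interconnect.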