Intelligent Disks: An Evolutionary Approach to Scalable Database Infrastructure
The I/O capacity and computational demands of decision support system (DSS) databases are doubling roughly every nine months. This is faster than the growth of processor speed and disk capacity, each of which doubles roughly every 18 months (for processors, the trend described by Moore's Law). Surprisingly, conventional shared-memory multiprocessor (SMP) DSS servers are limited by processor performance, even though disks account for the majority of their cost.
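To make the widening gap concrete, the sketch below compounds the two doubling rates stated above: demand doubling every 9 months (0.75 years) versus hardware doubling every 18 months (1.5 years). It only restates that arithmetic; the function names are illustrative.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` for a quantity that doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def gap_after(years: float) -> float:
    """How far DSS demand (9-month doubling) outpaces processors/disks (18-month doubling)."""
    demand = growth_factor(years, 0.75)
    hardware = growth_factor(years, 1.5)
    return demand / hardware


for y in (1.5, 3.0, 6.0):
    print(f"after {y:.1f} years: demand grows {growth_factor(y, 0.75):.0f}x, "
          f"hardware {growth_factor(y, 1.5):.0f}x, gap {gap_after(y):.0f}x")
# after 1.5 years: demand grows 4x, hardware 2x, gap 2x
# after 3.0 years: demand grows 16x, hardware 4x, gap 4x
# after 6.0 years: demand grows 256x, hardware 16x, gap 16x
```

The gap itself doubles every 18 months, so a server whose processing grows only at hardware rates falls steadily further behind.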
One approach to scalable decision support was massively parallel processors. Another was database machines, which relied on custom hardware, custom operating systems, and custom database software. Both have largely failed because their custom hardware and software cost more than servers that leverage desktop technology and operating systems. Yet another approach is clusters of PCs, each with, say, two disks in the box. The downside of clusters is that database management systems must change to the shared-nothing model, since there is no large "front-end" machine.
An alternative hardware architecture for decision support is the Intelligent Disk (IDISK). Following trends in the design of modern disks, each IDISK incorporates a low-cost embedded processor, memory, and a fast serial-line network connection. IDISKs are connected to each other via high-speed switches, avoiding the I/O bus bottleneck of conventional designs. IDISKs and their switched interconnect would replace the "dumb" disks and I/O buses of a conventional server. We envision that primitive database operations would be downloaded into the disk processors, in effect making IDISKs a database accelerator within the framework of conventional database server software.
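A minimal sketch of what "downloading a primitive database operation" could look like, assuming the primitive is a selection (filter) and that each disk holds a horizontal slice of a table. The `IDisk` class and `front_end_select` function are hypothetical illustrations, not an interface from the talk. The point is that only qualifying rows cross the interconnect.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class IDisk:
    """Hypothetical model of one intelligent disk: local rows plus an embedded processor."""
    rows: List[Row]
    rows_sent: int = 0  # rows shipped over the switched interconnect

    def run_filter(self, predicate: Predicate) -> List[Row]:
        # The downloaded primitive: scan local data on the disk's own processor
        # and return only qualifying rows, instead of shipping every block to the host.
        result = [r for r in self.rows if predicate(r)]
        self.rows_sent += len(result)
        return result


def front_end_select(disks: List[IDisk], predicate: Predicate) -> List[Row]:
    """Conventional front-end server: push the selection to every IDISK, then merge.

    Disks are visited sequentially here for simplicity; real IDISKs would scan in parallel.
    """
    merged: List[Row] = []
    for disk in disks:
        merged.extend(disk.run_filter(predicate))
    return merged


# Four disks, 1,000 rows each; keep rows with qty < 5 (10% selectivity).
disks = [IDisk([{"id": i, "qty": i % 50} for i in range(d * 1000, (d + 1) * 1000)])
         for d in range(4)]
hits = front_end_select(disks, lambda r: r["qty"] < 5)
print(len(hits), sum(d.rows_sent for d in disks))  # 400 400
```

Of the 4,000 rows stored, only the 400 matches leave the disks; in a conventional design all 4,000 would cross the I/O bus to be filtered by the central processors.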
IDISK servers offer several potential advantages. First, the general-purpose nature of IDISK hardware and software overcomes the special-purpose Achilles' heel of past efforts. Second, IDISKs allow the processing of the system to scale with its storage capacity, meeting the ever-growing demands of DSS workloads. Third, IDISK servers may provide a cost win over conventional systems by shifting the bulk of the processing to cheaper disk processors. Further cost wins may come from reusing the power supplies, cables, and enclosures of conventional disks, rather than requiring additional enclosures to house more expensive central processors. Finally, switches give IDISK systems scalable communication, unlike bus-based solutions.
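The scaling argument can be sketched with a deliberately simple throughput model, assuming that a conventional server is capped by its shared I/O bus and central processors, while each IDISK scans with its own processor over an idealized, non-bottlenecked switch. All parameter values below are hypothetical placeholders in arbitrary units, not measurements from the talk.

```python
def conventional_scan_rate(n_disks: int, disk_rate: int, bus_rate: int, host_cpu_rate: int) -> int:
    # Every byte crosses the shared I/O bus and is processed by the central CPUs,
    # so throughput stops growing once either shared resource saturates.
    return min(n_disks * disk_rate, bus_rate, host_cpu_rate)


def idisk_scan_rate(n_disks: int, disk_rate: int, disk_cpu_rate: int) -> int:
    # Each IDISK processes its own data, so aggregate throughput grows with the
    # number of disks (the switch is idealized as never being the bottleneck).
    return n_disks * min(disk_rate, disk_cpu_rate)


# Hypothetical parameters: disk_rate=10, bus_rate=100, host_cpu_rate=200,
# and a slower embedded processor per disk (disk_cpu_rate=8).
for n in (8, 32, 128):
    print(n, conventional_scan_rate(n, 10, 100, 200), idisk_scan_rate(n, 10, 8))
# 8 80 64
# 32 100 256
# 128 100 1024
```

Even with a weaker processor per disk, the IDISK configuration keeps scaling as disks are added, while the bus-based design flattens once the shared bus saturates.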
IDISKs can be viewed as clusters of very-low-cost processors, with a conventional SMP in front so that database software can evolve incrementally.
The talk provides a back-of-the-envelope evaluation of IDISK for two queries of the TPC-D benchmark and for MinuteSort. Together, these workloads explore cost-performance for both processor-bound and disk-bound problems. The tentative conclusion is that the combination of processing and storage afforded by IDISKs allows DSS performance to scale up much more efficiently than the conventional alternative.