Analysis of the I/O Characteristics of Production Database Workloads and the TPC Benchmarks

by Windsor W. Hsu, Alan Jay Smith and Honesty C. Young

I/O is increasingly the bottleneck in computer systems as processor performance continues to improve at a much faster rate than storage performance. This is especially the case for large database systems that reference a lot of data. There are several well-known approaches to improving I/O performance. Among them are caching, prefetching and write buffering. The effectiveness of these general approaches depends very much on the characteristics of the reference stream. In addition, if the reference characteristics are well understood, these techniques can be customized to further improve performance.

Nevertheless, there has not been much work on analyzing the reference characteristics of production database workloads. This reflects the fact that production systems are by definition critical to the proper functioning of an organization so that it is very difficult to get access to them for the purpose of conducting a scientific study, especially if the study requires any software changes or if data is to be collected and removed from the system. There are some old studies based on a couple of traces collected from hierarchical and network databases but these studies report conflicting results as to whether locality or sequentiality is present in the database reference stream. Clearly, the reference behavior depends on the workload imposed on the database but in many cases, the characteristics of the workloads analyzed are not adequately described.

Therefore, this research was initiated to gain a thorough and practical understanding of the reference characteristics of production database workloads. We managed to obtain access to a large set of traces that were collected from IBM's industrial-strength DB2 relational database management system (DBMS) in the real production environments of some of the world's largest corporations. We believe that this collection of traces represents by far the most complete and diverse set of production workloads ever reported on in the literature. We cannot overemphasize the amount of time, effort and cost that these traces represent. This research would not have been possible without the support and help of many.

In part 1 of this study, we analyze the descriptive system-level characteristics of the workloads. These are the logical properties of a workload that a user or system administrator can readily understand and relate to without requiring detailed knowledge of the internals of the system and are therefore good features to use for comparing and understanding workloads. Armed with a clear picture of the workload attributes, we go on in part 2 to examine the logical I/O reference behavior of the workloads. Our primary focus is on analyzing the factors that affect how these workloads respond to different techniques for caching, prefetching and write buffering. We evaluate many previously published algorithms and techniques and also develop several new ones based on our insights. Since it is extremely rare to have access to such a large collection of production workloads, an emphasis of this research is on establishing broadly applicable rules of thumb with regards to the characteristics of the production workloads and the effectiveness of caching, prefetching and write buffering.

In recent years, the Transaction Processing Performance Council (TPC) benchmarks C (TPC-C) and D (TPC-D) have emerged as the de facto standard benchmarks for on-line transaction processing (OLTP) systems and decision support systems (DSS) respectively. While such standard benchmarks are important for progress in the field in that they define the playing field by establishing objectives that are easily measurable and repeatable, the real utility of the benchmarks is whether they represent the workloads of interest. Although the TPC-C and TPC-D benchmarks have become widely accepted and as a result are heavily used for both systems design and marketing, there has not been any major effort to empirically determine their workload characteristics, let alone to establish how representative their characteristics are of real workloads. Therefore, another objective of this study is to determine whether the characteristics of the TPC benchmark reflect those of the production workloads, especially with respect to the various techniques such as read caching, prefetching and write buffering that can be applied at the logical I/O level to improve overall DBMS performance.

Our results are discussed in detail in two reports which are available here: Part1, Part2

In addition, we are making available the charts and the data from which they are derived. All we ask is that you acknowledge the source of the data and let us know what you find useful and the wonderful things that you are doing with it.

Full-sized figures (PDF Format): Part1 (267KB), Part2 (2MB)

Data (Excel 97 Format): part1.xls (4.7MB), part2.xls (5.7MB)