Two mini talks by Joe Hellerstein and Gautam Kar
IBM Research

Using Control Theory To Achieve Service Level Objectives For An E-Mail Server
by Joe Hellerstein

A widely used approach to achieving service level objectives for a target system (e.g., an email server) is to add a controller that manipulates the target system's tuning parameters. We describe a methodology for designing such controllers for software systems that builds on classical control theory. The classical approach proceeds in two steps: system identification and controller design. In system identification, we construct mathematical models of the target system. Traditionally, this has been based on a first-principles approach, using detailed knowledge of the target system. Such models can be difficult to build, and too complex to validate, use, and maintain. In our methodology, a statistical (ARMA) model is fit to historical measurements of the target being controlled. These models are easier to obtain and use, and they allow us to apply control-theoretic design techniques to a larger class of systems. When applied to a Lotus Notes groupware server, we obtain model fits with $R^{2}$ no lower than 75% and as high as 98%. In controller design, an analysis of the models leads to a controller that will achieve the service level objectives. We report on an analysis of a closed-loop system using an integral control law with Lotus Notes as the target. The objective is to maintain a reference queue length. Using root-locus analysis from control theory, we are able to predict the occurrence (or absence) of controller-induced oscillations in the system's response. Such oscillations are undesirable since they increase variability, thereby resulting in a failure to meet the service level objective. We implement this controller for a real Lotus Notes system, and observe a remarkable correspondence between the behavior of the real system and the predictions of the analysis. This allows us to select the proper parameter for the controller from the analysis alone.
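The two steps described above can be illustrated with a small sketch. It is not the model or controller from the talk: it assumes a hypothetical first-order model y[k+1] = a*y[k] + b*u[k] (a much simpler stand-in for a full ARMA fit), where u is a tuning parameter and y is the queue length, and an integral law u[k+1] = u[k] + gain*(r - y[k]). Under those assumptions the closed-loop poles are the roots of z^2 - (1+a)z + (a + gain*b) = 0, and their location indicates whether the response is smooth, oscillatory, or unstable. All names and numeric values are illustrative.

```python
import cmath
import random


def simulate_plant(a, b, inputs, noise=0.0, seed=0):
    """Generate outputs of the assumed model y[k+1] = a*y[k] + b*u[k] + noise."""
    rng = random.Random(seed)
    outputs = [0.0]
    for u in inputs:
        outputs.append(a * outputs[-1] + b * u + rng.gauss(0.0, noise))
    return outputs


def fit_first_order(inputs, outputs):
    """Least-squares estimate of (a, b) from measured inputs and outputs."""
    syy = syu = suu = sny = snu = 0.0
    for k, u in enumerate(inputs):
        y, y_next = outputs[k], outputs[k + 1]
        syy += y * y
        syu += y * u
        suu += u * u
        sny += y_next * y
        snu += y_next * u
    # Solve the 2x2 normal equations by Cramer's rule.
    det = syy * suu - syu * syu
    a = (sny * suu - snu * syu) / det
    b = (snu * syy - sny * syu) / det
    return a, b


def closed_loop_poles(a, b, gain):
    """Roots of z^2 - (1 + a) z + (a + gain*b) = 0."""
    disc = cmath.sqrt((1 + a) ** 2 - 4 * (a + gain * b))
    return ((1 + a + disc) / 2, (1 + a - disc) / 2)


def classify(a, b, gain, tol=1e-12):
    """Label the predicted closed-loop response from the pole locations."""
    poles = closed_loop_poles(a, b, gain)
    if max(abs(p) for p in poles) >= 1.0:
        return "unstable"
    # Complex poles, or real poles on the negative axis, produce oscillation.
    if any(abs(p.imag) > tol or p.real < 0 for p in poles):
        return "oscillatory"
    return "smooth"


if __name__ == "__main__":
    rng = random.Random(1)
    u = [rng.uniform(0.0, 1.0) for _ in range(200)]
    y = simulate_plant(0.8, 0.5, u, noise=0.01)  # hypothetical "measurements"
    a_hat, b_hat = fit_first_order(u, y)
    for gain in (0.01, 0.1, 0.5):  # sweep the controller gain
        print(gain, classify(a_hat, b_hat, gain))
```

Sweeping the gain and inspecting the poles is a crude numerical counterpart of a root-locus plot. For the assumed values a = 0.8 and b = 0.5, the poles are real for gains below (1-a)^2/(4b) = 0.02, complex but stable up to (1-a)/b = 0.4, and outside the unit circle beyond that.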

Taxonomy, Modeling and Computation of Dependencies for Distributed Management
by Gautam Kar

This talk addresses the role of dependency analysis in the general area of distributed management. Specifically, we point out the need for developing a methodology for identifying, classifying, representing and computing dependency information in order to do effective configuration, fault and performance management in a complex IT environment. The discussion focuses on developing a systematic methodology for obtaining dependency information in an IT service environment and representing this information within the framework of a model that can facilitate the design of applications that do fault, performance, configuration and availability management. The main questions raised in this talk are: What are the important characteristics of dependencies? In other words, when a managed entity, such as a service or resource, depends on another managed entity, what are the properties of such a dependency that need to be recorded? How can we classify dependencies such that they can be used more efficiently to do root cause or impact analysis in fault management? The paper introduces the concept of dependency lifetime that traces the flow of dependency information from the design to installation to runtime stages of a service. Three categories of models, functional, structural and operational, are used to represent this information as it flows from the design to the runtime stages. We show how the information obtained at each of these stages can be manipulated by management applications. In particular, approaches that we have pursued to discover dynamic dependencies will be discussed. Some applications of this approach in the context of an e-commerce environment will be mentioned.
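To make the root cause and impact analysis questions concrete, here is a minimal sketch that assumes dependencies are recorded as a simple directed graph. The entity names and the graph itself are hypothetical, and the sketch ignores the dependency properties and the functional, structural and operational models that the talk is concerned with; it only shows the two traversals such a model would need to support.

```python
from collections import deque

# Hypothetical e-commerce dependency graph: each key depends on the
# entities in its list (dependent -> antecedents).
DEPENDS_ON = {
    "web_storefront": ["web_server", "order_service"],
    "order_service": ["database", "name_service"],
    "web_server": ["name_service"],
    "database": [],
    "name_service": [],
}


def reachable(graph, start):
    """Return every node reachable from start, excluding start itself."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # guards against cycles leading back to start
    return seen


def invert(graph):
    """Reverse every edge: antecedent -> dependents."""
    inverted = {node: [] for node in graph}
    for dependent, antecedents in graph.items():
        for antecedent in antecedents:
            inverted.setdefault(antecedent, []).append(dependent)
    return inverted


def root_cause_candidates(graph, symptomatic):
    """Entities the symptomatic one depends on, directly or indirectly."""
    return reachable(graph, symptomatic)


def impacted_by(graph, failed):
    """Entities that depend, directly or indirectly, on the failed one."""
    return reachable(invert(graph), failed)


if __name__ == "__main__":
    print(sorted(root_cause_candidates(DEPENDS_ON, "order_service")))
    print(sorted(impacted_by(DEPENDS_ON, "name_service")))
```

Root cause analysis walks the graph from a symptomatic entity toward the things it depends on, while impact analysis walks the reversed graph from a failed entity toward its dependents. A classification of dependencies, as the talk proposes, could be used to prune either traversal.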