Software Micro-rejuvenation
Archana Ganapathi

Software rejuvenation is a useful technique to avert failures that are side effects of continuously running applications. While software rejuvenation is an attractive technique for preventative maintenance, micro-rejuvenation is infeasible in legacy systems without additional support for micro-reboots. Micro-rejuvenation requires the ability to checkpoint the system before the reboot and to reload the checkpointed state afterwards, so that ACID properties are preserved. For example, [4] reveals that rejuvenation is feasible at the application level with the help of "watchdog daemon" processes such as watchd. To invigorate applications, rejuvenation must be invoked at the operating system level. However, if the granularity of rejuvenation is a process, indicating micro-rejuvenation, then the application comprising that process must be fully aware of inter-process dependencies.

Rejuvenation at process-level granularity carries a risk of inter-process interference whenever a process is rejuvenated. For example, consider process A waiting on process B. Upon rejuvenation of process B, process A is forced to time out and abort unless the micro-reboot time is well within process B's response allowance. Moreover, when process B is resurrected, its state reverts to the pre-rejuvenation state, while process A's state has changed in the meantime. Avoiding such inconsistencies is tantamount to considering all processes that depend on process B, checkpointing their states, and then restoring them upon completion of B's rejuvenation. In this case, the application must identify dependencies between its constituent processes and take responsibility for communicating this information to the controlling operating system. (Note: Thread-level rejuvenation is at an even finer granularity than process level, and it is quite complicated, if not impossible, for an application to obtain a thread-level dependence graph.)
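The process A / process B scenario above can be illustrated with a small sketch. It is not taken from any of the cited systems: the names `dependents_of`, `micro_rejuvenate`, `waits_on` and `reboot` are hypothetical, process state is modeled as plain in-memory dictionaries, and the sketch assumes that transitive dependents must be checkpointed along with direct ones and that the application supplies the dependency map.

```python
from collections import deque
import copy


def dependents_of(target, waits_on):
    """Return every process that directly or transitively waits on `target`.

    `waits_on[p]` is the set of processes that p depends on (hypothetical
    format; the application is assumed to supply this map).
    """
    # Invert the edges so we can walk from a process to those waiting on it.
    reverse = {}
    for proc, deps in waits_on.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(proc)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for proc in reverse.get(current, ()):
            if proc != target and proc not in seen:
                seen.add(proc)
                queue.append(proc)
    return seen


def micro_rejuvenate(target, waits_on, states, reboot):
    """Checkpoint `target` and its dependents, reboot, then restore.

    `states` maps process name -> state dict; `reboot(states)` is a
    hypothetical callback standing in for the actual micro-reboot.
    """
    affected = dependents_of(target, waits_on) | {target}
    # Checkpoint before the reboot.
    checkpoint = {p: copy.deepcopy(states[p]) for p in affected}
    reboot(states)
    # Reload the checkpointed state so dependents and target agree again.
    for p in affected:
        states[p] = copy.deepcopy(checkpoint[p])
    return affected
```

Processes outside the affected set are left untouched, which is the point of the finer granularity; the cost is that the dependency map has to be correct at the moment of rejuvenation.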
Checkpointing state information can be performed externally to the application, but determining inter-process dependencies must be undertaken by the application internally. Process dependencies are dynamic and cannot be determined by static analysis of the application code; any such analysis must be performed at run time, which significantly affects performance. Specifically, an application must be designed to support rejuvenation, as suggested by Hong et al. in [3]. Unfortunately, legacy systems have not been designed with this facility and, as with all software systems, they age with time due to degradation of system resources. It is rather late either to incorporate rejuvenation features into legacy code or to control degradation.

Open-loop rejuvenation statically determines the criteria for performing the rejuvenation action. However, closed-loop rejuvenation systems based on a threshold for software degradation or alertness level are becoming increasingly attractive [1, 2], as they "cost" significantly less than open-loop systems. Hong et al. formulate a system using finite state automata (FSA) and report their results with the Apache Web-server software with simulated memory-leak bugs. They deactivate configuration variables such as MaxRequestsPerChild and MaxClientNum that periodically interfere with software rejuvenation. Their system is more effective as an evaluation environment for rejuvenation policies than as a mechanism for avoiding system crashes to improve availability. Furthermore, not all processes in legacy code are amenable to an FSA model.

References:
[1] Andrea Bobbio, Matteo Sereno and Cosimo Anglano, "Fine Grained Software Degradation Models for Optimal Rejuvenation Policies", Performance Evaluation, Vol. 46, 45-62, 2001.
[2] Vittorio Castelli, Richard E. Harper, Philip Heidelberger, Steven W. Hunter, Kishor S. Trivedi, Kalyanaraman Vaidyanathan and William P. Zeggert, "Proactive Management of Software Aging", IBM Journal of Research and Development, 45(2): 311-332, 2001.
[3] Y. Hong, D. Chen, L. Li and K. Trivedi, "Closed Loop Design for Software Rejuvenation", Workshop on Self-Healing, Adaptive and self-MANaged Systems (SHAMAN), New York, NY, June 2002.
[4] Yennun Huang, Chandra Kintala, Nick Kolettis and N. Dudley Fulton, "Software Rejuvenation: Analysis, Module and Applications", Proc. 25th Intl. Symposium on Fault-Tolerant Computing, June 1995.