I will present an efficient user-level thread scheduler for shared-memory multiprocessors and an analysis of its performance under multiprogramming. This scheduler is a non-blocking implementation of the work-stealing algorithm. Idle processes (kernel threads) steal (user-level) threads from randomly chosen victims, and all concurrent data structures are implemented with non-blocking synchronization. Without any need for special kernel-level resource management, such as coscheduling or process control, this non-blocking work stealer efficiently utilizes whatever processor resources are provided by the kernel.
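The heart of such a scheduler is a per-process deque of ready threads: the owner pushes and pops work at the bottom, while idle processes steal from the top using only atomic operations, so no process ever blocks on a lock. The sketch below is a hypothetical, simplified illustration of this style of non-blocking deque (fixed capacity, a plain `int` standing in for a thread descriptor, and no version tag on the top index, which a real implementation needs to avoid the ABA problem); it is not the actual Hood data structure.

```cpp
#include <atomic>
#include <cassert>

// Simplified sketch of a non-blocking work-stealing deque.
// Owner works at the bottom; thieves steal at the top via compare-and-swap.
struct Deque {
    static const int CAP = 1024;
    int tasks[CAP];                 // stand-ins for thread descriptors
    std::atomic<int> top{0};        // thieves claim entries here with a CAS
    std::atomic<int> bottom{0};     // owner pushes/pops here

    void push(int task) {           // owner only
        int b = bottom.load(std::memory_order_relaxed);
        tasks[b] = task;
        bottom.store(b + 1, std::memory_order_release);
    }

    bool pop(int* out) {            // owner only
        int b = bottom.load(std::memory_order_relaxed) - 1;
        bottom.store(b, std::memory_order_seq_cst);
        int t = top.load(std::memory_order_seq_cst);
        if (b < t) {                // deque was already empty
            bottom.store(t);
            return false;
        }
        *out = tasks[b];
        if (b > t) return true;     // more than one task left: no race
        // b == t: last task, so race any thieves with a single CAS on top
        bool won = top.compare_exchange_strong(t, t + 1);
        bottom.store(t + 1);
        return won;
    }

    bool steal(int* out) {          // any thief
        int t = top.load(std::memory_order_acquire);
        int b = bottom.load(std::memory_order_acquire);
        if (t >= b) return false;   // empty (or a racing pop emptied it)
        int task = tasks[t];
        // Claim the task with one CAS; on failure the thief simply retries
        // elsewhere, so no process ever waits on another.
        if (!top.compare_exchange_strong(t, t + 1)) return false;
        *out = task;
        return true;
    }
};
```

An idle process would repeatedly pick a victim uniformly at random and call `steal` on that victim's deque; because a failed CAS just means some other process made progress, the scheduler needs no kernel support such as coscheduling.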
We demonstrate this efficiency with both an algorithmic analysis and an empirical study. For the algorithmic analysis, we model the kernel as an adversary and prove that the execution time is optimal to within a constant factor. We have implemented the non-blocking work stealer in Hood, a C++ threads library built on top of Solaris pthreads, and we have studied its performance. This study shows that application performance conforms to the theoretical bound with a very small constant factor, roughly 1. Applications efficiently utilize processor resources even when the number of processes exceeds the number of processors, and even when the number of processors grows and shrinks arbitrarily.
This work has been done in collaboration with Nimar Arora, Dionisios Papadopoulos, and Greg Plaxton of The University of Texas at Austin.