Advanced Topics in Computer Systems
10/22/01
Anthony Joseph & Joe Hellerstein

Lottery scheduling is a very general, proportional-share scheduling algorithm.
Problems with traditional schedulers:
· Priority systems are ad hoc at best: the highest priority always wins.
· “Fair share” is implemented by adjusting priorities with a feedback loop to achieve fairness over the (very) long term; the highest priority still wins all the time, but now the Unix priorities are always changing.
· Priority inversion: high-priority jobs can be blocked behind low-priority jobs.
· Schedulers are complex and difficult to control.
Lottery scheduling:
· Priority is determined by the number of tickets each process holds: a process's priority is its relative share of all of the tickets competing for the resource.
· The scheduler picks a winning ticket at random and gives its owner the resource (see the sketch after this list).
· Tickets can be used for a wide variety of different resources (uniform) and are machine independent (abstract).
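A minimal sketch of the draw itself, assuming a linked list of clients and a precomputed ticket total (the names and representation are illustrative, not the paper's kernel implementation):

    #include <stdlib.h>

    struct client {
        int tickets;                          /* this client's ticket count */
        struct client *next;
    };

    /* Hold one lottery over the client list and return the winner.
       Assumes total_tickets > 0 and that rand() has been seeded. */
    struct client *hold_lottery(struct client *head, int total_tickets) {
        int winner = rand() % total_tickets;  /* the winning ticket number */
        int sum = 0;
        for (struct client *c = head; c != NULL; c = c->next) {
            sum += c->tickets;                /* running total of tickets seen */
            if (winner < sum)                 /* this client holds the winner */
                return c;
        }
        return NULL;                          /* unreachable if counts are consistent */
    }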
How fair is lottery scheduling?
· If a client has probability p of winning each lottery, then its expected number of wins over n lotteries (from the binomial distribution) is np.
· Variance of the binomial distribution: σ² = np(1 − p).
· Accuracy improves with √n: the standard deviation grows only as √n, so the error relative to the expected np wins shrinks as 1/√n (written out below).
· The geometric distribution gives the number of lotteries until a client's first win.
· Big picture answer: mostly accurate, but short-term inaccuracies are possible; see stride scheduling below.
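The fairness argument, written out (standard binomial and geometric facts, not notation from the paper):

    \[
      \mathbb{E}[\text{wins}] = np, \qquad
      \sigma^2 = np(1-p), \qquad
      \frac{\sigma}{\mathbb{E}[\text{wins}]} = \sqrt{\frac{1-p}{np}} = O\!\left(\frac{1}{\sqrt{n}}\right)
    \]
    % Geometric distribution: the expected number of lotteries until a
    % client's first win is 1/p.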
Ticket Transfer: how to deal with dependencies
· Basic idea: if you are blocked on someone else, give them your tickets (sketched after this list).
· Example: client-server
o The server has no tickets of its own.
o Clients give the server all of their tickets during an RPC.
o The server's priority is the sum of the priorities of all of its active clients.
o The server can use lottery scheduling to give preferential service to high-priority clients.
· A very elegant solution to a long-standing problem (though not the first solution).
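A sketch of the transfer around an RPC, assuming a simple task structure (hypothetical names; the paper does this inside the kernel's RPC path):

    struct task { int tickets; };

    /* The client is about to block on the server: loan it our tickets. */
    void rpc_call(struct task *client, struct task *server) {
        int loaned = client->tickets;
        server->tickets += loaned;   /* server now competes with the client's funding */
        client->tickets = 0;         /* the blocked client holds no tickets */
        /* ... perform the RPC; lotteries now favor the server ... */
        server->tickets -= loaned;   /* the loan is returned on reply */
        client->tickets = loaned;
    }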
Ticket inflation: make up your own tickets (print your own money)
· Only works among mutually trusting clients.
· Presumably works best if the inflation is temporary.
· Allows clients to adjust their priority dynamically with zero communication.
Currencies: set up an exchange rate with the base currency
· Enables inflation just within a group (see the sketch after this list).
· Simplifies mini-lotteries, such as for a mutex.
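A sketch of the exchange rate, assuming a currency is backed by some number of base-currency tickets (illustrative representation only):

    /* A currency backed by 'backing' base tickets, with 'issued' tickets
       outstanding in its own denomination. */
    struct currency { int backing; int issued; };

    /* Value of 'amount' tickets of currency c, in base units. */
    double base_value(const struct currency *c, int amount) {
        return (double)amount * c->backing / c->issued;
    }

Issuing more tickets raises 'issued' and dilutes every holder of that currency, but leaves other groups' base-currency shares untouched.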
Compensation tickets: what happens if a thread is I/O bound and regularly blocks before its quantum expires? Without adjustment, that thread gets less than its share of the processor.
· Basic idea: if you complete only a fraction f of the quantum, your tickets are inflated by 1/f until the next time you win (sketched after this list).
· Example: if B on average uses 1/5 of a quantum, its tickets will be inflated 5x, so it will win 5 times as often and get its correct share overall.
· What if B alternates between using 1/5 of a quantum and whole quanta?
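A sketch of the inflation rule, assuming the runtime records the fraction of the last quantum each thread used (field names are illustrative):

    struct thread {
        int tickets;            /* the thread's funded ticket count */
        double used_fraction;   /* f: fraction of the last quantum used, in (0, 1] */
    };

    /* Effective tickets entering the next lottery: inflate by 1/f. */
    double effective_tickets(const struct thread *t) {
        double f = t->used_fraction;
        if (f <= 0.0 || f > 1.0)
            f = 1.0;            /* guard: treat unknown usage as a full quantum */
        return t->tickets / f;  /* e.g. 5x inflation when f = 1/5 */
    }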
Problems:
· Not as fair as we'd like: the mutex experiment comes out 1.8:1 instead of 2:1, while the multimedia apps come out 1.92:1.50:1 instead of 3:2:1.
· Practice midterm question: are these differences statistically significant? (They probably are, which would imply that the lottery is biased or that there is a secondary force affecting the relative priority.)
· Multimedia apps: biased because the X server assumes uniform priority instead of using tickets. Conclusion: to really work, tickets must be used everywhere. Every queue is an implicit scheduling decision... Every spinlock ignores priority...
· Can we force it to be unfair? Is there a way to use compensation tickets to get more time, e.g., quit early to earn compensation tickets and then run for the full quantum next time?
· What about kernel cycles? If a process uses a lot of cycles indirectly, such as through the Ethernet driver, does it get higher priority implicitly? (Probably.)
Stride Scheduling: a follow-on to lottery scheduling (not in the paper)
· Basic idea: make a deterministic version to reduce short-term variability.
· Mark time virtually, using “passes” as the unit.
· A process has a stride, which is the number of passes between executions. Strides are inversely proportional to the number of tickets, so high-priority jobs have low strides and thus run often.
· Very regular: a job with priority p will run every 1/p passes.
· Algorithm (roughly): always pick the job with the lowest pass number, then update its pass number by adding its stride (see the sketch after this list).
· Similar mechanism to compensation tickets: if a job uses only a fraction f of its quantum, update its pass number by f × stride instead of the full stride.
· Overall result: far more accurate than lottery scheduling, and the error can be bounded absolutely instead of probabilistically.
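A sketch of the loop, assuming a fixed-point stride constant (STRIDE1 and the array representation are illustrative; the published stride work uses a large integer constant like this to avoid floating point):

    #define STRIDE1 (1 << 20)   /* stride = STRIDE1 / tickets */

    struct job {
        int tickets;            /* share of the processor */
        unsigned long pass;     /* virtual time of the job's next run */
    };

    /* Pick the job with the lowest pass, account for it running a fraction f
       of a quantum (f = 1.0 for a full quantum), and return it for dispatch. */
    struct job *stride_schedule(struct job *jobs, int n, double f) {
        struct job *best = &jobs[0];
        for (int i = 1; i < n; i++)
            if (jobs[i].pass < best->pass)
                best = &jobs[i];
        unsigned long stride = STRIDE1 / best->tickets;
        best->pass += (unsigned long)(f * stride);  /* compensation analogue */
        return best;
    }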
Goal: support fine-grained parallelism in a multiprogrammed environment.
The fine-grained parallelism model discussed: threads in a shared address space (others are also possible).
Threads can be implemented in two different ways:
· Kernel-implemented:
o The kernel creates and dispatches threads.
o Expensive: a thread context switch involves crossing the protection boundary to/from the kernel.
o Inflexible: can't easily customize the scheduling policy.
· User-level:
o Create one kernel thread for each processor, and use these the way an OS would use processors to run the user-level threads.
o Implement user-level threads entirely at user level, in the runtime system (a minimal sketch follows this list):
§ 1) Any user thread can run on any kernel thread.
§ 2) Very fast, both for thread creation and context switch (no kernel calls in either case).
§ 3) Synchronization between user threads can be handled entirely at user level; can do things like spin-wait on locks.
o Result: much faster thread primitives can support much finer-grained parallelism.
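A minimal sketch of a user-level context switch, using the widely available (if now deprecated) POSIX ucontext API; once the contexts are set up, switching between them involves no kernel call:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t runtime_ctx, thread_ctx;
    static char thread_stack[64 * 1024];

    /* A user-level thread: yields back to the runtime in pure user code. */
    static void user_thread(void) {
        printf("thread: first run\n");
        swapcontext(&thread_ctx, &runtime_ctx);  /* user-level yield */
        printf("thread: resumed\n");
    }                                            /* returning goes to uc_link */

    int main(void) {
        getcontext(&thread_ctx);
        thread_ctx.uc_stack.ss_sp   = thread_stack;
        thread_ctx.uc_stack.ss_size = sizeof thread_stack;
        thread_ctx.uc_link          = &runtime_ctx;
        makecontext(&thread_ctx, user_thread, 0);

        swapcontext(&runtime_ctx, &thread_ctx);  /* dispatch the user thread */
        printf("runtime: thread yielded\n");
        swapcontext(&runtime_ctx, &thread_ctx);  /* resume it */
        printf("runtime: thread finished\n");
        return 0;
    }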
The problem with user-level threads: scheduling decisions are made independently by both the kernel and the user-level runtime system.
· If a user thread executes a kernel call, the kernel thread running it blocks: the application loses a processor.
· The kernel may de-schedule a kernel thread at a bad time for the application, e.g. when the user thread being run by that kernel thread is in a critical section.
· The application may suddenly need fewer threads and run idle user threads on some of its kernel threads; the kernel doesn't know to de-schedule those kernel threads and give the processors to other applications.
Solution: design a protocol for passing scheduling information back and forth between the kernel and the runtime system.
Scheduler activation:
· A vessel for running user threads (i.e. it acts like a kernel thread); it can be thought of as a virtual processor in this respect.
· Notifies the user-level runtime system of interesting kernel events.
· Provides space in the kernel for saving the processor context of the currently running user thread when that thread is stopped by the kernel (e.g. for I/O, or when the processor is preempted for another application).
The kernel creates a new activation and makes an upcall for one of the following reasons (see the sketch after this list):
· A new processor is available. The runtime picks a user thread to run on it.
· An existing activation blocked (e.g. for I/O or a page fault). The runtime picks another user thread to run on the new activation.
· An activation unblocked and is now runnable. The new activation includes processor contexts for two old activations: the newly unblocked one and the one that was preempted in order to deliver this notification. Why was it necessary to preempt a second activation?
o To obtain a processor to run on. See Figure 1 in the paper, where black spots represent processors.
· An activation lost its processor (to another application). Similar to the unblocked-activation case: the new activation contains processor contexts for two old activations, the one whose processor was allocated to another application and the one whose processor is being used to run the new activation.
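The four upcall points, sketched as a runtime entry (the enum names and C interface are hypothetical; the paper defines the events, not this API):

    enum upcall_event {
        PROCESSOR_ADDED,       /* a new processor is available */
        ACTIVATION_BLOCKED,    /* an activation blocked in the kernel (I/O, page fault) */
        ACTIVATION_UNBLOCKED,  /* a blocked activation became runnable; the upcall
                                  carries two saved user-thread contexts */
        PROCESSOR_PREEMPTED    /* a processor was reallocated to another application */
    };

    /* Entry point the kernel upcalls into on each fresh activation. */
    void upcall_handler(enum upcall_event why) {
        switch (why) {
        case PROCESSOR_ADDED:
        case ACTIVATION_BLOCKED:
            /* pick a ready user thread and run it on this activation */
            break;
        case ACTIVATION_UNBLOCKED:
        case PROCESSOR_PREEMPTED:
            /* requeue both saved user-thread contexts, then pick one to run */
            break;
        }
    }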
The runtime informs the kernel of the following events:
· The number of runnable threads crossing to one more or one fewer than the number of processors (runnable threads = processors ± 1); a sketch follows this list.
· That is, it tells the kernel about transitions from needing another processor to not needing another processor, and vice versa.
· There is no need to tell the kernel about greater disparities between the two, because that won't change the kernel's behavior.
· All other runtime thread operations are strictly user-level.
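A sketch of the transition check (the notification calls are hypothetical stand-ins for the paper's kernel interface):

    #include <stdio.h>

    /* Hypothetical kernel-notification stubs. */
    static void kernel_add_more_processors(void) { puts("kernel: app wants a processor"); }
    static void kernel_processor_idle(void)      { puts("kernel: app can give one back"); }

    /* Call whenever the runnable-thread count changes; notify the kernel only
       on transitions across the runnable == processors boundary. */
    void runnable_count_changed(int runnable, int processors) {
        if (runnable == processors + 1)
            kernel_add_more_processors();
        else if (runnable == processors - 1)
            kernel_processor_idle();
    }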
Result: we get the performance of user-level threads with the consistent behavior of kernel threads.
Some details:
· User-level priority scheduling: the runtime may need to pull a lower-priority user thread off another activation. It does this by telling the kernel to preempt the processor running the low-priority user thread (only the kernel can preempt a processor). The preempted processor is then used to do an upcall back to the application.
· Dealing with preempted activations running in critical sections:
o During an upcall, the runtime checks whether the preempted/unblocked user thread was running in a critical section; if so, it continues the user thread until it exits the critical section, then puts the user thread on the appropriate queue.
o Critical sections are detected by keeping a hash table of section begin/end addresses, computed by placing special assembly instructions around critical sections and then post-processing the object code (a sketch of the check follows this list).
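A sketch of the lookup, with a linear scan standing in for the paper's hash table (structure and names are illustrative):

    /* Recorded [begin, end) address pairs for each critical section. */
    struct cs_range { unsigned long begin, end; };

    /* Does the saved program counter fall inside any critical section? */
    int in_critical_section(const struct cs_range *table, int n, unsigned long pc) {
        for (int i = 0; i < n; i++)
            if (pc >= table[i].begin && pc < table[i].end)
                return 1;   /* continue the thread until it exits the section */
        return 0;
    }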
3 key features of this paper:
· The goal is to get user-level thread performance with the scheduling consistency provided by kernel-level threads in a multiprogramming environment.
· The problem to solve: coordinating two independent thread schedulers, the kernel's and the application runtime's.
· Scheduler activations are used as a vessel to transmit information between the two, as well as to provide virtual processors for running user-level threads.
Some flaws:
· The authors wave their hands regarding upcalls being 5x slower than kernel thread operations.
· Only one application was tested. How would “ordinary” user-level threads perform relative to scheduler activations on other applications? Does the kernel's scheduling policy affect the relative performance in any interesting ways?
A lesson: Export your functionality (in this case, threads) out of the kernel for improved performance and flexibility and figure out how to interact with the kernel “just enough” to allow the kernel to do its job as “traffic cop” among competing applications.