Notes and to-do list on Dataflow for Parallel Systems
(Collection of notes and materials to help substantiate SCORE on Millennium)
Eylon's MICRO submission, why are streams good?
http://www.cs.berkeley.edu/~eylon/brass/extended-abstract.pdf
-
Streams:
-
expose inter-thread communication dependencies (data-flow),
allowing scheduling to be efficient,
-
allow data to be batched (amortizing setup costs),
-
can be exposed as an architectural feature, like the memory interface,
which guarantees the safety and protection of each thread's execution
and memory space.
-
Streams are the ONLY mechanism that threads (pages/operators) use to communicate
with each other.
-
Streams provide blocking reads and non-blocking writes. The former guarantees
determinism: the outcome of program execution is independent of scheduling
and communication timing.
-
Other models make scheduling difficult because of obscured communication
patterns (memory aliasing) and explicit synchronization (semaphores). This
limits an application's scalability to larger systems and its forward
compatibility.
-
Batch processing of tokens amortizes the runtime context-switch time and
the setup cost of streams. This is similar to blocking data to improve
cache locality, but the block size is determined at runtime based on
available buffer sizes, etc. This late binding of the block size further
contributes to forward compatibility and scaling on larger hardware
(in contrast to non-streaming models, which fix the block size at compile
time).
-
A common streaming protocol lets a page be completely oblivious to whether
it is connected to another page, a segment, or the CPU.
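
To illustrate the determinism claim above, here is a minimal sketch in Python. It assumes a stream can be modeled as an unbounded `queue.Queue`, so that `put` never blocks and `get` blocks until a token arrives. The names `producer`, `adder`, `run_once`, and the `DONE` end-of-stream marker are hypothetical and are not part of SCORE. The producers sleep for random intervals to perturb timing on purpose; the operator can only observe tokens in stream order, so the output should not depend on how the threads happen to be scheduled.

```python
import queue
import random
import threading
import time

DONE = object()  # hypothetical end-of-stream marker (an assumption of this sketch)


def producer(out, values, jitter):
    """Write each value to the stream, sleeping a random time between writes."""
    for v in values:
        time.sleep(random.uniform(0, jitter))  # perturb timing on purpose
        out.put(v)  # non-blocking write: an unbounded Queue never blocks on put
    out.put(DONE)


def adder(a, b, out):
    """Operator that reads one token from each input stream and emits the sum."""
    while True:
        x = a.get()  # blocking read
        y = b.get()  # blocking read
        if x is DONE or y is DONE:
            out.put(DONE)
            return
        out.put(x + y)


def run_once(jitter):
    """Build the three-thread graph, run it to completion, and collect the output."""
    a, b, out = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(a, [1, 2, 3], jitter)),
        threading.Thread(target=producer, args=(b, [10, 20, 30], jitter)),
        threading.Thread(target=adder, args=(a, b, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results = []
    while True:
        item = out.get()
        if item is DONE:
            break
        results.append(item)
    return results


if __name__ == "__main__":
    for _ in range(3):
        print(run_once(jitter=0.01))
```

Each call to `run_once` yields `[11, 22, 33]` regardless of the jitter value. If the operator instead polled its inputs and acted on whichever token arrived first, the result could vary with timing, which is the behavior that blocking reads rule out.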
River project material.
-
Cluster I/O with River
-
Performance heterogeneity between nodes and devices in a clustered system
environment limits an application's ability to perform at close to ideal speed.
Rather than attempting to eliminate heterogeneity, use River with:
-
Distributed Queue (DQ) balances work across the consumers of the system
-
Graduated Declustering (GD) dynamically adjusts the load generated by the
producers.
-
The River project does not attempt to perform meticulous allocation
of resources, nor to avoid the perturbations that occur in multi-node systems.
The goal is to provide a system that can "handle it."
-
Blocking receives are used: Active Messages are a good fit since they
provide blocking receives, which avoids polling.
-
"the programmer needs only to specify which nodes to place various modules
upon" to instantiate the graph on more than one node (page 5, lc).
-
Through graduated declustering and mirroring, the applications gain robustness
to read perturbations.
-
Two flavors of queues (like streams) are used: (1) push-based, for small
chunks of data, where the producer sends new messages to consumers that have
fewer than a threshold number of outstanding send requests; (2) pull-based,
where consumers query the producers for data to achieve load balancing.
-
Lessons learned from their River Sort example: the application writer does
not need to write code to manage I/O; only the sort and partition modules
need to be written. Scaling to a parallel sort is a matter of constructing
the flow graph.
-
Other parallel programming environments (Cilk, LazyThreads, Multipol) balance
load across consumers in order to support highly irregular, fine-grained
parallel applications. The main difference between River and these systems
is the granularity of communication: River pushes data between
nodes in large chunks.
NOTE: SCORE application communication is fine-grained, but the application
flow (communication pattern) is very regular.
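
The push-based queue flavor described in the River notes above can be sketched as a small discrete-time simulation in Python. This is an illustrative model only, not River's actual implementation: the function name `push_with_threshold`, the tick-based service model, and all parameters are assumptions made for this sketch. It shows how limiting each consumer to a threshold number of outstanding messages steers more work toward faster consumers without the producer knowing their speeds in advance.

```python
def push_with_threshold(num_messages, service_periods, threshold):
    """Simulate one producer pushing messages to heterogeneous consumers.

    service_periods[i] is the number of ticks consumer i needs per message
    (a simplification: consumer i retires one message on every tick that is
    a multiple of its period). The producer only sends to a consumer that
    has fewer than `threshold` outstanding (sent but unprocessed) messages.
    Returns the number of messages each consumer processed.
    """
    if threshold < 1 or any(p < 1 for p in service_periods):
        raise ValueError("threshold and service periods must be >= 1")

    n = len(service_periods)
    outstanding = [0] * n  # messages sent to consumer i but not yet processed
    processed = [0] * n    # messages consumer i has finished
    remaining = num_messages
    tick = 0

    while remaining > 0 or any(outstanding):
        # Consumers retire work at their own pace.
        for i in range(n):
            if outstanding[i] and tick % service_periods[i] == 0:
                outstanding[i] -= 1
                processed[i] += 1
        # Producer tops up every consumer that is below the threshold.
        for i in range(n):
            while remaining > 0 and outstanding[i] < threshold:
                outstanding[i] += 1
                remaining -= 1
        tick += 1

    return processed


if __name__ == "__main__":
    # Consumer 0 is three times faster than consumer 1.
    counts = push_with_threshold(num_messages=30, service_periods=[1, 3], threshold=2)
    print(counts)
```

In this model the fast consumer ends up processing more messages than the slow one, and every message is processed exactly once. A pull-based variant would invert control, with each consumer requesting the next chunk when it becomes idle.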
Andrea C. Arpaci-Dusseau's implicit coscheduling paper and other
papers
http://now.CS.Berkeley.EDU/Implicit/
http://www.cs.wisc.edu/~dusseau/
Patrick Sobalvarro's thesis and publications:
http://www.psg.lcs.mit.edu/~pgs/
Look into terms
-
coscheduling
-
gang scheduling
Edward Lee's work on multi-processors.
http://ptolemy.eecs.berkeley.edu/~eal/
-
Understand the domain (SDF, BDF, further???)
-
Ptolemy: what does it do that we do/don't?
Who else schedules dataflow on multiprocessors?
-
Fine grain -- (e.g. Edward Lee's work for embedded multiprocessors)
-
Coarse grain dataflow
Dataflow languages
-
Id @MIT Arvind
-
parallel Haskell Arvind @MIT, Hudak @Yale
-
EARTH (still alive according to Yelick) Guang Gao @ U of Delaware
-
SISAL - Lawrence Livermore
Think of applications
$Id: data_flow_notes.html,v 1.3 2001/10/24 04:25:26 yurym Exp $