Notes and To-Do List on Dataflow for Parallel Systems

(Collection of notes and materials to help substantiate SCORE on Millennium)

Eylon's MICRO submission: why are streams good?

   http://www.cs.berkeley.edu/~eylon/brass/extended-abstract.pdf
  1. Streams:
    1. expose inter-thread communication dependencies (dataflow), which allows efficient scheduling;
    2. allow data to be batched (amortizing setup costs);
    3. can be exposed as a feature of the architecture, like the memory interface, which guarantees safety and protection of each thread's execution and memory space.
  2. Streams are the ONLY mechanism that threads (pages/operators) use to communicate with each other.
  3. Streams provide blocking reads and non-blocking writes. Blocking reads guarantee determinism: program results are independent of scheduling and communication timing.
  4. Other models make scheduling difficult through obscure communication patterns (memory aliasing) and explicit synchronization (semaphores). This limits applications' scalability on larger systems, as well as their forward compatibility.
  5. Processing tokens in batches amortizes the runtime context-switch time and the setup cost of streams. This is similar to blocking data to improve cache locality, except that the block size is determined at runtime (e.g., from available buffer sizes). This late binding further contributes to forward compatibility and to scaling on larger hardware (in contrast with non-streaming models, which fix the block size at compile time).
  6. A common streaming protocol allows a page to be completely oblivious to whether it is connected to another page, a segment, or a CPU.
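A minimal Python sketch of the stream semantics above, assuming one unbounded FIFO per stream (so writes never block) and a hypothetical END token that closes a stream. The `Stream` class, `read_batch`, and the small operator graph are illustrative only, not SCORE's actual interface. Because `add_pairs` can only block on `read()` and never test whether a stream is empty, its output does not depend on the random delays injected in `source`. `read_batch` shows a batch size chosen at runtime by how many tokens happen to be buffered.

```python
import queue
import random
import threading
import time

END = object()  # hypothetical end-of-stream token (an assumption of this sketch)


class Stream:
    """Hypothetical stream: unbounded FIFO, blocking reads, non-blocking writes."""

    def __init__(self):
        self._q = queue.Queue()  # maxsize=0 means unbounded, so put() never blocks

    def write(self, token):
        self._q.put(token)  # non-blocking: the buffer is unbounded

    def read(self):
        return self._q.get()  # blocks until a token is available

    def read_batch(self, max_tokens):
        """Block for one token, then take up to max_tokens - 1 more that are
        already buffered. The batch size is decided at runtime, not compile time."""
        batch = [self._q.get()]
        while len(batch) < max_tokens:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch


def source(out, values):
    for v in values:
        time.sleep(random.random() * 0.001)  # random timing perturbation
        out.write(v)
    out.write(END)


def add_pairs(a, b, out):
    # Blocking reads fix which tokens are paired, whatever the arrival timing.
    while True:
        x, y = a.read(), b.read()
        if x is END or y is END:
            out.write(END)
            return
        out.write(x + y)


def double_batched(inp, out, max_batch=8):
    # Batch boundaries vary run to run, but each token is processed
    # independently, so the output values do not.
    while True:
        for tok in inp.read_batch(max_batch):
            if tok is END:
                out.write(END)
                return
            out.write(2 * tok)


def run_graph():
    s1, s2, s3, s4 = Stream(), Stream(), Stream(), Stream()
    threads = [
        threading.Thread(target=source, args=(s1, range(10))),
        threading.Thread(target=source, args=(s2, range(100, 110))),
        threading.Thread(target=add_pairs, args=(s1, s2, s3)),
        threading.Thread(target=double_batched, args=(s3, s4)),
    ]
    for t in threads:
        t.start()
    results = []
    while (tok := s4.read()) is not END:
        results.append(tok)
    for t in threads:
        t.join()
    return results
```

`run_graph()` returns `[200, 204, ..., 236]` on every run, even though the batch sizes seen by `double_batched` change from run to run. Note that `read_batch` does inspect buffer occupancy (via `get_nowait`); in this sketch that only affects batch boundaries, never token values or order.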

River project notes.

  1. Cluster I/O with River
  2. Performance heterogeneity between nodes and devices in a cluster limits an application's ability to run at close to ideal speed. Rather than attempting to eliminate heterogeneity, River tolerates it with:
    1. a Distributed Queue (DQ), which balances work across the consumers in the system;
    2. Graduated Declustering (GD), which dynamically adjusts the load generated by the producers.
  3. The River project does not attempt meticulous resource allocation, nor does it try to avoid the perturbations that occur in multi-node systems. Instead, it aims to provide a system that can "handle it."
  4. Blocking receives are used: Active Messages are a good fit because they support blocking receives, which avoids polling.
  5. To instantiate the graph on more than one node, "the programmer needs only to specify which nodes to place various modules upon" (page 5, left column).
  6. Through graduated declustering and mirroring, applications gain robustness to read perturbations.
  7. Two flavors of queues (similar to streams) are used: (1) push-based, for small chunks of data, where the producer sends new messages only to consumers whose number of outstanding send requests is below a threshold; (2) pull-based, where consumers request data from the producers to balance the load.
  8. Lessons from their River Sort example: the application writer does not need to write code to manage I/O; only the sort and partition modules need to be written. Scaling to a parallel sort is a matter of constructing the flow graph.
  9. Other parallel programming environments (Cilk, LazyThreads, Multipol) balance load across consumers to support highly irregular, fine-grained parallel applications. The main difference between River and these systems is the granularity of communication: River pushes data between nodes in large chunks.

  10. NOTE: SCORE applications communicate at a fine grain, but their application flow (communication pattern) is very regular.
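A small discrete-event sketch contrasting the two queue flavors in item 7, assuming zero communication cost and a fixed per-chunk service time for each consumer. The functions `push_dq` and `pull_queue` and their parameters are hypothetical illustrations, not River's code. In the push version the producer sends each chunk to a randomly chosen consumer with fewer than `threshold` unfinished chunks (blocking until one qualifies); in the pull version an idle consumer asks for the next chunk. Either way, a fast consumer ends up handling more chunks than a slow one.

```python
import heapq
import random


def push_dq(num_chunks, service_times, threshold, seed=0):
    """Push-based queue: send each chunk to a random consumer whose number of
    outstanding chunks is below `threshold` (threshold must be >= 1).
    Returns (chunks handled per consumer, time the last chunk finishes)."""
    rng = random.Random(seed)
    n = len(service_times)
    outstanding = [0] * n
    busy_until = [0.0] * n      # when each consumer drains its local backlog
    handled = [0] * n
    completions = []            # min-heap of (finish_time, consumer)
    now = 0.0                   # producer's clock
    for _ in range(num_chunks):
        while True:
            # retire every chunk that has finished by the producer's current time
            while completions and completions[0][0] <= now:
                _, done = heapq.heappop(completions)
                outstanding[done] -= 1
            eligible = [c for c in range(n) if outstanding[c] < threshold]
            if eligible:
                break
            now = completions[0][0]  # producer blocks until the next completion
        c = rng.choice(eligible)
        start = max(now, busy_until[c])
        busy_until[c] = start + service_times[c]
        heapq.heappush(completions, (busy_until[c], c))
        outstanding[c] += 1
        handled[c] += 1
    return handled, max(busy_until)


def pull_queue(num_chunks, service_times):
    """Pull-based queue: whichever consumer becomes idle first pulls the next chunk."""
    free_at = [(0.0, c) for c in range(len(service_times))]
    heapq.heapify(free_at)
    handled = [0] * len(service_times)
    finish = 0.0
    for _ in range(num_chunks):
        t, c = heapq.heappop(free_at)   # earliest-idle consumer asks for work
        t += service_times[c]
        handled[c] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, c))
    return handled, finish
```

With one consumer three times faster than the other, `pull_queue(12, [1.0, 3.0])` gives `([9, 3], 9.0)`, and `push_dq` with `threshold=1` behaves the same way. Raising the threshold lets the producer run further ahead, at the risk of parking chunks on a slow consumer; that trade-off is why the push flavor bounds outstanding requests.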

Andrea C. Arpaci-Dusseau's implicit coscheduling paper and other papers

   http://now.CS.Berkeley.EDU/Implicit/
   http://www.cs.wisc.edu/~dusseau/
 

Patrick Sobalvarro's thesis and publications:

   http://www.psg.lcs.mit.edu/~pgs/

Look into terms

Edward Lee's work on multiprocessors.

http://ptolemy.eecs.berkeley.edu/~eal/
  1. Understand the domains (SDF, BDF, others?)
  2. Ptolemy: what does it do that we do/don't?

Who else schedules dataflow on multiprocessors?

  1. Fine-grained dataflow (e.g., Edward Lee's work on embedded multiprocessors)
  2. Coarse-grained dataflow

Dataflow languages

  1. Id (Arvind, MIT)
  2. Parallel Haskell (Arvind, MIT; Hudak, Yale)
  3. EARTH (Guang Gao, University of Delaware; still alive, according to Yelick)
  4. SISAL (Los Alamos)

Think of applications

 

 


$Id: data_flow_notes.html,v 1.3 2001/10/24 04:25:26 yurym Exp $