Now that seg_test (initial) works, what is the plan?
  1. Make apps run (wavelet, wavelet_decode, jpeg_decode, jpeg_encode) DONE 2
  2. Look into single-processor application from Joe DONE
  3. Figure out how to increase the size of the dataset (i.e. feed not one but many images)
  4. Benchmark applications.
  5. Test segment_r segment DONE
  6. Write co-scheduling code for scheduler/placement
  7. Profiling execution statistics for the operators DONE
  8. Run tests to understand the bandwidth and latency we are working with
    1. Make a small two node application to study issues
    2. Run both nodes on the same rank? Time to run: bandwidth + latency + overhead
    3. Run both nodes on different ranks? Time to run: bandwidth + latency + overhead
    4. MPI_?Send: which variant to use?
  9. Optimizations:
    1. optimize memory allocation: preferably static allocation. DONE
    2. parametrize our IO subsystem to emit tokens with some blocking factor.
      1. mechanism to avoid batching for streams in a cycle
      2. periodic sweeps of queues: (1) guarantee no deadlock, (2) increase granularity of communication
    3.    ??
    ALL DONE
  10. Talk to Millennium people about the topology of the network. DONE
  11. How to measure?
    1. how much time is spent on computing?
    2. how much time is spent on communications?
    3. where does the latency come from?
  12. Artificial benchmarks.
    1. heavy on computation per unit of communication
    2. what is the granularity at which we move from being communication-dominated to compute-dominated?
  13. Scientific applications
    1. ??
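
A rough cost model for items 8, 9.2 and 12, as a starting point for the measurements. This is a minimal sketch, not measured behavior: it assumes each message costs a fixed overhead plus latency, that payload time is bytes divided by bandwidth, and that messages are not pipelined. The function names and all numbers below are hypothetical placeholders to be replaced by values from the two-node test.

```python
import math


def transfer_time(n_tokens, token_bytes, batch, latency_s, bandwidth_Bps, overhead_s):
    """Modeled time to move n_tokens over one stream, `batch` tokens per message.

    Assumption: every message pays overhead + latency once, and the payload
    moves at the full bandwidth. Real networks will deviate from this.
    """
    messages = math.ceil(n_tokens / batch)
    return messages * (overhead_s + latency_s) + n_tokens * token_bytes / bandwidth_Bps


def crossover_compute_per_token(token_bytes, batch, latency_s, bandwidth_Bps, overhead_s):
    """Compute time per token at which compute equals modeled communication.

    Operators that spend less than this per token are communication-dominated
    under the model; operators that spend more are compute-dominated.
    """
    return (overhead_s + latency_s) / batch + token_bytes / bandwidth_Bps


if __name__ == "__main__":
    # Placeholder parameters (assumed, not measured on Millennium).
    params = dict(latency_s=50e-6, bandwidth_Bps=100e6, overhead_s=10e-6)
    for batch in (1, 10, 100):
        t = transfer_time(1000, 8, batch, **params)
        x = crossover_compute_per_token(8, batch, **params)
        print(f"batch={batch:4d}  transfer={t:.6f}s  crossover={x:.3e}s/token")
```

Under this model a larger blocking factor amortizes the per-message cost, which lowers the crossover point. It says nothing about the added latency for streams in a cycle, which is why those need the no-batching mechanism from item 9.2.1.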

Outline:

  1. Introduction
    1. What is SCORE? Why is it good?
      1. solving challenges of parallel programming
        1. abstracting away synchronization
        2. exposing streams as the only communication primitive
        3. automatic co-scheduling and topology specific placement
        4. learn from past runs
    2. Motivation for SCORE as a parallel programming environment on a cluster of SMPs
    3. What do we hope to accomplish?
      1. demonstrate the feasibility of SCORE (a general-purpose dataflow environment) on a large cluster of workstations.
      2. understand the issues involved in parallel (distributed) systems design:
        1. communication overhead
        2. overlapping communication and computation
        3. limitations of underlying platform (OS) as the foundation for our system
        4. ???
  2. Previous Work
    1. Dataflow is extensively used in DSP environments and embedded system design as a model of computation and communication
    2. River project
    3. Turnado
    4. Volcano
    5. The last four projects primarily dealt with environments where each workstation has its own storage, so the cluster possesses the aggregate bandwidth of all disks. They dealt with applications that were IO-dominated from the beginning and benefited tremendously from the larger bandwidth.
  3. General intro to SCORE concepts.
    1. dataflow based on Lee's process networks.
    2. operators fire on token presence + streams are FIFOs --> determinism
    3. semantic execution rules: blocking reads, non-blocking writes
    4. describe the language: TDF
  4. SCORE on Millennium
    1. describe the infrastructure
    2. describe the tdfc processes in general (no need for great detail)
    3. describe target
    4. communication mechanisms, parameterization (batching parameters?)
  5. Study of scheduling strategies
    1. naive
    2. topological (neighbors co-resident)
    3. feedback based -- optimized for communication frequency
  6. Application results
    1. Describe applications
    2. present results
  7. Conclusion
??? Where to insert granularity of nodes study?
    What else to talk about?
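
A possible illustration for section 3 of the outline: a toy model of the execution rules (operators fire on token presence, streams are FIFOs, reads block, writes do not). This is a minimal sketch and not SCORE or TDF code; the `Stream` and `Operator` classes and the two-operator graph are hypothetical. It shows that the output is the same regardless of the order in which ready operators are fired.

```python
import random
from collections import deque


class Stream:
    """Unbounded FIFO. Writes never block; a read needs a token to be present."""

    def __init__(self):
        self.q = deque()

    def write(self, token):
        self.q.append(token)

    def ready(self):
        return bool(self.q)

    def read(self):
        return self.q.popleft()


class Operator:
    """Fires only when every input stream holds at least one token."""

    def __init__(self, fn, inputs, output):
        self.fn, self.inputs, self.output = fn, inputs, output

    def can_fire(self):
        return all(s.ready() for s in self.inputs)

    def fire(self):
        args = [s.read() for s in self.inputs]
        self.output.write(self.fn(*args))


def run(seed):
    """Run a small graph, picking the next ready operator at random."""
    rng = random.Random(seed)
    a, b, mid, out = Stream(), Stream(), Stream(), Stream()
    for i in range(5):
        a.write(i)
        b.write(10 * i)
    ops = [
        Operator(lambda x, y: x + y, [a, b], mid),  # adder
        Operator(lambda v: 2 * v, [mid], out),      # doubler
    ]
    while True:
        ready = [op for op in ops if op.can_fire()]
        if not ready:
            break
        rng.choice(ready).fire()
    return list(out.q)


if __name__ == "__main__":
    # Different firing orders, same result.
    print(run(0) == run(1) == run(2))
```

Modeling the blocking read as a `can_fire` check keeps the sketch single-threaded. A real runtime would block the operator's thread or process instead.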