Discussion with Yatish, Norm and Yury

1. We want to port SCORE to Millennium clusters.

2. What work needs to be done?
   a. Modify tdfc back-end to emit "simplified" C
      1. rebuild streams
         a. learn about shared memory
         b. message passing and ????
      2. change the pagestep to be called continuously until done
   b. SCORE scheduler itself:
      1. remove the simulator
      2. rewrite the interface between the scheduler and the simulator
         a. what facilities to "pin" a process to a cluster node are available?
         b. are there ways to checkpoint and move the process? GRANULARITY?
         c. can we control our priority?
         d. can we check the load on any system? Do we engineer a policy
            saying that if a node is more than X busy, then we treat it as
            if it does not exist?

3. The role of SCORE scheduler
   a. Maps the graph nodes onto computers in the Mill cluster.
      1. Are we doing 1-to-1 mapping? Streams to be MP only.
      2. clustering: many pages on 1 CPU? Two levels of communication:
         a. local comm: shared mem
         b. global: MP or ???
         How many pages should we schedule on 1 node (this depends on #CPU)?
      3. Do we move nodes around (relocate), or are they assigned to one CPU
         for their lifetime?
   b. Instead of using CMBs to buffer intermediate results, use stream
      buffers (blocking reads)
   c. Temporal partitioning:
      1. Easier: Cut the graph into number of partitions == number of
         available Mill nodes. Let OS sched take care of scheduling within
         the node. Use non-blocking writes to permit overlapping of compute
         and communication.
      2. Partition the graph into "schedulable" subgraphs (each subgraph has
         equal or fewer pages than available resources). Schedule each
         subgraph spatially and time-mux them.
      3. Understand the limitations in parallelism between (1) and (2)

4. Applications:
   a. we have and address the data set size
      1. jpeg codecs
      2. wavelet codecs
      3. mpeg encoder
      4. OTHER apps????
         a. 3d rendering
   b. do we have uni-processor implementations?
   c. multi-processor?
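
Sketch for discussion (not an agreed design): one way the "more than X busy" policy and the "easier" partitioning option could fit together. Everything named here is hypothetical: the functions `usable_nodes` and `partition_pages`, the node names, the load figures, and the 0.75 threshold are made up for illustration. It also assumes the page list is already in some sensible order (e.g. topological), so that contiguous chunks keep neighbouring pages on the same node; a real partitioner would presumably look at the streams cut between partitions.

```python
from typing import Dict, List


def usable_nodes(load_by_node: Dict[str, float], max_load: float) -> List[str]:
    """Return nodes whose load is at or below max_load, sorted by name.

    Nodes busier than max_load are treated as if they do not exist.
    """
    return sorted(n for n, load in load_by_node.items() if load <= max_load)


def partition_pages(pages: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Split pages into len(nodes) contiguous chunks, one chunk per node.

    Chunk sizes differ by at most one page. Scheduling of the pages inside
    a chunk is left to the OS scheduler on that node.
    """
    if not nodes:
        raise ValueError("no usable nodes")
    base, extra = divmod(len(pages), len(nodes))
    mapping: Dict[str, List[str]] = {}
    start = 0
    for i, node in enumerate(nodes):
        size = base + (1 if i < extra else 0)  # first `extra` nodes get one more
        mapping[node] = pages[start:start + size]
        start += size
    return mapping


if __name__ == "__main__":
    # Made-up load figures; how to actually query load is an open question above.
    loads = {"mill1": 0.10, "mill2": 0.95, "mill3": 0.40}
    nodes = usable_nodes(loads, max_load=0.75)
    print(partition_pages(["p0", "p1", "p2", "p3", "p4"], nodes))
```

With these made-up numbers, mill2 is dropped as too busy and the five pages are split three and two across mill1 and mill3. Option (2) under temporal partitioning would need more than this, since subgraphs larger than the available nodes have to be time-muxed.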