1. How do we measure the amount of time we stall?
   - number of times a thread had to wait on a semaphore
   - time spent waiting???
2. What to collect
   a. number of tokens that passed through each queue
   b.
3. How to schedule
   a. nodes that communicate most frequently should be grouped together
   b. large rate mismatches???

---------------

1. Performance profile:
   a. some sort of compute-to-communication time ratio per operator
   b. come up with toy examples to understand the effect of rate mismatches, communication time, blocking, etc.
   c. how efficient is our IO subsystem? That includes not just the thread, but also the stream implementation and the synchronization mechanisms.
   d. how expensive are mutexes and condition variables?
   e. how big do queues get?
   f. how many iterations of meat_and_potatos are useless?
   g. how many iterations do how much work?
   h. how much total CPU time was devoted to computation per computer?
   i. the total amount of time that ALL operators spend computing ideally always stays the same. So when we split the work across many processors, the time per processor should decrease at some rate, but it will actually decrease at a slower rate. That permits us to compute the fraction that processor time is of total makespan time. Maybe we want to know the makespan of each rank.
2. Collect stats for scheduling
   a. get the number of tokens that flow through each stream?
   b. record how utilized each machine was???
   c. correlate that with the schedule.
3. Write scheduling (clustering) algorithm
   a. clustering through shadow graphs
   b.
4. Produce a benchmarking infrastructure to get data for curves for at least two apps.
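
Rough sketch of how the stall and token counters from the notes above might be collected. This is only an illustration, written in Python for brevity and not taken from the actual runtime. `InstrumentedQueue`, `compute_fraction` and `demo` are hypothetical names. The queue counts how many calls had to block and for how long (the "how do we measure stall" question), how many tokens passed through, and how deep the queue got. `compute_fraction` is one possible reading of "fraction that processor time is of total makespan time", computed per rank.

```python
import threading
import time
from collections import deque


class InstrumentedQueue:
    """Bounded FIFO that records traffic and blocking statistics.

    Hypothetical helper for illustration only; not the project's queue.
    """

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()
        self.tokens = 0            # tokens that passed through (counted on get)
        self.max_depth = 0         # largest queue length observed
        self.stalls = 0            # put/get calls that had to block
        self.stall_seconds = 0.0   # total wall-clock time spent blocked

    def _wait_until(self, ready):
        # Caller must hold self._cond. Blocks until ready() is true and
        # charges the elapsed waiting time to the stall counters.
        if ready():
            return
        start = time.perf_counter()
        while not ready():
            self._cond.wait()
        self.stalls += 1
        self.stall_seconds += time.perf_counter() - start

    def put(self, token):
        with self._cond:
            self._wait_until(lambda: len(self._items) < self._capacity)
            self._items.append(token)
            self.max_depth = max(self.max_depth, len(self._items))
            self._cond.notify_all()

    def get(self):
        with self._cond:
            self._wait_until(lambda: len(self._items) > 0)
            token = self._items.popleft()
            self.tokens += 1
            self._cond.notify_all()
            return token


def compute_fraction(compute_seconds, makespan_seconds):
    """Assumed definition: share of one rank's makespan spent computing."""
    return compute_seconds / makespan_seconds


def demo():
    """Push five tokens through a queue of capacity 2 with one consumer."""
    q = InstrumentedQueue(capacity=2)
    received = []

    def consumer():
        for _ in range(5):
            received.append(q.get())

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(5):
        q.put(i)
    worker.join()
    return q, received
```

After a run, `q.tokens` per queue would give the traffic numbers needed for grouping nodes that communicate most, and `q.stall_seconds` divided by the rank's makespan would give a blocked-time fraction to compare against `compute_fraction`. The number of stalls depends on thread timing, so it will vary from run to run.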