Dear co-authors,

I am sending you the full draft of the paper for MICRO-32. I hope you can read it on the airplane to Atlanta and have comments for me soon. The main thing still missing is the sustained performance numbers for the media kernels.

Here are some notes about the current draft:

* Abstract
  - I will write it last

* Architecture:
  - Should I use this block diagram or one that is closer to the chip floorplan?
  - I am not mentioning the scalar register file in the vector unit. It would take too much space to explain why we have it...

* Memory system:
  - I speculate that we will get a 20x20cm die to play with
  - I speculate that the IBM macro will work with 5ns page mode accesses
  - I do not mention that the macro is asynchronous and that we need a wrapper around it to make it look synchronous

* Performance:
  - The goal is to have at least one kernel per supported data type
  - If I get matrix multiplication for all data types, I will probably add a figure with peak and sustained performance as a function of data-type width
  - I will try to get performance numbers for these kernels on other architectures, especially the % of their peak rate being achieved. I can probably do this for MMX, not sure for the rest.