Rajesh Nishtala


575 Soda Hall, UC Berkeley
Berkeley, CA 94720
510-642-9483
rajeshn at eecs dot berkeley dot edu

Overview

I am a fifth-year graduate student in Computer Science at UC Berkeley. My areas of emphasis include computer systems and parallel computing. I work in the BeBOP and Unified Parallel C groups under Professor Katherine Yelick and Professor James Demmel.

Publications

  • Paper: Performance without Pain = Productivity, Data layouts and Collectives in UPC
    (Principles and Practice of Parallel Programming (PPoPP) 2008, Salt Lake City, USA, February 2008)
    Rajesh Nishtala, George Almasi, Calin Cascaval
    Paper (External Website): PDF (320k)
    Talk Slides: PDF (5.8MB)

  • Poster: Optimized Collectives for PGAS Languages with One-Sided Communication
    (Supercomputing, Tampa Bay, USA, November 2006)
    Dan O. Bonachea, Rajesh Nishtala, Paul Hargrove, Mike Welcome, Katherine Yelick
    PDF (386kB)

  • Talk: Efficient Point-to-point Synchronization in UPC
    (Partitioned Global Address Space Programming Models, Washington DC, USA, October 2006)
    Dan Bonachea, Rajesh Nishtala, Paul Hargrove, Katherine Yelick
    Abstract (PDF 37kB)
    Talk Slides: PPT (2.9MB), PDF (945kB)

  • Masters Report: Architectural Probes for Measuring Communication Overlap Potential
    (submitted May 19th, 2006 for Master of Science Degree)
    Rajesh Nishtala
    PDF (0.5MB)

  • Paper: Optimizing Bandwidth Limited Problems Using One-Sided Communication and Overlap
    (International Parallel and Distributed Processing Symposium (IPDPS) 2006, Rhodes, Greece, April 2006)
    Christian Bell, Dan Bonachea, Rajesh Nishtala, Katherine Yelick
    PDF (300kB)
    Talk Slides (1MB)

  • Poster: The Performance and Productivity Benefits of Global Address Space Languages
    (Supercomputing, Seattle, USA, November 2005)
    Dan O. Bonachea, Christian Bell, Rajesh Nishtala, Kaushik Datta, Parry Husbands, Paul Hargrove, Katherine Yelick
    PDF (2.9MB)

  • Poster: Automatic Tuning of Collective Communications in MPI
    (SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, USA, February 2004)
    Rajesh Nishtala, Kushal Chakrabarti, Neil Patel, Kaushal Sanghavi, James Demmel, Katherine Yelick, and Eric Brewer
    PowerPoint (6MB)

  • Journal Paper: When Cache Blocking Sparse Matrix Vector Multiply Works and Why
    (Applicable Algebra in Engineering, Communication and Computing, March 2007)
    Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, Katherine Yelick
    Journal Website

  • Tech Report: Performance Modeling and Analysis of Cache Blocking in Sparse Matrix Vector Multiply
    (UCB/CSD-04-1335, June 2004)
    Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, Katherine A. Yelick
    PDF (~8MB)

  • Talk: When Cache Blocking Sparse Matrix Multiply Works and Why
    (PARA'04 Workshop on State-of-the-art in Scientific Computing, Copenhagen, Denmark, June 2004, to appear)
    Rajesh Nishtala, Richard Vuduc, James Demmel, Katherine Yelick.
    1 Page Abstract: PDF (61K)
    7 Page Abstract: PDF (113K)
    Talk Slides: PPT (3MB)

  • Paper: Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply
    (Proceedings of the IEEE/ACM Conference on Supercomputing, 2002, Baltimore, MD, USA, November 2002.)
    Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, Benjamin Lee.
    PDF (630k)

  • Talk: Automatic Performance Tuning and Analysis of Sparse Triangular Solve
    (ICS 2002: Workshop on Performance Optimization via High-Level Languages and Libraries, New York, NY, USA, June 2002.)
    Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, Katherine A. Yelick.
    PDF (597k)


Teaching

  • CS162 Operating Systems and Systems Programming (Fall 2005, Winner of Outstanding GSI Award)
  • CS267 Parallel Computing (Spring 2006)

Resume

  • PDF
  • TXT

Class Projects

  • CS252: Graduate Computer Architecture
    Outlier Detection In Sensor Networks
    Final Report (PDF)

  • CS262A: Computer Systems
    Automatic Tuning of MPI Collective Communications (PDF)

  • CS262B: Computer Systems
    Firehose: An Algorithm for Distributed Page Registration on Clusters of SMPs (PDF)

  • CS267: Parallel Computing
    UPC Implementation of Parallel Sparse Triangular Solve and NAS FT (PDF)

  • CS281A: Statistical Learning
    When to Cache Block Sparse Matrix Vector Multiplication: A statistical learning approach. (PDF)