Hello world! I am a Ph.D. student in scientific computing under the guidance of professors Katherine Yelick and James Demmel. I am also a member of the BeBOP group in the ParLab as well as the Future Technologies Group at Lawrence Berkeley National Laboratory. My interests include high performance computing, auto-tuning, and parallel (multicore) programming.
I will be joining Reservoir Labs after graduation.
My current thesis research is based on tuning stencil codes. Essentially, these codes perform nearest neighbor computations on structured grids: each grid point is updated using the values of its nearby neighbors. They are commonly used in solving partial differential equations (PDEs), which arise in problems as diverse as heat diffusion and electromagnetics. My work has focused on achieving good performance from stencil codes across a diverse range of multicore processors. This work was preceded by Sam Williams's work in tuning sparse matrix-vector multiply (SpMV) and a structured grid application (LBMHD).
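As a concrete illustration of a nearest neighbor computation on a structured grid, here is a minimal Python sketch of one sweep of a 3D 7-point stencil, in which each interior point is updated from itself and its six face neighbors. It assumes a flattened row-major grid and Jacobi-style updates into a separate output array; the names `stencil_7pt`, `alpha`, and `beta` are illustrative assumptions, not taken from the thesis code. For explicit heat diffusion with step ratio r = dt/h^2, choosing alpha = 1 - 6r and beta = r gives the usual forward-Euler update.

```python
def idx(i, j, k, ny, nz):
    """Flatten (i, j, k) into a row-major index; k is the unit-stride dimension."""
    return (i * ny + j) * nz + k


def stencil_7pt(src, nx, ny, nz, alpha, beta):
    """One Jacobi-style sweep of a 3D 7-point stencil (illustrative sketch).

    Interior points become alpha * center + beta * (sum of the six face
    neighbors); boundary points are copied unchanged. Reads only from
    src and writes to a fresh list, so the update order does not matter.
    """
    dst = list(src)
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                neighbors = (
                    src[idx(i - 1, j, k, ny, nz)] + src[idx(i + 1, j, k, ny, nz)]
                    + src[idx(i, j - 1, k, ny, nz)] + src[idx(i, j + 1, k, ny, nz)]
                    + src[idx(i, j, k - 1, ny, nz)] + src[idx(i, j, k + 1, ny, nz)]
                )
                c = idx(i, j, k, ny, nz)
                dst[c] = alpha * src[c] + beta * neighbors
    return dst


if __name__ == "__main__":
    # Heat-diffusion example: a unit spike in the middle of a 5x5x5 grid.
    n, r = 5, 0.1
    grid = [0.0] * (n ** 3)
    grid[idx(2, 2, 2, n, n)] = 1.0
    out = stencil_7pt(grid, n, n, n, alpha=1 - 6 * r, beta=r)
    print(out[idx(2, 2, 2, n, n)], out[idx(1, 2, 2, n, n)])
```

After one sweep the spike has spread: the center keeps roughly 0.4 of the heat and each of the six face neighbors receives roughly 0.1, so the total is conserved away from the boundary.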
In order to get good performance across several varied architectures, I created an auto-tuning framework. First, I identified a set of domain-specific optimizations, including multi-level domain decomposition, software prefetching, padding, inner loop optimizations, and ISA-specific transformations. I then generated many different code variants by writing stencil code generators in Perl. This was essentially a proof of concept; it is being improved upon by Shoaib Kamil and Cy Chan, who are using functional programming to make the code generation more intelligent and robust.
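Domain decomposition is the optimization that most visibly changes the loop structure. The sketch below shows a single level of it, cache blocking, applied to a 3D 7-point stencil (each interior point updated from itself and its six face neighbors): the interior is swept in bx-by-by tiles in the i and j dimensions, while the unit-stride k loop is left whole. Block sizes like bx and by are the kind of parameter for which a generator would emit many variants. This is a hypothetical Python illustration chosen for readability; the function name and parameters are assumptions, not the framework's, and it shows the loop structure only, since the cache benefit appears in compiled code.

```python
def stencil_7pt_blocked(src, nx, ny, nz, alpha, beta, bx, by):
    """One Jacobi-style 7-point stencil sweep over a flattened row-major
    nx*ny*nz grid, with the interior visited in bx-by-by tiles in (i, j).

    Interior points become alpha * center + beta * (sum of six face
    neighbors); boundary points are copied unchanged. The k loop is
    not blocked so the innermost loop stays unit-stride.
    """
    plane, row = ny * nz, nz  # index offsets for i +/- 1 and j +/- 1
    dst = list(src)
    for ii in range(1, nx - 1, bx):
        for jj in range(1, ny - 1, by):
            # Sweep one tile; min() clips the last, possibly partial, tile.
            for i in range(ii, min(ii + bx, nx - 1)):
                for j in range(jj, min(jj + by, ny - 1)):
                    base = i * plane + j * row
                    for k in range(1, nz - 1):
                        c = base + k
                        s = (src[c - plane] + src[c + plane]
                             + src[c - row] + src[c + row]
                             + src[c - 1] + src[c + 1])
                        dst[c] = alpha * src[c] + beta * s
    return dst


if __name__ == "__main__":
    n = 6
    grid = [float(x % 7) for x in range(n ** 3)]
    untiled = stencil_7pt_blocked(grid, n, n, n, 0.4, 0.1, bx=n, by=n)
    tiled = stencil_7pt_blocked(grid, n, n, n, 0.4, 0.1, bx=2, by=3)
    print(tiled == untiled)  # prints True
```

Because every point is computed by the same expression from the unmodified input, any tile shape yields bit-identical results; only the traversal order, and hence the cache reuse in a compiled version, changes. That is what makes block sizes safe to hand to an auto-tuner.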
The last step was to identify the best set of compile-time and run-time parameters for a given architecture. This is a difficult problem, since the parameter space is so large that an exhaustive search is infeasible. Moreover, the search space is usually not smooth. Archana Ganapathi and I are looking into more intelligent ways of traversing this space, including machine learning.
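A simple baseline for a parameter space that is too large to search exhaustively and not smooth is random sampling under a fixed evaluation budget, since it makes no smoothness assumption. The sketch below is that baseline, not the machine-learning approach mentioned above. `toy_cost` is a hypothetical stand-in for generating, compiling, and timing one code variant, and the parameter names (`bx`, `by`, `unroll`) are illustrative assumptions.

```python
import random


def random_search(space, cost, budget, seed=0):
    """Sample up to `budget` configurations, each parameter drawn independently
    and uniformly from its candidate values; return (best_config, best_cost).

    `space` maps parameter names to lists of candidate values. Repeated
    draws are served from a cache, so `cost` runs at most `budget` times.
    Requires budget >= 1.
    """
    rng = random.Random(seed)
    names = sorted(space)
    seen = {}
    for _ in range(budget):
        values = tuple(rng.choice(space[n]) for n in names)
        if values not in seen:
            seen[values] = cost(dict(zip(names, values)))
    best = min(seen, key=seen.get)
    return dict(zip(names, best)), seen[best]


def toy_cost(cfg):
    # Hypothetical stand-in for "generate, compile, and time this variant".
    return abs(cfg["bx"] - 16) + abs(cfg["by"] - 4) + (0 if cfg["unroll"] == 2 else 1)


if __name__ == "__main__":
    space = {"bx": [4, 8, 16, 32, 64], "by": [1, 2, 4, 8], "unroll": [1, 2, 4]}
    print(random_search(space, toy_cost, budget=20))
```

Sampling each parameter independently avoids materializing the full Cartesian product, which matters once the space holds millions of variants; a smarter search would replace the uniform draws while keeping the same budgeted interface.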
Some of this auto-tuning work was used as a building block for the Green Flash project.
K. Datta, "Auto-tuning Stencil Codes for Cache-Based Multicore Platforms", Ph.D. Dissertation/Technical Report, University of California, Berkeley, December 2009.
Abstract (External Link)
[pdf] (3.1 MB)
Slides: [pptx] (4.1 MB) |
[pdf] (3.3 MB; warning: some of the animations are lost in this format)
A. Ganapathi, K. Datta, A. Fox, D. Patterson, "A Case for Machine Learning to Optimize Multicore Performance", First USENIX Workshop on Hot Topics in Parallelism (HotPar '09), Berkeley, CA, March 30-31, 2009.
[pdf] (324 KB)
Slides: [ppt] (17.9 MB) |
[pdf] (3.6 MB)
K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf, K. Yelick, "Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors", SIAM Review (SIREV), Volume 51, Issue 1, pp. 129-159, 2009.
[pdf] (3.3 MB)
doi: 10.1137/070693199 (External Link)
K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, K. Yelick, "Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures", Supercomputing 2008 (SC08), Austin, TX, November 18-20, 2008.
Abstract (External Link)
[pdf] (600 KB)
Slides: [ppt] (12.9 MB) |
[pdf] (14.8 MB)
S. Williams, K. Datta, J. Carter, L. Oliker, J. Shalf, K. Yelick, D. Bailey, "PERI: Auto-tuning Memory Intensive Kernels for Multicore", SciDAC PI Conference, Journal of Physics: Conference Series 125, 012001, 2008.
Abstract (External Link)
[pdf] (1.2 MB)
S. Kamil, K. Datta, S. Williams, L. Oliker, J. Shalf, K. Yelick, "Implicit and Explicit Optimizations for Stencil Computations", Memory Systems Performance and Correctness (MSPC '06), San Jose, CA, October 22, 2006.
[pdf] (604 KB)
Slides: [pdf] (3.2 MB)
K. Datta, "Stencil Computation Auto-tuning on Modern Multicore Architectures", Microsoft Numerical Libraries Group/Research, Redmond, WA, May 22, 2009.
[pptx] (12.1 MB) |
[pdf] (11.7 MB)
K. Datta (on behalf of the BeBOP group), "Tuning in ParLab", ParLab Vertical Integration Meeting, Berkeley, CA, March 4, 2009.
[ppt] (1.8 MB) |
[pdf] (1.5 MB)
A. Ganapathi, K. Datta, A. Fox, M. Jordan, D. Patterson, "Auto-Tuning Stencil Codes Using Machine Learning", ParLab Winter Retreat, Tahoe City, CA, January 7, 2009.
[ppt] (2.1 MB) |
[pdf] (5.3 MB)
S. Kamil, K. Datta, "Using Auto-tuning to Generate Optimal Code for Multicore", ParLab Grand Opening Discussion Slides, Berkeley, CA, December 1, 2008.
[ppt] (212 KB) |
[pdf] (68 KB)
S. Williams, K. Datta, "Autotuning Memory-Intensive Kernels for Multicore", Workshop on Programming Massively Parallel Processors (PMPP), Urbana, Illinois, July 10, 2008.
[pdf] (4.6 MB)
K. Datta, S. Williams, V. Volkov, M. Murphy, "Auto-Tuning of Stencil Codes", ParLab Summer Retreat, Santa Cruz, CA, June 5, 2008.
[ppt] (9.1 MB) |
[pdf] (1.7 MB)
K. Datta, S. Williams, K. Yelick, J. Demmel, "Bandwidth Avoiding Stencil Computations", SIAM Conference on Parallel Processing for Scientific Computing (PP08), Atlanta, GA, March 13, 2008.
[ppt] (2.5 MB) |
[pdf] (3.1 MB)
K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf, K. Yelick, "Tuning 3D Stencil Codes", Center for Scalable Application Development Software (CScADS) Workshop on Automatic Tuning for Petascale Systems, Snowbird, UT, July 11, 2007.
[pdf] (4.0 MB)
K. Datta, "Automatic Stencil Code Generation", Qualifying Exam, Berkeley, CA, March 2, 2007.
Note: Much of my original thesis proposal is now outdated due to the multicore revolution. Time makes fools of us all.
[ppt] (2.6 MB) |
[pdf] (4.5 MB)
Proposal: [pdf] (512 KB)
A. Ganapathi, K. Datta, A. Fox, D. Patterson, "Using Machine Learning to Auto-tune Multicore Architectures", ParLab Summer Retreat, Santa Cruz, CA, June 1-3, 2009.
[ppt] (6.9 MB) |
[pdf] (1.7 MB)
A. Ganapathi, K. Datta, A. Fox, D. Patterson, "Using Machine Learning to Auto-tune a Stencil Code on a Multicore Architecture", Third Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML '08), San Diego, CA, December 11, 2008.
[ppt] (5.6 MB) |
[pdf] (1.3 MB)
K. Datta, A. Ganapathi, A. Fox, D. Patterson, "Machine Learning for Auto-tuning", ParLab Grand Opening, Berkeley, CA, December 1, 2008.
[ppt] (1.9 MB) |
[pdf] (2.1 MB)
K. Datta, S. Williams, V. Volkov, M. Murphy, "Autotuning Structured Grid Kernels", ParLab Summer Retreat, Santa Cruz, CA, June 4-6, 2008.
[pdf] (3.6 MB)
K. Datta, S. Williams, S. Kamil, "Autotuning Structured Grid Kernels", ParLab Winter Retreat, Tahoe City, CA, January 9-11, 2008.
[pdf] (1.8 MB)
Much of my Master's research involved writing and optimizing parallel benchmarks for the Titanium group. Titanium is a great language based on Java, with added support for parallel execution and multidimensional array manipulation. If you're interested, please download the compiler and try it out!
My work involved understanding three of the benchmarks in the NAS Parallel Benchmarks suite: Multigrid (MG), Fourier Transform (FT), and Conjugate Gradient (CG). I then rewrote these benchmarks in Titanium. By exploiting both the algorithms and Titanium's language features, the resulting code was concise, scalable, and fast.
While developing the Titanium FT benchmark, I also wrote the Titanium complex number library.
K. Yelick, P. Hilfinger, S. Graham, D. Bonachea, J. Su, A. Kamil, K. Datta, P. Colella, T. Wen, "Parallel Languages and Compilers: Perspective From the Titanium Experience", International Journal of High Performance Computing Applications, vol. 21, pp. 266-290, August 2007.
[pdf] (1.1 MB)
K. Yelick, D. Bonachea, W. Chen, P. Colella, K. Datta, J. Duell, S. Graham, P. Hargrove, P. Hilfinger, P. Husbands, C. Iancu, A. Kamil, R. Nishtala, J. Su, M. Welcome, T. Wen, "Productivity and Performance Using Partitioned Global Address Space Languages", Parallel Symbolic Computation (PASCO) 2007, London, Ontario, Canada, July 27-28, 2007.
[pdf] (312 KB)
K. Datta, "The NAS Parallel Benchmarks in Titanium", Master's Report/Technical Report, University of California, Berkeley, December 2005.
Abstract (External Link)
[pdf] (1.0 MB)
P. Hilfinger, D. Bonachea, K. Datta, D. Gay, S. Graham, B. Liblit, G. Pike, J. Su, and K. Yelick, "Titanium Language Reference Manual, version 2.19", UCB/EECS-2005-15, November 17, 2005.
Abstract (External Link)
[pdf] (576 KB)
K. Datta, D. Bonachea, and K. Yelick, "Titanium Performance and Potential: an NPB Experimental Study", The 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC '05), Hawthorne, NY, October 20-22, 2005.
[pdf] (192 KB)
Full Paper: [pdf] (340 KB)
Slides: [ppt] (1.5 MB) |
[pdf] (1.2 MB)
C. Bell, D. Bonachea, K. Datta, R. Nishtala, P. Hargrove, P. Husbands, and K. Yelick, "The Performance and Productivity Benefits of Global Address Space Languages", Supercomputing 2005 (SC05), Seattle, WA, November 12-18, 2005.
[pdf] (2.8 MB)
Titanium NAS Parallel Benchmarks
Code, Documentation, and Performance Results: [link]
CS 252: Computer Architecture
Final Project
Math 221: Advanced Matrix Computations
Math 224A: Mathematical Methods for the Physical Sciences
Math 228A: Numerical Solutions of ODEs
ODEPACK Solvers
Math 228B: Numerical Solutions of PDEs
BE 143: Computational Methods in Biology
Final Paper (.doc)
I served for three years as Treasurer of the Computer Science Graduate Student Association (CSGSA).
I also play basketball a few times a week and ultimate occasionally. If you'd like to join me, please drop me a line.
593 Soda Hall
Computer Science Division
UC Berkeley
Berkeley, CA 94720-1776
"I am convinced that He (God) does not play dice." -- Albert Einstein
"Life is like a box o' chocolates. You never know what you gonna get." -- Forrest Gump