Vasily Volkov
E-mail: volkov@cs.berkeley.edu
Phone: (510) 642-3979
Office: 447 Soda Hall
I am a fifth-year doctoral student in Computer Science, interested in high performance computing and numerical linear algebra.
My advisor is Prof. James Demmel.
I received the 2008-2009 NVIDIA Fellowship.
- Volkov, V. 120 GFLOPS in matrix-matrix multiply using DirectX 9.0, Technical Memo, January 31, 2007.
- Volkov, V., and Demmel, J. W. Using GPUs to accelerate the bisection algorithm for finding eigenvalues of symmetric tridiagonal matrices, Technical Report No. UCB/EECS-2007-179, EECS Department, University of California, Berkeley, December 29, 2007. (Also LAPACK Working Note 197.)
- Volkov, V., and Demmel, J. W. Using GPUs to accelerate linear algebra routines, Poster at PAR Lab Winter Retreat, January 9, 2008.
- Volkov, V., and Demmel, J. W. LU, QR and Cholesky factorizations using vector capabilities of GPUs, Technical Report No. UCB/EECS-2008-49, EECS Department, University of California, Berkeley, May 13, 2008. (Also LAPACK Working Note 202.)
- Garland, M., LeGrand, S., Nickolls, J., Anderson, J., Hardwick, J., Morton, S., Phillips, E., Zhang, Y., and Volkov, V. Parallel computing experiences with CUDA, IEEE Micro 28, 4 (2008), 13-27.
- Datta, K., Murphy, M., Volkov, V., Williams, S., Carter, J., Oliker, L., Patterson, D., Shalf, J., and Yelick, K. Stencil computation optimization and autotuning on state-of-the-art multicore architectures, SC08.
- Volkov, V., and Demmel, J. W. Benchmarking GPUs to tune dense linear algebra, SC08. (Best Student Paper award.)
Slides for my SC08 talk complement the paper and focus on performance programming guidelines for GPUs.
Matrix factorization codes are now available.
My matrix-matrix multiply code is included in NVIDIA CUBLAS 2.0.
I have a similar code for the symmetric rank-k update (SSYRK).
My FFT prototype was 3x faster than NVIDIA CUFFT.
The ideas behind it are used in OpenCL FFT.
(See also slide 38 in David Luebke, "The Past, Present, and Future of GPU Computing".)
I collected performance results for 25 implementations of matrix multiply from CS267, a course in which I served as a TA.