Our Spotlight is on Dr. Cris Cecka, a research scientist and lecturer in the new Institute for Applied Computational Science (IACS) at Harvard University. Harvard has been a CUDA Center of Excellence since 2009, led by Dr. Hanspeter Pfister, IACS Director. Cris is currently also performing research with the Mathematics Department at the Massachusetts Institute of Technology. Previously, Cris was a graduate student in the Institute for Computational and Mathematical Engineering (ICME) at Stanford University with Prof. Eric Darve.
The following is an excerpt from our interview (read the complete Spotlight here).
NVIDIA: Cris, what are your primary research interests?
Cris: My research focuses on computational mathematics, particularly for interdisciplinary applications in science and engineering. In the past, I’ve used CUDA for non-linear PDEs (partial differential equations) and real-time computing with applications in simulation and virtual surgery.
More recently, I have become interested in mathematical and computational abstractions to produce efficient, library-quality scientific software. Specifically, I have focused on generalized n-body problems, including integral equation methods, particle methods, and structured dense matrices.
As part of my work, I’ve released several software libraries, including FMMTL to aid in the research, development, and use of kernel matrices and CrowdCL to aid in the use of GPU computing within a browser.
NVIDIA: Tell us more about FMMTL. Is it GPU-accelerated?
Cris: FMMTL is a research code that is exploring fast algorithms (like Treecode, FMM, H-matrix, and Butterfly) for kernel matrices and other structured dense matrices. Why structured? Well, plenty of algorithms exist for dense matrices, e.g. all of BLAS and LAPACK. These use values of the matrix to compute products, eigenvalues, factorizations, etc. But there are huge classes of problems where we never actually want to construct all of the elements of the matrix — generalized n-body problems — and can be accelerated either by compressing rows, columns, or blocks of the matrix or by avoiding computing elements of the matrix all-together.
By avoiding the computation of all of the elements or delaying the computation until the matrix element is requested, the amount of data required to define the matrix is reduced to O(N), which is great in terms of computational intensity! There is very little data to access and lots and lots of computation. Continue reading