CUDA Spotlight: GPU-Accelerated Shape Sensing

This week’s Spotlight is on Patrick Roye of Luna, Inc.

Patrick works on accelerating Luna’s processing algorithms using GPUs. He and a team of engineers and scientists are developing a prototype system that uses CUDA to calculate the shape of a fiber-optic sensor in real time.

Luna’s shape-sensing systems, which are currently in development, will be used to guide the next generation of medical robotic systems safely through a patient’s body.


NVIDIA: Patrick, tell us about your work at Luna.
Patrick: I’ve been a software engineer at Luna for just over five years. I work in Luna’s Lightwave Division, which develops and manufactures products for fiber-optic testing, strain and temperature sensing, and shape sensing. For the last year and a half, I’ve helped develop high-speed versions of our products that utilize NVIDIA GPUs to accelerate data processing.

NVIDIA: What are some applications of Luna’s technology?
Patrick: One of our key target markets is healthcare, including the area of Minimally Invasive Surgery (MIS). Luna’s shape-sensing systems, which are currently in development, calculate the shape of fiber-optic sensors in real time.

NVIDIA: Why did you choose to work with GPUs?
Patrick: The processing for our shape-sensing technology was initially developed on FPGAs, which allowed us to transfer and process data at extremely low latencies, on the order of milliseconds. But when higher levels of accuracy required us to increase both the number of points and the complexity of our algorithms, the FPGAs we were using were no longer a viable option.

Fortunately, at the same time the door closed on our FPGAs, NVIDIA opened a window with the announcement of GPUDirect RDMA. Since we had used CUDA a year earlier to accelerate our strain and temperature sensing calculations, we already had an idea of the advantages of GPU-accelerated processing. With GPUDirect RDMA and CUDA-accelerated processing, we determined that we could perform data acquisition and minimal processing on an FPGA, transfer our data directly to the GPU for processing, and then transfer the results back to the FPGA fast enough to meet our real-time requirements.

NVIDIA: What approaches did you find useful for developing on the CUDA platform?
Patrick: The algorithm requires over 100 kernels, operating on tens of thousands of data points. All kernels must complete before the next data set arrives from the FPGA, so every kernel had to be optimized to run as fast as possible. Here are a few tips I learned from this extreme optimization process.