CUDA Spotlight: GPU-Accelerated High Energy Physics

This week’s Spotlight is on Valerie Halyo, assistant professor of physics at Princeton University.

Researchers in the field of high energy physics, such as Valerie, are exploring the most fundamental questions about the nature of the universe, looking for the elementary particles that constitute matter and its interactions.

One of Valerie’s goals is to extend the physics accessible at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland to include a larger phase space of the topologies that include long-lived particles. This research goes hand in hand with new ideas related to enhancing the “trigger” performance at the LHC.

(In particle physics, a trigger is a system that rapidly decides which events in a particle detector to keep when only a small fraction of the total can be recorded.)

Large Hadron Collider
Large Hadron Collider (courtesy CERN)

Read Valerie’s full Spotlight here. Here is an excerpt:


NVIDIA: How can GPUs accelerate research in this field?
Valerie: The Compact Muon Solenoid (CMS), one of the general-purpose detectors at the LHC, features a two-level trigger system to reduce the 40 MHz beam crossing data rate to approximately 100 Hz.

(In particle physics, a trigger is a system that rapidly decides which events in a particle detector to keep when only a small fraction of the total can be recorded.)

The Level-1 trigger is based on custom hardware and designed to reduce the rate to about 100 kHz, corresponding to 100 GB/s, assuming an average event size of 1 MB.

The High Level Trigger (HLT) is purely software-based and must achieve the remaining rate reduction by executing sophisticated offline-quality algorithms. GPUs can easily be integrated in the HLT server farm and allow, for example, simultaneous processing of all the data recorded by the silicon tracker as particles traverse the tracker system.

The GPUs will process the data at an input rate of 100 kHz and output all the necessary information about the speed and direction of the particles for up to 5000 particles in less than 60 msec. This is more than expected even at design luminosity. This will allow for the first time not only the identification of particles emanating from the interaction point but also reconstruction of the trajectories for long-lived particles. It will enhance and extend the physics reach by improving the set of events selected and recorded by the trigger.

While both CPUs and GPUs are parallel processors, CPUs are more general purpose and designed to handle a wide range of workloads like running an operating system, web browsers, word processors, etc. GPUs are massively parallel processors that are more specialized, and hence efficient, for compute-intensive workloads.

NVIDIA: What approaches did you find the most useful for CUDA development? 
Valerie: Our work includes a combination of fundamental algorithm development and GPU implementation for high performance. Algorithm prototypes were often tested using sequential implementations since that allowed for quick testing of new ideas before spending the time required to optimize the GPU implementations.

In terms of GPU performance it’s a continual process of implementing, understanding the performance in terms of the capability of the hardware, and then refactoring to alleviate bottlenecks and maximize performance. Understanding the performance and then refactoring the code depends on knowledge of the hardware and programming model, so documentation about details of the architecture and tools like the Profiler are very useful.

As a specific example, the presentation at GTC 2013 by Lars Nyland and Stephen Jones on atomic memory operations was especially useful in understanding the performance of some of our code. It was a particularly good session and I’d recommend watching it online if you weren’t able to attend.

NVIDIA: Beyond triggering, how do you envision GPUs helping to process or analyze LHC data?

Valerie: Reconstruction of the particle trajectories is essential both during online data taking and offline while we attempt to reprocess data that was already archived and stored for further analysis, so faster reconstructions improve the turnaround of new results.

In addition, experiments have to deal with massive production of Monte Carlo data each year. Billions of events have to be generated to match the running conditions of the collision and detector while data was taken. Improvements in the speed of reconstruction allow higher production rates and better efficiency of the computing resources, while also saving money.

NVIDIA: What advice would you offer others in your field looking at CUDA?

Valerie: Even if you aren’t directly developing and optimizing code it’s clear that the future is massively parallel processors, and having a basic understanding of processor architecture and parallel computing is going to be important.

A great algorithm design isn’t very useful if it can’t be implemented in an efficient way on modern parallel processors.

Read the full interview. Read more CUDA Spotlights.


About Calisa Cole

Calisa Cole
Calisa joined NVIDIA in 2003 and currently focuses on marketing related to CUDA, NVIDIA’s parallel computing architecture. Previously she ran Cole Communications, a PR agency for high-tech startups. She majored in Russian Studies at Wellesley and earned an MA in Communication from Stanford. Calisa is married and the mother of three boys. Her favorite non-work activities are fiction writing and playing fast games of online scrabble.