Learn GPU Computing with Hands-On Labs at GTC 2015

Every year NVIDIA’s GPU Technology Conference (GTC) gets bigger and better. One of the aims of GTC is to give developers, scientists, and practitioners opportunities to learn with hands-on labs how to use accelerated computing in their work. This year we are nearly doubling the amount of hands-on training provided from last year, with almost 2,400 lab hours available to GTC attendees!

We have two types of training this year at GTC: instructor-led labs and self-paced labs. And to help you keep up with one of the hottest trends in computing, this year we’re featuring a Deep Learning training track. Keep reading for details, and for a discount code if you haven’t registered for GTC yet this year.

Deep Learning Track

There is an explosion of Deep Learning topics at GTC, and it’s not limited to the keynotes, talks and tutorial sessions. We’ll feature at least six hands-on labs related to accelerating facets of Deep Learning on GPUs. From an introduction to Deep Learning on GPUs to cutting-edge techniques and tools, there will be something for everyone. Be sure to get to these labs early to get yourself a seat! Here are a few of the labs available in this track:

  • Introduction to Machine Learning with GPUs: Handwritten digit classification (S5674)
  • DIY Deep Learning for Vision with Caffe (S5647)
  • Applied Deep Learning for Vision, Natural Language and Audio with Torch7 (S5574)
  • Deep Learning with the Theano Python Library (S5732)
  • Deep Belief Networks Using ArrayFire (S5722)
  • Accelerate a Machine Learning C++ example with Thrust (S5822)

Instructor-led Labs

Just like GTC last year, there will be twenty hands-on instructor-led labs. These are 80-minute labs led by an expert on the topic.


Register for GTC 2014 Now and Save 40%!

It’s that time of year again!  Here at NVIDIA we’re hard at work getting ready for the 2014 GPU Technology Conference, the world’s most important GPU developer conference. Taking place in the heart of Silicon Valley, GTC offers unmatched opportunities to learn how to harness the latest GPU technology including 500 sessions, hands-on labs and tutorials, technology demos, and face-to-face interaction with industry luminaries and NVIDIA technologists.

Come to the epicenter of computing technology March 24-27, and see how your peers are using GPUs to accelerate impactful results in various disciplines of scientific and computational research. Register for GTC now, because the Early Bird discount for GTC registration ends in one week on Wednesday, January 29th. The Early Bird discount is 25% on a full-conference registration, and to sweeten the deal I can offer Parallel Forall readers an extra 20% off using the code GM20PFB. That gets you four days of complete access to GTC for just $720, or $360 for academic and government employees. Don’t miss it, register now!

Here are a few talks to give you an idea of the breadth and quality of talks you will see at GTC: Continue reading

Join Me and Other NVIDIA Experts at the GPU Technology Conference

NVIDIA’s GPU Technology Conference (GTC) 2013, scheduled for March 18-21, is the premier event for accelerated computing. This will be the fourth GTC and it just keeps getting better. If you haven’t been to GTC before, you won’t be disappointed. Thousands of developers and research scientists from over 40 countries will converge on the San Jose Convention Center in California to talk shop across a wide array of computing and research disciplines.

I will be giving a talk on future directions for CUDA that I hope you will attend, as well as a tutorial introduction to CUDA C/C++.

GTC will have many sessions on a wide variety of topics so I’m sure you will find much of interest. Here’s a small sample of session topics from GTC 2013.

This year, GTC is expanding to include more sessions on the use of GPUs in mobile computing, cloud graphics, game development, media & entertainment, and manufacturing.

In addition to the sessions, we have a full day of pre-conference tutorials taught by NVIDIA research and engineering staff who are pioneering the next generation of GPU computing solutions. There is no additional cost for the tutorials when you buy a Full Conference Pass.

Beyond the formal curriculum, GTC provides a wealth of great networking events and opportunities. GTC attracts some of the top people in their fields, all working with GPUs and related technology, so you are certain to have lots of valuable conversations and meet interesting people—I know I always do.

Register before January 20 to take full advantage of the early bird discount, and readers of the Parallel Forall blog can use my personal discount code GMNVE229598OX8J to get another 10% off the published price. If you’re an academic or government employee, you’re entitled to our special academic/government pricing.

Finally, if you’d like to share your research at GTC, there is still time to submit a research poster proposal. The Call for Posters remains open until February 4, 2013. For detailed submission instructions and to read about the benefits of presenting, please visit the Call for Posters page.

I look forward to seeing you in San Jose!

My Favorites from GTC 2012

I had a great time at GTC 2012; there was incredible energy and excitement from everyone I talked to. I think the energy level had a lot to do with all of the exciting technology announcements from NVIDIA (the Kepler GK110 architecture, CUDA 5, Nsight Eclipse Edition, VGX, and GeForce Grid, to name a few!), but I think that having heaps of great content from outside contributors as well as NVIDIA was crucial to making GTC 2012 a great conference.

I was very busy at GTC and so I didn’t get to attend many of the talks that I wanted to see, and I’m sure the same is true for most attendees. Thankfully the GTC team created GTC On Demand to solve this problem.  With GTC On Demand you can watch almost any GTC talk, free, on the web. In the past month or so since GTC, I’ve been catching up on some of the great talks that I missed. In this post I want to share with you my favorites from GTC, and I hope you will share your favorite GTC talks in the comments.


In the Trenches at GTC: Faster Finite Elements for Wave Propagation

By Kenneth A. Lloyd (GTC 2012 Guest Blogger)


Geophysical wave propagation is really interesting because you can’t actually see the phenomenon, and you can only feel a small part of it (that is, unless you are unlucky and find yourself in the path of an earthquake). The only way we have of understanding the causes and effects of seismic activity is to model it, compare it with a lot of data, and visualize the mathematical model in a computer.

The problem, of course, is that the models, the data and the translation into some type of visualization are “computationally expensive”, meaning that they require substantial computing power to crunch the numbers. Since seismic activity is not a planned event, we simply can’t wait hours or weeks for the simulation to run to predict the far-reaching and potentially disastrous global effects.

In his GTC session on Thursday, May 17, Max Reitmann of the Institute for Computational Science in Lugano, Switzerland, showed us how seismic simulations are being done more quickly and accurately using CUDA and massively parallel processing. Reitmann detailed a Fortran, C/CUDA and MPI approach to a finite element implementation of wave propagation in geophysics, specifically the seismic propagation of earthquake perturbations. He applied a massively parallel CUDA graph coloring refinement to an existing open source seismic finite element analysis application, reducing the computational time from 75 hours to one hour.
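To make the graph coloring idea concrete, here is a minimal sketch in Python. It is not Reitmann's implementation; it is a generic greedy coloring, and the function names `color_elements` and `group_by_color` are hypothetical. The assumption is the usual one in finite element assembly: two elements that share a mesh node write to the same nodal value, so they cannot safely be updated at the same time. Giving such elements different colors lets all elements of one color be processed in parallel, one color at a time.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedily color mesh elements so that no two elements sharing a node
    get the same color. `elements` is a list of tuples of node ids.
    Returns a list with one color index per element."""
    # Map each node to the elements that touch it.
    node_to_elems = defaultdict(list)
    for elem_id, nodes in enumerate(elements):
        for node in nodes:
            node_to_elems[node].append(elem_id)

    colors = [-1] * len(elements)  # -1 means "not colored yet"
    for elem_id, nodes in enumerate(elements):
        # Colors already taken by neighbors (elements sharing any node).
        used = {
            colors[other]
            for node in nodes
            for other in node_to_elems[node]
            if colors[other] != -1
        }
        # Pick the smallest color not used by a neighbor.
        color = 0
        while color in used:
            color += 1
        colors[elem_id] = color
    return colors


def group_by_color(colors):
    """Return {color: [element ids]}; each group is free of shared nodes."""
    groups = defaultdict(list)
    for elem_id, color in enumerate(colors):
        groups[color].append(elem_id)
    return dict(groups)


# A hypothetical 1D mesh: four line elements chained through nodes 0..4.
mesh = [(0, 1), (1, 2), (2, 3), (3, 4)]
mesh_colors = color_elements(mesh)
mesh_groups = group_by_color(mesh_colors)
```

On a GPU, each color group would typically map to one kernel launch, with one thread (or thread block) per element in the group. The greedy strategy does not guarantee the minimum number of colors, but it is simple and usually good enough for a sketch like this.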

There are two available approaches, depending upon the phenomena to be modeled: a seismic approach for earthquakes (which are non-experimental) and a tomographic approach (small perturbation models). The model and simulation yield a coarse-grained understanding of either global or local seismic phenomena in an economically reasonable time.

You can watch the slidecast of Max’s presentation here.

About our Guest Blogger:

Kenneth Lloyd is the Director of System Sciences at Watt Systems and a co-organizer of the New Mexico GPU Meetup Group.

In the Trenches at GTC: Inside Kepler

By Tomasz Bednarz (GTC 2012 Guest Blogger)

I had been eagerly anticipating the “Inside Kepler” session since GTC 2012 opened. On Wednesday arvo, May 16th, two NVIDIA blokes, Stephen Jones (CUDA Model Lead) and Lars Nyland (Senior Architect), warmly introduced the new Kepler GK110 GPU, proudly announcing that it is all about “performance, efficiency and programmability.”

Lars and Stephen revealed that GK110 has 7.1B transistors (wow!), 15 SMX units, >1 TFLOP fp64, 1.5MB L2 cache, 384-bit GDDR5 and PCI-Express Gen 3. NVIDIA looked high and low to reduce power consumption and increase performance, and after many months of tough design and testing in their labs, the NVIDIA mates emerged with an awesome GPU that greatly exceeds Fermi’s compute horsepower, while consuming less power and generating less heat. Corker of a GPU!

In the Trenches at GTC: Swift: A GPU-based Smith-Waterman Sequence Alignment Program

By Jike Chong, Parasians (GTC 2012 Guest Blogger)

This week at GTC, Pankaj Gupta, a bioinformatics application developer at St. Jude Children’s Research Hospital, presented his work in the area of sequence alignment. Sequence alignment is an important component of bioinformatics that is crucial for the vision of personalized medicine. Sequence alignment matches new sequences to known sequences, while detecting point mutations, insertion mutations or deletion mutations for important initiatives such as finding better cures.

Current sequencing machines produce reads of gene samples through optical or electrical means. The read signals are converted into sequences of symbols called “bases” through machine-specific signal processing analysis, otherwise known as “primary analysis”. Machine-agnostic sequence alignment algorithms are then applied to interpret the read sequences in the context of known sequences.
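For readers new to the Smith-Waterman algorithm named in the title, here is a minimal CPU sketch in Python of local alignment between a read and a known sequence. It is not the Swift program's code. The function name `smith_waterman` and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration; real aligners typically use tuned scoring schemes and affine gap penalties.

```python
def smith_waterman(query, reference, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_query, aligned_reference) for the
    best-scoring local alignment, using a linear gap penalty."""
    rows, cols = len(query) + 1, len(reference) + 1
    # score[i][j] = best score of a local alignment ending at
    # query[i-1] and reference[j-1]; row 0 and column 0 stay at 0.
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == reference[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a fresh alignment here
                score[i - 1][j - 1] + sub,  # match or point mutation
                score[i - 1][j] + gap,      # base in query only (insertion)
                score[i][j - 1] + gap,      # base in reference only (deletion)
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    aligned_q, aligned_r = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if query[i - 1] == reference[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned_q.append(query[i - 1])
            aligned_r.append(reference[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_r.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_r.append(reference[j - 1])
            j -= 1

    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_r))


# Two made-up sequences that share the local region "ACGT".
result = smith_waterman("GGACGTCC", "TTACGTAA")
```

Each cell depends only on its upper, left, and upper-left neighbors, so all cells on the same anti-diagonal are independent of each other. That independence is one common way to expose parallelism in this algorithm on a GPU, alongside aligning many reads at once.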

In the Trenches at GTC: Scaling Applications to a Thousand GPUs and Beyond

By Adnan Boz (GTC 2012 Guest Blogger)

Question: Why would you need 50 petaflops of horsepower and a 500,000 scalar processor capable supercomputer?

Answer: You need to simulate dynamics of complex fluid systems!

On Day 3 of GTC, HPC architect and Ogden prize winner Dr. Alan Gray from the University of Edinburgh described his use of C, MPI and CUDA on an NVIDIA Tesla-powered Cray XK6 hybrid supercomputer to run massively parallel Lattice-Boltzmann methods.

“Simulating simple fluids like water requires massive amount of computer power, but simulating complex fluids like mixtures, surfactants, liquid crystals or particle suspensions, requires much more than that,” commented Dr. Gray.

In the Trenches at GTC: CUDA 5 and Beyond

By Michael Wang, The University Of Melbourne, Australia (GTC 2012 Guest Blogger)

Following up the opening keynote by NVIDIA CEO and co-founder Jen-Hsun Huang, Mark Harris took the very same stage (albeit with a more intimate crowd) for his afternoon session entitled CUDA 5 And Beyond.

Mark walked us through the major features of the upcoming CUDA 5.0 release, and took some time to share a vision for the future of GPU and massively parallel programming generally.

The four aspects of CUDA 5 that Mark highlighted were:

  1. Dynamic parallelism
  2. GPU object linking
  3. NVIDIA Nsight, Eclipse Edition
  4. GPUDirect for clusters (RDMA)

In the Trenches at GTC: Programming GPUs with OpenACC

By Adnan Boz (GTC 2012 Guest Blogger)

It’s my first day at the GPU Technology Conference and I’ve already had the opportunity to meet gurus like Mark Harris (Chief Technologist, GPU Computing, NVIDIA, and founder of and learn about the latest advancements in the GPU and HPC arena from people like NVIDIA’s Will Ramey and Duncan Poole.

One of the hot topics so far is OpenACC, an open GPU directives standard that makes GPU programming straightforward and portable across parallel and multi-core processors (see: Continue reading