tesla_platform_thumb

12 Things You Should Know about the Tesla Accelerated Computing Platform

You may already know NVIDIA Tesla as a line of GPU accelerator boards optimized for high-performance, general-purpose computing. They are used for parallel scientific, engineering, and technical computing, and they are designed for deployment in supercomputers, clusters, and workstations. But it’s not just the GPU boards that make Tesla a great computing solution. The combination of the world’s fastest GPU accelerators, the widely used CUDA parallel computing model, and a comprehensive ecosystem of software developers, software vendors, and data center system OEMs make Tesla the leading platform for accelerating data analytics and scientific computing.

The Tesla Accelerated Computing Platform provides advanced system management features and accelerated communication technology, and it is supported by popular infrastructure management software. These enable HPC professionals to easily deploy and manage Tesla accelerators in the data center. Tesla-accelerated applications are powered by CUDA, NVIDIA’s pervasive parallel computing platform and programming model, which provides application developers with a comprehensive suite of tools for productive, high-performance software development.

This post gives an overview of the broad range of technologies, tools, and components of the Tesla Accelerated Computing Platform that are available to application developers. Here’s what you need to know about the Tesla Platform. Continue reading

Six Ways to SAXPY

This post is a GPU program chrestomathy. What’s a Chrestomathy, you ask?

In computer programming, a program chrestomathy is a collection of similar programs written in various programming languages, for the purpose of demonstrating differences in syntax, semantics and idioms for each language. [Wikipedia]

There are several good examples of program chrestomathies on the web, including Rosetta Code andNBabel, which demonstrates gravitational N-body simulation in multiple programming languages. In this post I demonstrate six ways to implement a simple SAXPY computation on the CUDA platform. Why is this interesting? Because it demonstrates the breadth of options you have today for programming NVIDIA GPUs, and it covers the three main approaches to GPU computing: GPU-accelerated libraries, GPU compiler directives, and GPU programming languages.

SAXPY stands for “Single-Precision A·X Plus Y”.  It is a function in the standard Basic Linear Algebra Subroutines (BLAS)library. SAXPY is a combination of scalar multiplication and vector addition, and it’s very simple: it takes as input two vectors of 32-bit floats X and Y with N elements each, and a scalar value A. It multiplies each element X[i] by A and adds the result to Y[i]. A simple C implementation looks like this.

void saxpy(int n, float a, float *x, float *y)
{
  for (int i = 0; i < n; ++i)
      y[i] = a*x[i] + y[i];
}

// Perform SAXPY on 1M elements
saxpy(1<<20, 2.0, x, y);

Continue reading

In the Trenches at GTC: Languages, APIs and Development Tools for GPU Computing

By Michael Wang, The University Of Melbourne, Australia (GTC ’12 Guest Blogger)

It’s 9 am, the first morning session of the pre-conference Tutorial Day. The atmosphere in the room is one of quiet anticipation. NVIDIA’s Will Ramey takes the stage and says: “this is going to be a great week.”

I couldn’t agree more. A quick show of hands reveals that more than 90% of the 200-strong audience had used CUDA in the past week. The prophetic words of Jack Dongarra aptly sum up why we are all here:

GPUs have evolved to the point where many real-world applications are easily implemented on them and run significantly faster than on multi-core systems. Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs.

And things couldn’t be easier if you consider the three broad categories of tools available to you today: Continue reading