Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust’s High-Level interface greatly enhances programmer Productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB, and OpenMP) facilitates integration with existing software. Develop High-Performance applications rapidly with Thrust!
This excerpt from the Thrust home page perfectly summarizes the benefits of the Thrust library. Thrust enables expressive algorithmic programming via a vocabulary of parallel building blocks that let you rapidly develop fast, portable parallel algorithms. If you are a C++ programmer, and especially if you use template libraries like the STL and Boost C++ libraries, then you will find Thrust familiar. Like the STL, Thrust helps you focus on algorithms, rather than on platform-specific implementation details. At the same time, Thrust’s modular design allows low-level customization and interoperation with custom platform-specific code such as CUDA kernels and libraries.
Thrust is High-Level
As described in the article “Thrust, a Productivity-Oriented Library for CUDA”, Thrust aims to solve two types of problems: problems that can be “implemented efficiently without a detailed mapping to the target architecture”, and problems that don’t merit or won’t receive (for whatever reason) significant optimization attention from the programmer. High-level primitives make it easier to capture programmer intent; developers describe what to compute, without dictating how to compute it. This allows the library to make informed decisions about how to implement the intended computation.
Thrust provides an STL-style vector container (with host_vector and device_vector implementations), and a suite of high-level algorithms including searching, sorting, copying, merging, transforming, reordering,reducing, prefix sums, and set operations. Here is an oft-repeated complete example program from the Thrust home page, which generates random numbers serially and then transfers them to the GPU where they are sorted.
// generate 32M random numbers serially
thrust::host_vector h_vec(32 << 20);
std::generate(h_vec.begin(), h_vec.end(), rand);
// transfer data to the device
thrust::device_vector d_vec = h_vec;
// sort data on the device
// transfer data back to host
thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());