cuda_pro_tip

CUDA Pro Tip: Control GPU Visibility with CUDA_VISIBLE_DEVICES

As a CUDA developer, you will often need to control which devices your application uses. In a short-but-sweet post on the Acceleware blog, Chris Mason writes:

Acceleware LogoDoes your CUDA application need to target a specific GPU? If you are writing GPU enabled code, you would typically use a device query to select the desired GPUs. However, a quick and easy solution for testing is to use the environment variable CUDA_VISIBLE_DEVICES to restrict the devices that your CUDA application sees. This can be useful if you are attempting to share resources on a node or you want your GPU enabled executable to target a specific GPU

As Chris points out, robust applications should use the CUDA API to enumerate and select devices with appropriate capabilities at run time. To learn how, read the section on Device Enumeration in the CUDA Programming Guide. But the CUDA_VISIBLE_DEVICES environment variable is handy for restricting execution to a specific device or set of devices for debugging and testing.  You can also use it to control execution of applications for which you don’t have source code, or to launch multiple instances of a program on a single machine, each with its own environment and set of visible devices. Continue reading

cuda_cpp_simple

How to Query Device Properties and Handle Errors in CUDA C/C++

In this third post of the CUDA C/C++ series we discuss various characteristics of the wide range of CUDA-capable GPUs, how to query device properties from within a CUDA C/C++ program, and how to handle errors.

Querying Device Properties

In our last post, about performance metrics, we discussed how to compute the theoretical peak bandwidth of a GPU. This calculation used the GPU’s memory clock rate and bus interface width, which we obtained from product literature. The following CUDA C++ code demonstrates a more general approach, calculating the theoretical peak bandwidth by querying the attached device (or devices) for the needed information.

#include <stdio.h> 

int main() {
  int nDevices;

  cudaGetDeviceCount(&nDevices);
  for (int i = 0; i < nDevices; i++) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    printf("Device Number: %dn", i);
    printf("  Device name: %sn", prop.name);
    printf("  Memory Clock Rate (KHz): %dn",
           prop.memoryClockRate);
    printf("  Memory Bus Width (bits): %dn",
           prop.memoryBusWidth);
    printf("  Peak Memory Bandwidth (GB/s): %fnn",
           2.0*prop.memoryClockRate*(prop.memoryBusWidth/8)/1.0e6);
  }
}

Continue reading

How to Query Device Properties and Handle Errors in CUDA Fortran

In this third post of the CUDA Fortran series we discuss various characteristics of the wide range of CUDA-capable GPUs, how to query device properties from within a CUDA Fortran program, and how to handle errors.

Querying Device Properties

In our last post, about performance metrics, we discussed how to compute the theoretical peak bandwidth of a GPU. This calculation used the GPU’s memory clock rate and bus interface width, which we obtained from product literature. The following CUDA Fortran code demonstrates a more general approach, calculating the theoretical peak bandwidth by querying the attached device (or devices) for the needed information.

program peakBandwidth
  use cudafor
  implicit none

  integer :: i, istat, nDevices
  type (cudaDeviceProp) :: prop

  istat = cudaGetDeviceCount(nDevices)
  do i = 0, nDevices-1
     istat = cudaGetDeviceProperties(prop, i)
     write(*,"(' Device Number: ',i0)") i
     write(*,"('   Device name: ',a)") trim(prop%name)
     write(*,"('   Memory Clock Rate (KHz): ', i0)") &
       prop%memoryClockRate
     write(*,"('   Memory Bus Width (bits): ', i0)") &
       prop%memoryBusWidth
     write(*,"('   Peak Memory Bandwidth (GB/s): ', f6.2)") &
       2.0*prop%memoryClockRate*(prop%memoryBusWidth/8)/10.0**6
     write(*,*)        
  enddo
end program peakBandwidth

Continue reading