
How to Query Device Properties and Handle Errors in CUDA C/C++

In this third post of the CUDA C/C++ series we discuss various characteristics of the wide range of CUDA-capable GPUs, how to query device properties from within a CUDA C/C++ program, and how to handle errors.

Querying Device Properties

In our last post, about performance metrics, we discussed how to compute the theoretical peak bandwidth of a GPU. This calculation used the GPU’s memory clock rate and bus interface width, which we obtained from product literature. The following CUDA C++ code demonstrates a more general approach, calculating the theoretical peak bandwidth by querying the attached device (or devices) for the needed information.

#include <stdio.h>

int main() {
  int nDevices;

  // Number of CUDA-capable devices attached to the system.
  cudaGetDeviceCount(&nDevices);
  for (int i = 0; i < nDevices; i++) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    printf("Device Number: %d\n", i);
    printf("  Device name: %s\n", prop.name);
    printf("  Memory Clock Rate (KHz): %d\n",
           prop.memoryClockRate);
    printf("  Memory Bus Width (bits): %d\n",
           prop.memoryBusWidth);
    // 2.0 accounts for double data rate; kHz * bytes / 1.0e6 gives GB/s.
    printf("  Peak Memory Bandwidth (GB/s): %f\n\n",
           2.0*prop.memoryClockRate*(prop.memoryBusWidth/8)/1.0e6);
  }
  return 0;
}


How to Query Device Properties and Handle Errors in CUDA Fortran

In this third post of the CUDA Fortran series we discuss various characteristics of the wide range of CUDA-capable GPUs, how to query device properties from within a CUDA Fortran program, and how to handle errors.

Querying Device Properties

In our last post, about performance metrics, we discussed how to compute the theoretical peak bandwidth of a GPU. This calculation used the GPU’s memory clock rate and bus interface width, which we obtained from product literature. The following CUDA Fortran code demonstrates a more general approach, calculating the theoretical peak bandwidth by querying the attached device (or devices) for the needed information.

program peakBandwidth
  use cudafor
  implicit none

  integer :: i, istat, nDevices
  type (cudaDeviceProp) :: prop

  ! Number of CUDA-capable devices attached to the system.
  istat = cudaGetDeviceCount(nDevices)
  do i = 0, nDevices-1
     istat = cudaGetDeviceProperties(prop, i)
     write(*,"(' Device Number: ',i0)") i
     write(*,"('   Device name: ',a)") trim(prop%name)
     write(*,"('   Memory Clock Rate (KHz): ', i0)") &
       prop%memoryClockRate
     write(*,"('   Memory Bus Width (bits): ', i0)") &
       prop%memoryBusWidth
     ! 2.0 accounts for double data rate; kHz * bytes / 10.0**6 gives GB/s.
     write(*,"('   Peak Memory Bandwidth (GB/s): ', f6.2)") &
       2.0*prop%memoryClockRate*(prop%memoryBusWidth/8)/10.0**6
     write(*,*)
  enddo
end program peakBandwidth
