NVIDIA GPUDirect RDMA is a technology that enables a direct path for data exchange between the GPU and third-party peer devices using standard features of PCI Express. Examples of third-party devices include network interfaces, video acquisition devices, storage adapters, and medical equipment. Enabled on Tesla and Quadro-class GPUs, GPUDirect RDMA relies on the ability of NVIDIA GPUs to expose portions of device memory on a PCI Express Base Address Register (BAR) region (see this white paper for more technical details).
Both Open MPI and MVAPICH2 now support GPUDirect RDMA, exposed via CUDA-aware MPI. Since January 2014, the Mellanox InfiniBand software stack has supported GPUDirect RDMA on Mellanox ConnectX-3 and Connect-IB devices. See this post on the Mellanox blog for a nice introduction to the topic.
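As a quick illustration of what CUDA-aware MPI looks like from the application side, the minimal sketch below passes a device pointer returned by cudaMalloc directly to MPI_Send/MPI_Recv; an MPI library built with CUDA support (such as the Open MPI or MVAPICH2 builds mentioned above) can then use GPUDirect RDMA for the transfer when the hardware allows it. The buffer size is arbitrary and error checking is omitted for brevity.

```c
/* Minimal CUDA-aware MPI sketch: device pointers are passed directly to MPI.
 * Assumes an MPI library built with CUDA support; error checking omitted. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t n = 1 << 20;                 /* 1M floats (~4 MB), illustrative */
    float *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(float));

    if (rank == 0) {
        cudaMemset(d_buf, 0, n * sizeof(float));
        /* Device pointer passed directly; no staging copy to a host buffer. */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```

Whether the transfer actually goes over GPUDirect RDMA (as opposed to being staged through host memory) depends on the MPI build, the interconnect, and the PCIe topology, which is exactly what the performance discussion below is about.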
This post is a detailed look at the performance obtainable with available hardware platforms. The main audience is designers and users of GPU-accelerated clusters employing CUDA-aware MPI, as well as architects and designers of GPU-accelerated low-latency systems, such as those used in healthcare, aviation, and high-energy physics. It is also complementary to a recent post (Exploring the PCIe Bus Routes) by Cirrascale.
Though the details may change in future hardware, this post suggests expected levels of performance and gives useful hints for performance verification.