NVIDIA Collective Communication Library (NCCL)

The NVIDIA Collective Communication Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and networking. NCCL provides collective routines such as all-gather, all-reduce, broadcast, reduce, and reduce-scatter, as well as point-to-point send and receive. These routines are optimized for high bandwidth and low latency over PCIe and NVLink high-speed interconnects within a node and over NVIDIA Mellanox networking across nodes.
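To make the collective names above concrete, here is a small plain-Python model of what each one computes. It is only a sketch of the semantics: it does not call NCCL, it runs on the CPU, and the function names (`all_reduce`, `all_gather`, and so on) and the list-of-lists layout (`buffers[r]` is the data held by rank `r`) are assumptions made for illustration. Summation is used as the reduction operator.

```python
from functools import reduce as fold
from operator import add


def _elementwise(buffers, op=add):
    """Combine the buffers of all ranks element by element."""
    return [fold(op, column) for column in zip(*buffers)]


def broadcast(buffers, root=0):
    """Every rank ends up with a copy of the root's buffer."""
    return [list(buffers[root]) for _ in buffers]


def reduce(buffers, root=0, op=add):
    """Only the root receives the reduced result.

    Non-root ranks are shown as None because this model does not
    define an output for them.
    """
    total = _elementwise(buffers, op)
    return [total if r == root else None for r in range(len(buffers))]


def all_reduce(buffers, op=add):
    """Every rank receives the full reduced result."""
    total = _elementwise(buffers, op)
    return [list(total) for _ in buffers]


def all_gather(buffers):
    """Every rank receives the concatenation of all ranks' buffers."""
    gathered = [x for buf in buffers for x in buf]
    return [list(gathered) for _ in buffers]


def reduce_scatter(buffers, op=add):
    """Rank r receives only the r-th chunk of the reduced result.

    Assumes the buffer length is divisible by the number of ranks.
    """
    n = len(buffers)
    total = _elementwise(buffers, op)
    chunk = len(total) // n
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]
```

With two ranks holding `[1, 2]` and `[3, 4]`, `all_reduce` returns `[[4, 6], [4, 6]]`, while `reduce_scatter` returns `[[4], [6]]`: the same sums, but each rank keeps only its own slice.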

Leading deep learning frameworks such as Caffe2, Chainer, MXNet, PyTorch and TensorFlow have integrated NCCL to accelerate deep learning training on multi-GPU, multi-node systems.

NCCL is available for download as part of the NVIDIA HPC SDK and as a separate package for Ubuntu and Red Hat.

Key Features

  • Automatic topology detection for high bandwidth paths on AMD, ARM, PCIe Gen4 and InfiniBand HDR

  • Up to 2x peak bandwidth with in-network all-reduce operations using SHARP v2

  • Graph search for the optimal set of rings and trees with the highest bandwidth and lowest latency

  • Support for multi-threaded and multi-process applications

  • Internode communication over InfiniBand verbs, libfabric, RoCE and IP sockets

  • Traffic rerouting to alleviate congested ports with InfiniBand adaptive routing
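The feature list mentions rings as one of the communication patterns NCCL searches over. As a rough illustration of the idea, the sketch below simulates the textbook ring all-reduce (a reduce-scatter phase followed by an all-gather phase) in plain Python. It is not NCCL's implementation and does not use its API; the function name `ring_all_reduce`, the list-of-lists layout, and the assumption that the buffer length is divisible by the number of ranks are all choices made for this example.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a ring of len(buffers) ranks.

    buffers[r] is the input held by rank r. Returns the per-rank
    results without modifying the inputs.
    """
    n = len(buffers)
    size = len(buffers[0])
    if size % n != 0:
        raise ValueError("buffer length must be divisible by rank count")
    chunk = size // n
    data = [list(buf) for buf in buffers]

    def span(c):
        # Index range covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    def exchange(chunk_of, combine):
        # Snapshot every outgoing payload first so that all ranks
        # appear to send at the same time, then deliver each payload
        # to the next rank on the ring.
        sends = []
        for r in range(n):
            c = chunk_of(r)
            sends.append((r, c, [data[r][i] for i in span(c)]))
        for r, c, payload in sends:
            dst = (r + 1) % n
            for i, value in zip(span(c), payload):
                data[dst][i] = combine(data[dst][i], value)

    # Phase 1, reduce-scatter: after n - 1 steps, rank r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        exchange(lambda r: (r - step) % n, lambda old, new: old + new)

    # Phase 2, all-gather: completed chunks travel around the ring
    # and overwrite the partial values on each rank.
    for step in range(n - 1):
        exchange(lambda r: (r + 1 - step) % n, lambda old, new: new)

    return data
```

In this scheme each rank sends 2 × (n − 1) chunks of size 1/n of the buffer, so the data sent per rank stays close to twice the buffer size regardless of how many ranks join the ring. That property is why ring patterns suit bandwidth-bound reductions, while tree patterns are generally preferred when latency dominates.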
