The NVIDIA Collective Communication Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and networking. NCCL provides collective routines such as all-gather, all-reduce, broadcast, reduce, and reduce-scatter, as well as point-to-point send and receive. These routines are optimized for high bandwidth and low latency over PCIe and NVLink high-speed interconnects within a node and over NVIDIA Mellanox networking across nodes.
NCCL is available for download as part of the NVIDIA HPC SDK and as a separate package for Ubuntu and Red Hat.
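The collective routines named above can be read as simple transformations on one buffer per rank. The following is a minimal sketch of their semantics only, using plain Python lists as a stand-in for per-GPU memory. The function names and the `Buffers` type are hypothetical and chosen for illustration; they are not the NCCL API, and nothing here reflects how NCCL moves data.

```python
from typing import List

# Hypothetical stand-in: one list per rank, all lists the same length.
Buffers = List[List[float]]


def _elementwise_sum(bufs: Buffers) -> List[float]:
    """Sum position i across every rank's buffer."""
    return [sum(vals) for vals in zip(*bufs)]


def all_reduce(bufs: Buffers) -> Buffers:
    """Every rank ends with the element-wise sum of all buffers."""
    total = _elementwise_sum(bufs)
    return [list(total) for _ in bufs]


def reduce(bufs: Buffers, root: int) -> Buffers:
    """Only the root rank ends with the sum; other ranks are unchanged."""
    total = _elementwise_sum(bufs)
    return [total if r == root else list(b) for r, b in enumerate(bufs)]


def broadcast(bufs: Buffers, root: int) -> Buffers:
    """Every rank ends with a copy of the root's buffer."""
    return [list(bufs[root]) for _ in bufs]


def all_gather(bufs: Buffers) -> Buffers:
    """Every rank ends with the concatenation of all buffers, in rank order."""
    gathered = [x for b in bufs for x in b]
    return [list(gathered) for _ in bufs]


def reduce_scatter(bufs: Buffers) -> Buffers:
    """The summed buffer is split into equal chunks; rank r keeps chunk r."""
    n = len(bufs)
    total = _elementwise_sum(bufs)
    chunk = len(total) // n  # assumes the length divides evenly by n
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]
```

In this model, an all-reduce produces the same result as a reduce-scatter followed by an all-gather, which is why the two are often discussed together.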
Automatic topology detection for high-bandwidth paths on AMD, ARM, PCIe Gen4 and IB HDR
Up to 2x peak bandwidth with in-network all-reduce operations utilizing SHARPv2
Graph search for the optimal set of rings and trees with the highest bandwidth and lowest latency
Support for multi-threaded and multi-process applications
Internode communication over InfiniBand verbs, libfabric, RoCE and IP sockets
Traffic rerouting to alleviate congested ports with InfiniBand adaptive routing
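To give a sense of why rings appear in the feature list above, here is a minimal sketch of the textbook ring all-reduce algorithm, simulated in a single Python process. It is an illustration under stated assumptions, not NCCL's implementation: the function name `ring_all_reduce` is hypothetical, ranks are modeled as list indices, and the buffer length is assumed to divide evenly by the number of ranks.

```python
from typing import List


def ring_all_reduce(bufs: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over ranks arranged in a ring.

    Rank r only ever sends to rank (r + 1) % n. Each buffer is split into
    n chunks, and every step moves one chunk per rank.
    """
    n = len(bufs)
    length = len(bufs[0])
    assert length % n == 0, "sketch assumes length is a multiple of rank count"
    chunk = length // n
    data = [list(b) for b in bufs]  # work on copies, leave inputs untouched

    def span(c: int) -> slice:
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: after n - 1 steps, rank r holds the fully
    # summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing messages first so sends are simultaneous.
        msgs = [(r, (r - step) % n) for r in range(n)]
        payloads = [data[r][span(c)] for r, c in msgs]
        for (src, c), payload in zip(msgs, payloads):
            dst = (src + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [data[r][span(c)] for r, c in msgs]
        for (src, c), payload in zip(msgs, payloads):
            data[(src + 1) % n][span(c)] = payload

    return data


if __name__ == "__main__":
    ranks = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    for r, buf in enumerate(ring_all_reduce(ranks)):
        print(f"rank {r}: {buf}")
```

In this scheme each rank sends 2(n - 1) chunks, that is 2(n - 1)/n of the buffer size in total, so per-rank traffic stays nearly constant as ranks are added. The number of steps grows linearly with n, which is the latency cost that tree-based schedules aim to reduce.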