When you use the NVIDIA Collective Communication Library (NCCL) to run a deep learning training or inference workload that relies on collective operations (such as AllReduce, AllGather, and ReduceScatter), it can be hard to tell how NCCL is performing during the actual workload run. This post introduces the NCCL Inspector Profiler Plugin, which addresses this problem. It offers a way for…
