The latest release of CUDA Toolkit continues to push the envelope of accelerated computing performance using the latest NVIDIA GPUs. New features of this release, version 12.3, include:
Lazy loading default on Windows
Single-step CUDA uninstall on Windows
Enhanced NVIDIA Nsight Compute and NVIDIA Nsight Systems developer tools
CUDA and the CUDA Toolkit continue to provide the foundation for all accelerated computing applications in data science, machine learning and deep learning, generative AI with LLMs for both training and inference, graphics and simulation, and scientific computing. CUDA is fundamental to helping solve the world’s most complex computing problems.
NVIDIA Nsight Developer Tools
The latest versions of NVIDIA Nsight Developer Tools are included in the CUDA Toolkit to help you optimize and debug your CUDA applications on NVIDIA Grace Hopper platforms.
Nsight Compute
Nsight Compute provides detailed profiling and analysis for CUDA kernels. Version 2023.3 debuts with CUDA Toolkit 12.3 and adds features that improve performance and expand data collection and analysis capabilities.
The new PM Sampling feature adds time-correlated kernel performance data. Previously, most performance metrics were aggregated across an entire kernel. This frequently requested feature can help you uncover performance issues that occur in phases within a kernel, as well as temporal effects such as the tail effect (Figure 1). PM Sampling is included in the full metric set (--set full). You can also enable it as the PM Sampling section in the GUI, or by passing the --section PmSampling flag to the CLI.
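To make the tail effect concrete, the following minimal sketch estimates how full the final wave of thread blocks is for a kernel launch. It assumes a simplified model in which every block runs for the same duration. The function name and the launch numbers are hypothetical and are not taken from Nsight Compute.

```python
import math

def tail_effect(total_blocks: int, num_sms: int, blocks_per_sm: int):
    """Return (waves, last_wave_fill) under a simplified equal-duration model."""
    # Number of thread blocks that can be resident on the GPU at once.
    slots = num_sms * blocks_per_sm
    # Full waves plus one partial wave if the blocks do not divide evenly.
    waves = math.ceil(total_blocks / slots)
    remainder = total_blocks % slots
    # Fraction of slots occupied during the final wave.
    last_wave_fill = 1.0 if remainder == 0 else remainder / slots
    return waves, last_wave_fill

# Hypothetical launch: 1,000 blocks, 132 SMs, 2 resident blocks per SM.
waves, fill = tail_effect(total_blocks=1000, num_sms=132, blocks_per_sm=2)
print(f"{waves} waves, last wave {fill:.0%} full")
```

When the last wave is only partly full, part of the GPU sits idle while the remaining blocks finish. Time-correlated data can show this kind of drop-off toward the end of a kernel, where a metric aggregated over the whole kernel would hide it.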
Nsight Compute 2023.3 also introduces the ability to compare source code changes across profiles to see how modifications have impacted performance at the source level. To use this feature, set one report as a baseline, and click the Source Comparison button from another report to view highlighted source differences and the associated performance metrics.
Use the -lineinfo flag when compiling the kernel to enable source resolution. If the source file is modified in place, use the Import Source option or the --import-source flag to preserve the original source code.
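As a rough sketch of how these flags fit together, the following helper functions assemble the compile and profile command lines without running them. The file names (kernel.cu, app, baseline) are hypothetical, and the argument form --import-source yes is an assumption to verify against ncu --help for your installed version.

```python
import shlex

def nvcc_command(source: str, output: str) -> list[str]:
    # -lineinfo embeds source line information so that the profiler
    # can map metrics back to source lines.
    return ["nvcc", "-lineinfo", "-o", output, source]

def ncu_command(app: str, report: str, import_source: bool = True) -> list[str]:
    # --set full collects the full metric set, which includes PM Sampling.
    cmd = ["ncu", "--set", "full", "-o", report]
    if import_source:
        # Assumed syntax: store the source files in the report so that later
        # in-place edits do not affect the baseline.
        cmd += ["--import-source", "yes"]
    cmd.append(app)
    return cmd

# Print the commands instead of executing them.
print(shlex.join(nvcc_command("kernel.cu", "app")))
print(shlex.join(ncu_command("./app", "baseline")))
```

Profiling once before and once after a source change, with source import enabled, gives two reports that can be compared with the Source Comparison button.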
To learn more about Nsight Compute 2023.3 features, see Getting Started with Nsight Compute.
Nsight Systems
CUDA Toolkit 12.3 also includes Nsight Systems 2023.3, a performance tuning tool that profiles hardware metrics and CUDA apps, APIs, and libraries on a unified timeline.
The latest version of Nsight Systems introduces support for NVIDIA Grace CPU, enabling you to drill into Grace CPU cycles in the context of your application’s performance. Nsight Systems 2023.3 also adds new features, including network interface card (NIC) profiling from the GUI.
The network is the primary way that data moves between hardware units on a server, so understanding internode communication at the network level helps you diagnose bottlenecks. Nsight Systems monitors NIC throughput, charting the volume of bytes sent and received. Extended NIC wait times are a strong indication that the internode network needs optimization. Nsight Systems can also profile NVIDIA Quantum InfiniBand switch throughput.
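To illustrate what a throughput chart represents, the following minimal sketch derives send and receive rates per interval from cumulative byte counters. It is not how Nsight Systems collects or processes NIC data. The NicSample type, the throughput function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class NicSample:
    t_seconds: float     # sample timestamp in seconds
    bytes_sent: int      # cumulative bytes sent so far
    bytes_received: int  # cumulative bytes received so far

def throughput(samples: list[NicSample]) -> list[tuple[float, float]]:
    """Return (sent, received) rates in bytes/second for each sample interval."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t_seconds - prev.t_seconds
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        sent_rate = (cur.bytes_sent - prev.bytes_sent) / dt
        recv_rate = (cur.bytes_received - prev.bytes_received) / dt
        rates.append((sent_rate, recv_rate))
    return rates

# Hypothetical counter readings taken every 0.5 seconds.
samples = [
    NicSample(0.0, 0, 0),
    NicSample(0.5, 500_000_000, 100_000_000),
    NicSample(1.0, 500_000_000, 900_000_000),
]
for sent, recv in throughput(samples):
    print(f"sent {sent / 1e9:.2f} GB/s, received {recv / 1e9:.2f} GB/s")
```

In this made-up trace, the send rate falls to zero in the second interval while the receive rate rises. Shifts like this show up on a timeline view and can be lined up against application activity.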
To learn more about Nsight Systems 2023.3 features, see Getting Started with Nsight Systems. For a deeper dive into how Nsight Systems supports development at data center scale, see Accelerating Data Center and HPC Performance Analysis with NVIDIA Nsight Systems.
Summary
The CUDA Toolkit 12.3 release enriches the foundational NVIDIA driver and runtime software for accelerated computing while continuing to provide enhanced support for the newest NVIDIA GPUs, accelerated libraries, compilers, and developer tools.
To learn more, see the CUDA documentation, check out the latest NVIDIA Deep Learning Institute offerings, and browse the NGC Catalog. Ask questions and join the conversation in the CUDA Developer Forums.