CUDA Graphs are a way to define and batch GPU operations as a graph rather than a sequence of stream launches. A CUDA Graph groups...
Advanced Strategies for High-Performance GPU Programming with NVIDIA CUDA
Stephen Jones, a leading expert and distinguished NVIDIA CUDA architect, offers his guidance and insights with a deep dive into the complexities of mapping... Stephen...
Accelerating the HPCG Benchmark with NVIDIA Math Sparse Libraries
In the realm of high-performance computing (HPC), NVIDIA has continually advanced HPC by offering its highly optimized NVIDIA High-Performance Conjugate... In the realm of high-performance...
Streamlining Data Processing for Domain Adaptive Pretraining with NVIDIA NeMo Curator
Domain-adaptive pretraining (DAPT) of large language models (LLMs) is an important step towards building domain-specific models. These models demonstrate... Domain-adaptive pretraining (DAPT) of large language...
Post-Training Quantization of LLMs with NVIDIA NeMo and NVIDIA TensorRT Model Optimizer
As large language models (LLMs) are becoming even bigger, it is increasingly important to provide easy-to-use and efficient deployment paths because the cost of... As...
