Large language models (LLMs) are rapidly changing the business landscape, offering new capabilities in natural language processing (NLP), content generation,...
NVIDIA TensorRT-LLM Now Supports Recurrent Drafting for Optimizing LLM Inference
Recurrent drafting (referred to as ReDrafter) is a novel speculative decoding technique developed and open-sourced by Apple for large language model (LLM)...
Greyhawkery Comics: Graz’zt Show #1
Season's Greetings Greyhawkers! Today's comic is a surprise present for my readers. I used to do annual Needfest comics around this time of year (those...
Data-Efficient Knowledge Distillation for Supervised Fine-Tuning with NVIDIA NeMo-Aligner
Knowledge distillation is an approach for transferring the knowledge of a much larger teacher model to a smaller student model, ideally yielding a compact,...
Efficient Ray Tracing with NVIDIA OptiX Shader Binding Table Optimization
NVIDIA OptiX is the API for GPU-accelerated ray tracing with CUDA, and is often used to render scenes containing a wide variety of objects and...
