Welcome back, Greyhawk fanatics! You know the drill: it's time for another Cultists episode. This one may be familiar to those who remember last time...
Advanced NVIDIA CUDA Kernel Optimization Techniques: Handwritten PTX
As accelerated computing continues to drive application performance in all areas of AI and scientific computing, there's a renewed interest in GPU optimization...
NVIDIA Omniverse: What Developers Need to Know About Migration Away From Launcher
As part of continued efforts to ensure NVIDIA Omniverse is a developer-first platform, NVIDIA will be deprecating the Omniverse Launcher on Oct. 1. Doing so...
Optimizing FLUX.1 Kontext for Image Editing with Low-Precision Quantization
FLUX.1 Kontext, the recently released model from Black Forest Labs, is a fascinating addition to the repertoire of community image generation models. The open...
Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training
In this blog post, we’ll break down the main FP8 scaling strategies: per-tensor scaling, delayed and current scaling, and per-block scaling (including the...
