Transformer architecture has become a foundational breakthrough driving the revolution in generative AI, powering large language models (LLMs) like GPT,...
Scaling NVFP4 Inference for FLUX.2 on NVIDIA Blackwell Data Center GPUs
In 2025, NVIDIA partnered with Black Forest Labs (BFL) to optimize the FLUX.1 text-to-image model series, unlocking FP4 image generation performance on NVIDIA...
Streamlining CUB with a Single-Call API
The C++ template library CUB is a go-to for high-performance GPU primitive algorithms, but its traditional "two-phase" API, which separates memory estimation...
Greyhawkery Comics: Under #24
Greetings, Greyhawk explorers. Today is a new installment of my short story Under. Follow the links below to catch up if you are just arriving....
How to Train an AI Agent for Command-Line Tasks with Synthetic Data and Reinforcement Learning
What if your computer-use agent could learn a new Command Line Interface (CLI) and operate it safely without ever writing files or free-typing shell commands?...
