In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. EP communication is essentially all-to-all,...
Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton
NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things...
Establishing a Scalable Sparse Ecosystem with the Universal Sparse Tensor
Sparse tensors are vectors, matrices, and higher-dimensional generalizations with many zeros. They are crucial in various fields such as scientific computing,...
Greyhawkery Comics: Under #25
Welcome back readers! Since I'm waaaay ahead on producing my comics, I'm starting a two-month blitz where I will release two a week. One Cultists...
Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk
AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. However, they also introduce a...
