CuTe, a core component of CUTLASS 3.x, provides a unified algebra for describing data layouts and thread mappings, and abstracts complex memory access patterns...
Just Released: Warp 1.10 Expands JAX Interoperability and Performance
Build high-performance GPU simulations using Warp, with enhancements across JAX, Tile programming, and Arm support.
Greyhawkery Comics: Under #19
Howdy Greyfolk! My short story Under rolls on in the Under-Oerth town of Underhold. Going back to the beginning of the story (check the links...
NVIDIA Blackwell Architecture Sweeps MLPerf Training v5.1 Benchmarks
The NVIDIA Blackwell architecture powered the fastest time to train across every MLPerf Training v5.1 benchmark, marking a clean sweep in the latest round of...
Fusing Communication and Compute with New Device API and Copy Engine Collectives in NVIDIA NCCL 2.28
The latest release of the NVIDIA Collective Communications Library (NCCL) introduces a groundbreaking fusion of communication and computation for higher...
