CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design
GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and…