Next Generation of FlashAttention


NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent work exploiting the Grace Hopper GPU architecture and Tensor Cores to accelerate key fused attention kernels using CUTLASS 3. FlashAttention-3 incorporates key techniques to achieve 1.5–2.0x faster performance than FlashAttention-2 with FP16, reaching up to 740 TFLOPS. With FP8…
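FlashAttention-3's kernel-level techniques (warp specialization, TMA, FP8) are Hopper-specific, but the core idea all FlashAttention versions share is fused, tiled attention with an online softmax, so the full N×N score matrix is never materialized. Below is a minimal NumPy sketch of that general idea (function names, the block size, and shapes are illustrative assumptions, not the actual CUTLASS implementation):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference implementation: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=16):
    # Tiled attention with online softmax: process K/V in blocks,
    # keeping a running row max (m) and normalizer (l) per query so
    # only a (N x block) score tile exists at any time.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row maximum
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                  # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        P = np.exp(S - m_new[:, None])        # tile probabilities
        alpha = np.exp(m - m_new)             # rescale the old accumulator
        l = alpha * l + P.sum(axis=-1)
        O = alpha[:, None] * O + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
assert np.allclose(tiled_attention(Q, K, V), naive_attention(Q, K, V))
```

On a GPU, fusing these steps into one kernel avoids round-trips to HBM for the score matrix; FlashAttention-3 additionally overlaps the matmuls and softmax and exploits low-precision Tensor Core paths to approach the hardware's peak throughput.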
