Can AI coding assistants write efficient CUDA code? To help measure and improve their capabilities, we created ComputeEval, a robust, open source benchmark for... Can...
More like this
Enhancing GPU-Accelerated Vector Search in Faiss with NVIDIA cuVS
As companies collect more unstructured data and increasingly use large language models (LLMs), they need faster and more scalable systems. Advanced tools for... As companies...
More like this
Accelerating Large-Scale Mixture-of-Experts Training in PyTorch
Training massive mixture-of-experts (MoE) models has long been the domain of a few advanced users with deep infrastructure and distributed-systems expertise.... Training massive mixture-of-experts (MoE)...
More like this
Greyhawkery Comics: Cultists #19
Welcome back faithful followers! While most of you Greyhawkers have been going about your normal old school campaigns, the Cultists have quietly been exploring their...
More like this
Scale Biology Transformer Models with PyTorch and NVIDIA BioNeMo Recipes
Training models with billions or trillions of parameters demands advanced parallel computing. Researchers must decide how to combine parallelism strategies,... Training models with billions or...
