Can AI coding assistants write efficient CUDA code? To help measure and improve their capabilities, we created ComputeEval, a robust, open source benchmark for evaluating AI models and agents on CUDA programming tasks. A few months ago, we announced the first release of ComputeEval. Today, we're introducing its first major expansion, adding more than 100 new CUDA challenges.
