Delivering the Missing Building Blocks for NVIDIA CUDA Kernel Fusion in Python

C++ libraries like CUB and Thrust provide high-level building blocks that enable NVIDIA CUDA application and library developers to write speed-of-light code…

C++ libraries like CUB and Thrust provide high-level building blocks that enable NVIDIA CUDA application and library developers to write speed-of-light code that is portable across architectures. Many widely used projects, such as PyTorch, TensorFlow, XGBoost, and RAPIDS, use these abstractions to implement core functionality. The same abstractions are missing in Python. There are high-level…

Source