The C++ template library CUB is a go-to for high-performance GPU primitive algorithms, but its traditional “two-phase” API, which separates memory estimation…
The C++ template library CUB is a go-to for high-performance GPU primitive algorithms, but its traditional “two-phase” API, which separates memory estimation from allocation, can be cumbersome. While this programming model offers flexibility, it often results in repetitive boilerplate code. This post explains the shift from this API to the new CUB single-call API introduced in CUDA 13.1…
