CUDA Pro Tip: Increase Performance with Vectorized Memory Access