The CUDA Profiling Tools Interface (CUPTI) added "Range Profiling APIs" to simplify how developers measure performance.
NVIDIA has announced the release of CUDA 12.6, the latest version of its parallel computing platform and programming model. The update adds new features, library improvements, and optimizations for developers who build and deploy GPU-accelerated applications.
CUDA 12.6 is now available for download from the NVIDIA Developer website. The release is supported on Windows and Linux (the CUDA Toolkit no longer ships for macOS) and is compatible with a wide range of NVIDIA GPUs.
| Library | Key Changes in CUDA 12.6 |
|---------|--------------------------|
| cuBLAS | New FP8 GEMM kernels for Hopper (up to 2x faster than 12.5). `cublasGemmEx` supports `CUBLAS_COMPUTE_32I` for integer GEMM. |
| cuDNN | Version 9.2.0 integrated. Adds FlashAttention-3 (FP8) support on H200. Grouped convolutions optimized for 4D tensors. |
| cuFFT | Support for half-precision R2C and C2R transforms up to 3D. Reduced memory footprint for multi-GPU transforms. |
| cuSPARSE | New sparse matrix–vector (SpMV) for block compressed sparse row (BSR) format with FP16/BF16. |
| NCCL | Included NCCL 2.21.5. Adds NVLS (NVLink SHARP) support for multi-node all-reduce. Improved ring/tree autotuning. |
| CUDA Math API | New `__h2bf16` and `__bf162h` intrinsics for Hopper. |
A new CUDBG_COREDUMP_SKIP_CONSTBANK_MEMORY flag allows developers to exclude constant memory from core dumps, reducing file sizes during debugging.
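As a rough illustration of how this could be used in practice, the sketch below launches an application with GPU core dumps enabled and constant-bank memory excluded. It assumes the environment-variable interface described in the CUDA-GDB documentation (`CUDA_ENABLE_COREDUMP_ON_EXCEPTION`, `CUDA_COREDUMP_FILE`, `CUDA_COREDUMP_GENERATION_FLAGS`) and assumes that `skip_constbank_memory` is the environment-variable spelling of the flag; check both against the documentation for your toolkit version. The helper names `coredump_env` and `run_with_coredump` are hypothetical.

```python
import os
import subprocess


def coredump_env(dump_path, skip_flags=("skip_constbank_memory",), base_env=None):
    """Return a copy of the environment with GPU core dump settings applied.

    Variable and flag names are assumptions taken from the CUDA-GDB
    documentation; verify them for the toolkit version in use.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Ask the driver to write a GPU core dump when a GPU exception occurs.
    env["CUDA_ENABLE_COREDUMP_ON_EXCEPTION"] = "1"
    # Output file name template (%p is assumed to expand to the process ID).
    env["CUDA_COREDUMP_FILE"] = dump_path
    # Comma-separated list of sections to leave out of the dump.
    env["CUDA_COREDUMP_GENERATION_FLAGS"] = ",".join(skip_flags)
    return env


def run_with_coredump(cmd, dump_path="gpu_core.%p"):
    """Run cmd (a list of arguments) with the settings above; return its exit code."""
    return subprocess.run(cmd, env=coredump_env(dump_path)).returncode
```

A call such as `run_with_coredump(["./my_cuda_app"])` (where `./my_cuda_app` is a placeholder binary) would then produce a smaller dump if the application hits a GPU exception. Building a fresh environment dictionary rather than mutating `os.environ` keeps the settings scoped to the child process.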