Update News (New): CUDA 12.6
CUDA 12.6 introduces initial support for NVIDIA’s next-generation Blackwell GPU architecture (Compute Capability 10.0). This includes new PTX instructions and compiler optimizations tailored for high-performance AI and HPC workloads.
New asynchronous APIs such as cuMemcpyBatchAsync and cuMemcpy3DBatchAsync allow variable-sized transfers between multiple source and destination buffers in a single operation.
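The idea of a batched copy, where one call takes parallel arrays of destinations, sources, and sizes, can be sketched with a host-only stand-in. This is not the CUDA API: `copy_batch` and `demo_copy_batch` are hypothetical names, the copy is synchronous, and it uses `std::memcpy` on CPU memory purely to illustrate the argument shape.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-in (NOT a CUDA call): one invocation copies
// `count` independent buffers, each with its own size. It runs synchronously
// with std::memcpy and only mirrors the parallel-array shape of a batched copy.
void copy_batch(void* const* dsts, const void* const* srcs,
                const std::size_t* sizes, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        std::memcpy(dsts[i], srcs[i], sizes[i]);
    }
}

// Small demo: two transfers of different sizes submitted as one batch.
// Returns true when both destinations match their sources afterwards.
bool demo_copy_batch() {
    std::vector<char> a_src(3, 'a'), a_dst(3, 0);
    std::vector<char> b_src(5, 'b'), b_dst(5, 0);

    void* dsts[] = {a_dst.data(), b_dst.data()};
    const void* srcs[] = {a_src.data(), b_src.data()};
    std::size_t sizes[] = {a_src.size(), b_src.size()};

    copy_batch(dsts, srcs, sizes, 2);
    return a_dst == a_src && b_dst == b_src;
}
```

A real batched transfer would additionally take a stream and per-copy attributes, and would complete asynchronously; consult the CUDA documentation for the exact signatures in your toolkit version.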
Other changes focus on performance optimizations for Grace CPU systems and CPU NUMA allocation on Windows.
A new “Memory Workload Analysis” section breaks down traffic per memory bank on A100/H100 architectures. It also adds support for Blackwell’s new cache hierarchy.