cuBLASLt Grouped GEMM Info

[QST] GEMM Batched or Single? · Issue #1414 · NVIDIA/cutlass

For cuBLASLt grouped GEMM, a high-impact feature would be a load-balanced persistent scheduler, designed specifically to handle "extreme variance" in problem sizes within a single group.
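To make the scheduling idea concrete, here is a minimal host-side sketch of one way a persistent scheduler can spread work evenly: every problem in the group is cut into output tiles, the tiles are numbered globally, and each persistent worker takes every W-th tile. The types `Problem` and `TileCoord`, the 128x128 tile shape, and all function names are assumptions made for illustration; none of them are cuBLASLt API, and this is not a description of how cuBLASLt schedules work internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not part of the cuBLASLt API.
struct Problem { int64_t m, n; };                       // output shape of one GEMM
struct TileCoord { int problem; int64_t tile_m, tile_n; };

constexpr int64_t kTileM = 128;  // assumed output tile height
constexpr int64_t kTileN = 128;  // assumed output tile width

inline int64_t ceil_div(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Exclusive prefix sum of per-problem tile counts.
// offsets[i] is the first global tile index of problem i;
// offsets.back() is the total number of tiles in the group.
std::vector<int64_t> tile_offsets(const std::vector<Problem>& group) {
    std::vector<int64_t> offsets(group.size() + 1, 0);
    for (std::size_t i = 0; i < group.size(); ++i) {
        int64_t tiles = ceil_div(group[i].m, kTileM) * ceil_div(group[i].n, kTileN);
        offsets[i + 1] = offsets[i] + tiles;
    }
    return offsets;
}

// Map a global tile index to (problem, tile row, tile column).
TileCoord locate_tile(const std::vector<Problem>& group,
                      const std::vector<int64_t>& offsets, int64_t tile) {
    assert(tile >= 0 && tile < offsets.back());
    int p = 0;
    while (offsets[p + 1] <= tile) ++p;  // linear scan; a binary search suits large groups
    int64_t local = tile - offsets[p];
    int64_t tiles_n = ceil_div(group[p].n, kTileN);
    return {p, local / tiles_n, local % tiles_n};
}

// Number of tiles each persistent worker handles when worker w
// processes tiles w, w + W, w + 2W, ... (W = number of workers).
std::vector<int64_t> tiles_per_worker(int64_t total_tiles, int workers) {
    std::vector<int64_t> counts(workers, 0);
    for (int w = 0; w < workers; ++w)
        for (int64_t t = w; t < total_tiles; t += workers) ++counts[w];
    return counts;
}
```

Because work is assigned per tile rather than per problem, a group that mixes one 4096x4096 GEMM with several 128x128 GEMMs still gives each worker a tile count that differs by at most one. Tiles from problems with different K still take different amounts of time, so a real scheduler would likely also weight tiles by their K extent.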

    // For Batched/Grouped: Strides define the step to the next matrix in the group
    int64_t strideA = M * K;
    int64_t strideB = K * N;
    int64_t strideC = M * N;
    int batchCount = 100;  // Number of GEMMs in the group
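The stride arithmetic can be checked with a small standalone sketch. It assumes that every GEMM in the batch has the same M, N, and K and that the matrices are packed back to back with no padding, which is the case where a single stride per operand is enough. The `StridedBatch` struct and its member names are hypothetical helpers, not cuBLASLt types. In cuBLASLt itself, the batch count and stride are set on a matrix layout through the `CUBLASLT_MATRIX_LAYOUT_BATCH_COUNT` and `CUBLASLT_MATRIX_LAYOUT_STRIDED_BATCH_OFFSET` attributes.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not a cuBLASLt type.
// Assumes uniform problem sizes and densely packed matrices.
struct StridedBatch {
    int64_t m, n, k;      // A is m x k, B is k x n, C is m x n
    int32_t batch_count;  // number of GEMMs in the batch

    // Element count between the start of matrix i and matrix i + 1.
    int64_t stride_a() const { return m * k; }
    int64_t stride_b() const { return k * n; }
    int64_t stride_c() const { return m * n; }

    // Element offset of the i-th matrix from the base pointer.
    int64_t offset_a(int32_t i) const { return i * stride_a(); }
    int64_t offset_b(int32_t i) const { return i * stride_b(); }
    int64_t offset_c(int32_t i) const { return i * stride_c(); }

    // Total elements each buffer must hold for the whole batch.
    int64_t total_a() const { return batch_count * stride_a(); }
    int64_t total_b() const { return batch_count * stride_b(); }
    int64_t total_c() const { return batch_count * stride_c(); }
};
```

When the problems in a group differ in size, a single stride no longer describes the layout, and each GEMM needs its own pointer and dimensions instead.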

For implementation details, refer to the NVIDIA cuBLASLt Developer Guide (CUDA 12.x and later).