nvidia-cusparselt-cu12 (0.6.3)
NVIDIA cuSPARSELt

######################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
######################################################################################

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

   D = Activation(\alpha \cdot op(A) \cdot op(B) + \beta \cdot op(C) + bias) \cdot scale

where :math:`op(A)`/:math:`op(B)` refers to in-place operations such as transpose/non-transpose, and :math:`\alpha`, :math:`\beta`, :math:`scale` are scalars.
The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

Provide Feedback: `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com?subject=cuSPARSELt-Feedback>`_

Examples:
`cuSPARSELt Example 1 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul>`_,
`cuSPARSELt Example 2 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul_advanced>`_

Blog posts:
`Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_,
`Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines <https://developer.nvidia.com/blog/structured-sparsity-in-the-nvidia-ampere-architecture-and-applications-in-search-engines/>`_,
`Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture <https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31552/>`_
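
The samples linked above walk through the complete API. As a rough, unofficial orientation, the sketch below mirrors their structure: initialize the library, describe the structured (sparse) operand A and the dense operands B and C, build a matmul plan, prune and compress A, and execute the multiplication. The 64x64x64 problem size, FP16 inputs with FP32 compute, row-major layout, and error-check macros are illustrative assumptions, and the exact function signatures should be verified against the documentation for the installed release.

.. code-block:: cpp

   // Hedged workflow sketch (not an official sample): FP16 inputs, FP32 compute,
   // row-major 64x64x64 problem, A pruned to the 2:4 structured-sparsity pattern.
   #include <cusparseLt.h>        // cuSPARSELt public header
   #include <cuda_runtime.h>
   #include <cuda_fp16.h>
   #include <cstdio>
   #include <cstdlib>

   #define CHECK_CUDA(f)     { cudaError_t s_ = (f); if (s_ != cudaSuccess) { std::printf("CUDA error %d at line %d\n", (int)s_, __LINE__); std::exit(EXIT_FAILURE); } }
   #define CHECK_CUSPARSE(f) { cusparseStatus_t s_ = (f); if (s_ != CUSPARSE_STATUS_SUCCESS) { std::printf("cuSPARSELt error %d at line %d\n", (int)s_, __LINE__); std::exit(EXIT_FAILURE); } }

   int main() {
       constexpr int64_t m = 64, n = 64, k = 64;      // illustrative sizes
       constexpr unsigned alignment = 16;
       float alpha = 1.0f, beta = 0.0f;               // D = alpha * A * B + beta * C
       cudaStream_t stream = nullptr;

       // Device buffers (zero-filled here; real code copies in actual data).
       __half *dA, *dB, *dC;
       CHECK_CUDA( cudaMalloc(&dA, m * k * sizeof(__half)) );
       CHECK_CUDA( cudaMalloc(&dB, k * n * sizeof(__half)) );
       CHECK_CUDA( cudaMalloc(&dC, m * n * sizeof(__half)) );
       CHECK_CUDA( cudaMemset(dA, 0, m * k * sizeof(__half)) );
       CHECK_CUDA( cudaMemset(dB, 0, k * n * sizeof(__half)) );
       CHECK_CUDA( cudaMemset(dC, 0, m * n * sizeof(__half)) );
       __half *dD = dC;                               // write D over the C buffer

       // 1. Library handle, matrix descriptors (A structured sparse, B/C dense), plan.
       cusparseLtHandle_t             handle;
       cusparseLtMatDescriptor_t      matA, matB, matC;
       cusparseLtMatmulDescriptor_t   matmul;
       cusparseLtMatmulAlgSelection_t alg_sel;
       cusparseLtMatmulPlan_t         plan;
       CHECK_CUSPARSE( cusparseLtInit(&handle) );
       CHECK_CUSPARSE( cusparseLtStructuredDescriptorInit(&handle, &matA, m, k, k, alignment,
                           CUDA_R_16F, CUSPARSE_ORDER_ROW, CUSPARSELT_SPARSITY_50_PERCENT) );
       CHECK_CUSPARSE( cusparseLtDenseDescriptorInit(&handle, &matB, k, n, n, alignment,
                           CUDA_R_16F, CUSPARSE_ORDER_ROW) );
       CHECK_CUSPARSE( cusparseLtDenseDescriptorInit(&handle, &matC, m, n, n, alignment,
                           CUDA_R_16F, CUSPARSE_ORDER_ROW) );
       CHECK_CUSPARSE( cusparseLtMatmulDescriptorInit(&handle, &matmul,
                           CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_NON_TRANSPOSE,
                           &matA, &matB, &matC, &matC, CUSPARSE_COMPUTE_32F) );
       CHECK_CUSPARSE( cusparseLtMatmulAlgSelectionInit(&handle, &alg_sel, &matmul,
                           CUSPARSELT_MATMUL_ALG_DEFAULT) );
       CHECK_CUSPARSE( cusparseLtMatmulPlanInit(&handle, &plan, &matmul, &alg_sel) );

       // 2. Prune A to 2:4 sparsity, then compress it to the packed format the kernels use.
       CHECK_CUSPARSE( cusparseLtSpMMAPrune(&handle, &matmul, dA, dA,
                                            CUSPARSELT_PRUNE_SPMMA_TILE, stream) );
       size_t compressed_size, compressed_buffer_size;
       void *dA_compressed, *dA_compressed_buffer;
       CHECK_CUSPARSE( cusparseLtSpMMACompressedSize(&handle, &plan, &compressed_size,
                                                     &compressed_buffer_size) );
       CHECK_CUDA( cudaMalloc(&dA_compressed, compressed_size) );
       CHECK_CUDA( cudaMalloc(&dA_compressed_buffer, compressed_buffer_size) );
       CHECK_CUSPARSE( cusparseLtSpMMACompress(&handle, &plan, dA, dA_compressed,
                                               dA_compressed_buffer, stream) );

       // 3. Query workspace and run the sparse matrix-matrix multiplication.
       size_t workspace_size;
       void  *d_workspace;
       CHECK_CUSPARSE( cusparseLtMatmulGetWorkspace(&handle, &plan, &workspace_size) );
       CHECK_CUDA( cudaMalloc(&d_workspace, workspace_size) );
       CHECK_CUSPARSE( cusparseLtMatmul(&handle, &plan, &alpha, dA_compressed, dB,
                                        &beta, dC, dD, d_workspace, &stream, 1) );
       CHECK_CUDA( cudaDeviceSynchronize() );
       std::printf("sparse matmul completed\n");

       // 4. Cleanup.
       CHECK_CUSPARSE( cusparseLtMatDescriptorDestroy(&matA) );
       CHECK_CUSPARSE( cusparseLtMatDescriptorDestroy(&matB) );
       CHECK_CUSPARSE( cusparseLtMatDescriptorDestroy(&matC) );
       CHECK_CUSPARSE( cusparseLtMatmulPlanDestroy(&plan) );
       CHECK_CUSPARSE( cusparseLtDestroy(&handle) );
       cudaFree(dA); cudaFree(dB); cudaFree(dC);
       cudaFree(dA_compressed); cudaFree(dA_compressed_buffer); cudaFree(d_workspace);
       return 0;
   }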

Key Features
============

- NVIDIA Sparse MMA tensor core support
- Mixed-precision computation support:

  +-----------+---------+----------+---------+
  | Input A/B | Input C | Output D | Compute |
  +===========+=========+==========+=========+
  | FP32      | FP32    | FP32     | FP32    |
  +-----------+---------+----------+---------+
  | FP16      | FP16    | FP16     | FP32    |
  +-----------+---------+----------+---------+
  | FP16      | FP16    | FP16     | FP16    |
  +-----------+---------+----------+---------+
  | BF16      | BF16    | BF16     | FP32    |
  +-----------+---------+----------+---------+
  | INT8      | INT8    | INT8     | INT32   |
  +-----------+---------+----------+---------+
  | INT8      | INT32   | INT32    | INT32   |
  +-----------+---------+----------+---------+
  | INT8      | FP16    | FP16     | INT32   |
  +-----------+---------+----------+---------+
  | INT8      | BF16    | BF16     | INT32   |
  +-----------+---------+----------+---------+
  | E4M3      | FP16    | E4M3     | FP32    |
  +-----------+---------+----------+---------+
  | E4M3      | BF16    | E4M3     | FP32    |
  +-----------+---------+----------+---------+
  | E4M3      | FP16    | FP16     | FP32    |
  +-----------+---------+----------+---------+
  | E4M3      | BF16    | BF16     | FP32    |
  +-----------+---------+----------+---------+
  | E4M3      | FP32    | FP32     | FP32    |
  +-----------+---------+----------+---------+
  | E5M2      | FP16    | E5M2     | FP32    |
  +-----------+---------+----------+---------+
  | E5M2      | BF16    | E5M2     | FP32    |
  +-----------+---------+----------+---------+
  | E5M2      | FP16    | FP16     | FP32    |
  +-----------+---------+----------+---------+
  | E5M2      | BF16    | BF16     | FP32    |
  +-----------+---------+----------+---------+
  | E5M2      | FP32    | FP32     | FP32    |
  +-----------+---------+----------+---------+

- Matrix pruning and compression functionalities (illustrated in the workflow sketch above)
- Activation functions, bias vector, and output scaling
- Batched computation (multiple matrices in a single run)
- GEMM Split-K mode
- Auto-tuning functionality (see ``cusparseLtMatmulSearch()`` and the snippet after this list)
- NVTX ranging and Logging functionalities
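
A hedged note on the auto-tuning feature listed above: assuming the handle, plan, and buffers from the workflow sketch in the introduction, ``cusparseLtMatmulSearch()`` is expected to take the same argument list as ``cusparseLtMatmul()``; it benchmarks the viable kernels for that specific problem and stores the fastest configuration in the plan, so subsequent ``cusparseLtMatmul()`` calls reuse it.

.. code-block:: cpp

   // Fragment reusing handle/plan/buffers from the earlier sketch (assumed signature:
   // same argument list as cusparseLtMatmul(); verify against the docs of your release).
   CHECK_CUSPARSE( cusparseLtMatmulSearch(&handle, &plan, &alpha, dA_compressed, dB,
                                          &beta, dC, dD, d_workspace, &stream, 1) );
   // Subsequent multiplications run with the tuned algorithm stored in the plan.
   CHECK_CUSPARSE( cusparseLtMatmul(&handle, &plan, &alpha, dA_compressed, dB,
                                    &beta, dC, dD, d_workspace, &stream, 1) );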

Support
=======

- Supported SM Architectures: SM 8.0, SM 8.6, SM 8.9, SM 9.0
- Supported CPU architectures and operating systems:

+------------+--------------------+
| OS | CPU archs |
+============+====================+
| Windows | x86_64 |
+------------+--------------------+
| Linux | x86_64, Arm64 |
+------------+--------------------+

Documentation
=============

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

Installation
============

The cuSPARSELt wheel can be installed as follows:

.. code-block:: bash

   pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version (currently only CUDA 12 is supported).
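
As a quick post-install smoke test (a sketch, assuming the wheel ships ``cusparseLt.h`` and ``libcusparseLt.so`` under ``include/`` and ``lib/`` directories inside ``site-packages``; adjust the include and library paths to your environment), a minimal program that only creates and destroys a handle confirms that the library loads and a supported GPU is visible:

.. code-block:: cpp

   // smoke_test.cpp -- hypothetical build line, with placeholder wheel paths:
   //   nvcc smoke_test.cpp -I<site-packages>/nvidia/.../include \
   //        -L<site-packages>/nvidia/.../lib -lcusparseLt -o smoke_test
   #include <cusparseLt.h>
   #include <cstdio>

   int main() {
       cusparseLtHandle_t handle;
       cusparseStatus_t status = cusparseLtInit(&handle);  // fails without a supported GPU/driver
       if (status != CUSPARSE_STATUS_SUCCESS) {
           std::printf("cusparseLtInit failed with status %d\n", (int)status);
           return 1;
       }
       std::printf("cuSPARSELt handle created successfully\n");
       cusparseLtDestroy(&handle);
       return 0;
   }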