nvidia-cusparselt-cu12 (0.6.3)

2025-06-28T10:11:18Z

To install the package using pip, run the following command:

pip install --index-url  nvidia-cusparselt-cu12

For more information on the PyPI registry, see the documentation.

NVIDIA cuSPARSELt

###################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
###################################################################################

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

   D = Activation(\alpha op(A) \cdot op(B) + \beta op(C) + bias) \cdot scale

where :math:`op(A)` and :math:`op(B)` refer to in-place operations such as transpose/non-transpose, and :math:`\alpha`, :math:`\beta`, and :math:`scale` are scalars.
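The formula above can be read as a plain dense computation followed by an epilogue. The following pure-Python sketch is only a reference for the arithmetic, not the cuSPARSELt API: the function name ``matmul_epilogue_reference`` and its parameters are hypothetical, :math:`op(C)` is taken to be the identity, and the bias is assumed to hold one value per output row.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major nested-list matrix."""
    return [list(col) for col in zip(*m)]


def relu(x: float) -> float:
    """Example activation function."""
    return max(x, 0.0)


def matmul_epilogue_reference(
    a: Matrix,
    b: Matrix,
    c: Matrix,
    alpha: float = 1.0,
    beta: float = 0.0,
    bias: Optional[List[float]] = None,
    scale: float = 1.0,
    activation: Callable[[float], float] = lambda x: x,
    trans_a: bool = False,
    trans_b: bool = False,
) -> Matrix:
    """Dense reference for D = Activation(alpha*op(A)*op(B) + beta*C + bias) * scale."""
    op_a = transpose(a) if trans_a else a
    op_b = transpose(b) if trans_b else b
    rows, inner, cols = len(op_a), len(op_b), len(op_b[0])
    # Assumption: one bias value per output row, broadcast across the columns.
    row_bias = bias if bias is not None else [0.0] * rows
    d = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = sum(op_a[i][k] * op_b[k][j] for k in range(inner))
            value = alpha * acc + beta * c[i][j] + row_bias[i]
            out_row.append(activation(value) * scale)
        d.append(out_row)
    return d


# Small worked example: A is sparse (half of its entries are zero).
A = [[1.0, 0.0], [0.0, 2.0]]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
D = matmul_epilogue_reference(
    A, B, C, alpha=1.0, beta=-3.0, bias=[0.5, 0.5], scale=2.0, activation=relu
)
```

In this example the first output row is negative before the activation, so ReLU clamps it to zero before the final scaling is applied.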

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Download: `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

Provide Feedback: `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com?subject=cuSPARSELt-Feedback>`_

Examples: `cuSPARSELt Example 1 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul>`_, `cuSPARSELt Example 2 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul_advanced>`_

Blog posts:

- `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_
- `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines <https://developer.nvidia.com/blog/structured-sparsity-in-the-nvidia-ampere-architecture-and-applications-in-search-engines/>`__
- `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture <https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31552/>`__

================================================================================
Key Features
================================================================================

- NVIDIA Sparse MMA tensor core support
- Mixed-precision computation support:

  +--------------+----------------+-----------------+-------------+
  | Input A/B    | Input C        | Output D        | Compute     |
  +==============+================+=================+=============+
  | FP32         | FP32           | FP32            | FP32        |
  +--------------+----------------+-----------------+-------------+
  | FP16         | FP16           | FP16            | FP32        |
  +              +                +                 +-------------+
  |              |                |                 | FP16        |
  +--------------+----------------+-----------------+-------------+
  | BF16         | BF16           | BF16            | FP32        |
  +--------------+----------------+-----------------+-------------+
  | INT8         | INT8           | INT8            | INT32       |
  +              +----------------+-----------------+             +
  |              | INT32          | INT32           |             |
  +              +----------------+-----------------+             +
  |              | FP16           | FP16            |             |
  +              +----------------+-----------------+             +
  |              | BF16           | BF16            |             |
  +--------------+----------------+-----------------+-------------+
  | E4M3         | FP16           | E4M3            | FP32        |
  +              +----------------+-----------------+             +
  |              | BF16           | E4M3            |             |
  +              +----------------+-----------------+             +
  |              | FP16           | FP16            |             |
  +              +----------------+-----------------+             +
  |              | BF16           | BF16            |             |
  +              +----------------+-----------------+             +
  |              | FP32           | FP32            |             |
  +--------------+----------------+-----------------+-------------+
  | E5M2         | FP16           | E5M2            | FP32        |
  +              +----------------+-----------------+             +
  |              | BF16           | E5M2            |             |
  +              +----------------+-----------------+             +
  |              | FP16           | FP16            |             |
  +              +----------------+-----------------+             +
  |              | BF16           | BF16            |             |
  +              +----------------+-----------------+             +
  |              | FP32           | FP32            |             |
  +--------------+----------------+-----------------+-------------+

- Matrix pruning and compression functionalities
- Activation functions, bias vector, and output scaling
- Batched computation (multiple matrices in a single run)
- GEMM Split-K mode
- Auto-tuning functionality (see ``cusparseLtMatmulSearch()``)
- NVTX ranging and logging functionalities
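To give a feel for what matrix pruning means in the structured-sparsity setting, here is a minimal pure-Python sketch. It assumes the 2:4 pattern described in NVIDIA's Ampere structured-sparsity material (at most two nonzero values in every group of four consecutive elements); that pattern is not stated on this page. The helper names ``prune_2_of_4`` and ``is_2_of_4_sparse`` are hypothetical, and the library's own pruning routines may select values differently.

```python
from typing import List


def prune_2_of_4(row: List[float]) -> List[float]:
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; ties go to the lower index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def is_2_of_4_sparse(row: List[float]) -> bool:
    """Check that every group of four holds at most two nonzero values."""
    return all(
        sum(1 for v in row[start:start + 4] if v != 0.0) <= 2
        for start in range(0, len(row), 4)
    )


dense_row = [1.0, -3.0, 0.5, 2.0, 4.0, 4.0, 4.0, 4.0]
sparse_row = prune_2_of_4(dense_row)
```

A row pruned this way stores only half of its values plus small position metadata, which is the idea behind the compression step listed above.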

================================================================================
Support
================================================================================

- Supported SM Architectures: SM 8.0, SM 8.6, SM 8.9, SM 9.0
- Supported CPU architectures and operating systems:

  +------------+--------------------+
  | OS         | CPU archs          |
  +============+====================+
  | Windows    | x86_64             |
  +------------+--------------------+
  | Linux      | x86_64, Arm64      |
  +------------+--------------------+

================================================================================
Documentation
================================================================================

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

================================================================================
Installation
================================================================================

The cuSPARSELt wheel can be installed as follows:

.. code-block:: bash

   pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version (currently only CUDA 12 is supported).
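After installing, one way to confirm that the wheel is visible to the current Python environment is to query its distribution metadata. This is a small sketch using only the standard library; the helper name ``installed_version`` is hypothetical, and the default distribution name is the one shown on this page.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "nvidia-cusparselt-cu12") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("nvidia-cusparselt-cu12 is not installed in this environment")
    else:
        print(f"nvidia-cusparselt-cu12 {version} is installed")
```

Returning ``None`` instead of raising keeps the check usable in setup scripts that need to decide whether to run ``pip install``.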

Details

PyPI

2025-06-28 10:11:18 +00:00

17

NVIDIA Corporation

Project Site

NVIDIA Proprietary Software

150 MiB

Assets (1)

nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl 150 MiB

Versions (1)

0.6.3

2025-06-28
