Outperforming cuBLAS on Blackwell
NVIDIA's cuBLAS library has long been the gold standard for GPU matrix operations, the product of nearly two decades of optimization work. The Blackwell architecture opens new opportunities to exceed cuBLAS performance, both by carefully exploiting its architectural features and by applying new algorithmic approaches.
