The computational and memory requirements of Deep Learning (DL) models pose challenges, especially over long training periods. Sparse training offers a potential way to reduce this burden. While libraries such as cuSPARSE and cuSPARSELt provide the sparse routines needed by frameworks such as PyTorch, achieving substantial speedups, especially at extreme sparsity ratios, remains difficult.
Sparse training exploits the inherent sparsity of neural networks and relies on sparse matrix operations such as Sparse Matrix-Matrix Multiplication (SPMM) and Sampled Dense-Dense Matrix Multiplication (SDDMM), both of which are crucial for training deep learning models. While SPMM has been optimised to some extent, SDDMM optimisation, particularly for maintaining high performance across different sparsity ratios, has not been fully explored. Existing sparse libraries do not provide significant speedups at extreme sparsity ratios, which hinders their widespread adoption. To address this limitation, this study leverages NVIDIA's Sparse Tensor Cores (SPTC) and second-order pruning techniques to improve the efficiency of sparse training on GPUs.
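To make the two operations concrete, the following is a minimal reference sketch in plain Python. It only illustrates what SDDMM and SPMM compute; it is not the kernel design proposed here and makes no attempt at performance. The function names `sddmm` and `spmm`, the 0/1 mask and the dictionary-based sparse storage are assumptions chosen for readability.

```python
def sddmm(mask, a, b):
    """Sampled dense-dense matmul: compute (a @ b) only where mask is nonzero.

    mask is an m x n list of 0/1 values, a is m x k, b is k x n.
    Returns the sampled result as a dict {(row, col): value}.
    """
    k = len(b)
    out = {}
    for i, mask_row in enumerate(mask):
        for j, keep in enumerate(mask_row):
            if keep:
                # Dot product of row i of a with column j of b.
                out[(i, j)] = sum(a[i][p] * b[p][j] for p in range(k))
    return out


def spmm(sparse, n_rows, dense):
    """Sparse-dense matmul: sparse is {(row, col): value}, dense is a list of rows."""
    n_cols = len(dense[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for (i, j), value in sparse.items():
        # Each stored nonzero scales row j of the dense operand into output row i.
        for c in range(n_cols):
            out[i][c] += value * dense[j][c]
    return out
```

In a sparse layer, SPMM typically appears in the forward pass (sparse weights times dense activations), while SDDMM appears in the backward pass, where the weight gradient is a product of two dense matrices that only needs to be evaluated at the positions kept by the sparsity mask.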
ESPLAG builds on previous successes in end-to-end inference on Large Language Models (LLMs) and extends them to sparse training on NVIDIA GPUs. Key components include:
- Leveraging Sparse Tensor Cores (SPTC) for optimised sparse matrix operations
- Implementing second-order pruning techniques to achieve high sparsity ratios while maintaining model accuracy
- Developing an efficient Sampled Dense-Dense Matrix Multiplication (SDDMM) kernel based on successful design principles from Sparse Matrix-Matrix Multiplication (SPMM)
- Integrating the optimised kernels efficiently into the transformer model training pipeline
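As an illustration of the kind of structured sparsity that Sparse Tensor Cores accelerate, the sketch below prunes a weight row to the 2:4 pattern (at most two nonzeros in every group of four consecutive values), which is the pattern this hardware supports natively. Two assumptions should be noted: the text above does not state which sparsity pattern ESPLAG targets, and the selection criterion used here is simple weight magnitude, a stand-in for the second-order criterion rather than the pruning method of this study. The function name `prune_2_of_4` is hypothetical.

```python
def prune_2_of_4(row):
    """Keep the two largest-magnitude values in each group of four; zero the rest.

    Magnitude is used only as a simple stand-in criterion (an assumption);
    a second-order method would rank weights using curvature information instead.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest-magnitude entries within this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned
```

For example, `prune_2_of_4([0.1, -0.9, 0.5, 0.2])` keeps `-0.9` and `0.5` and zeroes the other two entries, giving 50% sparsity in a layout the hardware can exploit.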
Extensive experiments will be conducted on relevant datasets and state-of-the-art deep learning models, evaluating performance metrics such as training time, memory consumption and model accuracy across different sparsity ratios. Comparative analyses against existing sparse training techniques will be used to assess the effectiveness of the proposed approach.
The proposed approach aims to overcome existing limitations in sparse training on NVIDIA GPUs by exploiting specialised hardware features and second-order pruning techniques. By optimising key sparse matrix operations and integrating them into the training pipeline of transformer models, ESPLAG is expected to deliver significant improvements in training efficiency and scalability.