This Innovation Study develops and tests an adaptive Conjugate Gradient (aCG) algorithm aimed at mitigating the communication and synchronization latencies prevalent in large-scale GPU-based supercomputers. Solving very large linear systems often requires iterative methods such as the Conjugate Gradient (CG) algorithm from the Krylov subspace family. In the conventional approach the CPU controls the CG iteration, which introduces host-induced overheads during GPU execution; aCG is designed to remove these overheads.
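For reference, the standard CG iteration can be sketched as follows. This is a minimal, single-process illustration in pure Python, not the aCG implementation; the function names (`dot`, `matvec`, `conjugate_gradient`) and the dense list-of-rows matrix are assumptions made for the example. The comments mark the two dot products per iteration, which in a distributed setting become global reductions and therefore synchronization points.

```python
def dot(x, y):
    """Inner product; on a multi-GPU system this is a global reduction."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense matrix-vector product (a real solver would use a sparse format)."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    Returns (x, iterations, reductions), where `reductions` counts the
    dot products evaluated by the solver itself.
    """
    x = [0.0] * len(b)
    r = list(b)                 # residual r = b - A x, with x = 0
    p = list(r)                 # initial search direction
    rs_old = dot(r, r)
    reductions = 1
    for it in range(max_iter):
        if rs_old ** 0.5 < tol:
            return x, it, reductions
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # synchronization point 1
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)                   # synchronization point 2
        reductions += 2
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter, reductions


# Small symmetric positive definite example system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations, reductions = conjugate_gradient(A, b)
```

In a host-controlled implementation, each of these steps is typically launched and synchronized from the CPU, which is where the overheads discussed here arise.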
The aCG algorithm departs from the CPU-controlled standard CG algorithm, in which the host centrally manages all operations during GPU execution. Instead, aCG executes CG without CPU involvement, exploiting advanced GPU hardware and software capabilities: device-initiated communication, persistent kernels, device-side synchronization, and thread block specialization. Together, these features minimize host-induced overheads and enable sophisticated communication-hiding techniques.
The aCG algorithm uses performance modeling to switch dynamically between different CG variants. The choice is based on sparse matrix features and hardware specifications, with the aim of selecting the best-performing variant under varying computational scenarios. Such dynamic performance optimization is crucial for addressing the diverse challenges posed by large-scale linear systems.
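The idea of model-driven variant selection can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the source does not name the CG variants or describe its performance model, so the two variants ("standard" and "pipelined"), the per-variant operation counts, the byte counts, and the names `MatrixFeatures`, `Hardware`, `Variant`, `estimate_iteration_us`, and `choose_variant` are hypothetical placeholders. The sketch only shows the general pattern: estimate the cost of one iteration per variant from matrix and hardware parameters, then pick the cheapest.

```python
from dataclasses import dataclass


@dataclass
class MatrixFeatures:
    rows: int   # number of rows of the sparse matrix
    nnz: int    # number of nonzero entries


@dataclass
class Hardware:
    mem_bandwidth_gbs: float     # memory bandwidth in GB/s
    reduction_latency_us: float  # latency of one global reduction in microseconds


@dataclass
class Variant:
    reductions: int           # global reductions per iteration
    vector_passes: int        # vector reads/writes per iteration (placeholder count)
    overlaps_reduction: bool  # whether the reduction is hidden behind the SpMV


# Placeholder variants; the counts are illustrative, not measured.
VARIANTS = {
    "standard": Variant(reductions=2, vector_passes=10, overlaps_reduction=False),
    "pipelined": Variant(reductions=1, vector_passes=20, overlaps_reduction=True),
}


def estimate_iteration_us(variant, matrix, hw):
    """Rough bandwidth-plus-latency estimate of one iteration, in microseconds."""
    bytes_per_us = hw.mem_bandwidth_gbs * 1e3  # 1 GB/s = 1e3 bytes per microsecond
    # Assumed SpMV traffic: 8-byte value + 4-byte index per nonzero,
    # plus one read and one write of an 8-byte vector entry per row.
    spmv_us = (matrix.nnz * 12 + matrix.rows * 16) / bytes_per_us
    vector_us = variant.vector_passes * matrix.rows * 8 / bytes_per_us
    if variant.overlaps_reduction:
        # Only the part of the latency not hidden by the SpMV is exposed.
        exposed_us = max(0.0, hw.reduction_latency_us - spmv_us)
    else:
        exposed_us = hw.reduction_latency_us
    return spmv_us + vector_us + variant.reductions * exposed_us


def choose_variant(matrix, hw):
    """Return the name of the variant with the lowest estimated iteration time."""
    return min(VARIANTS, key=lambda name: estimate_iteration_us(VARIANTS[name], matrix, hw))


# Small problem with expensive reductions versus a large, bandwidth-bound problem.
small = choose_variant(MatrixFeatures(rows=10_000, nnz=50_000),
                       Hardware(mem_bandwidth_gbs=1000.0, reduction_latency_us=20.0))
large = choose_variant(MatrixFeatures(rows=10_000_000, nnz=50_000_000),
                       Hardware(mem_bandwidth_gbs=1000.0, reduction_latency_us=5.0))
```

Under these placeholder numbers the latency-dominated small problem favors the communication-hiding variant, while the bandwidth-dominated large problem favors the variant with fewer vector operations. A real model would need calibrated parameters for the target machine.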
To facilitate widespread adoption, the aCG API is designed for compatibility with widely used linear algebra libraries. This compatibility allows existing applications to move from traditional CG to aCG with minimal changes.
The applications chosen for testing the aCG algorithm are typical examples from across computational science. They rely heavily on CG-like iterative methods for solving large systems of linear equations. The aCG algorithm therefore holds significant promise for benefiting numerous High-Performance Computing (HPC) applications.
In addition to addressing communication and synchronization latencies, the CPU-free execution enabled by aCG has the potential to unlock the full capabilities of GPU-based supercomputers. By capitalizing on new hardware features, aCG seeks not only to improve performance but also to enhance energy efficiency.