The field of high-performance computing has witnessed rapid advancement with the advent of next-generation Graphics Processing Units (GPUs). These advances demand novel algorithms that can fully exploit the capabilities of modern GPUs. This Innovation Study focuses on AceAMG, an algorithm designed to rethink iterative linear solvers around two key features: mixed precision solvers and asynchronous execution, which together promise substantial improvements in runtime performance and scalability.
The first feature of AceAMG is the integration of mixed precision solvers and preconditioners, specifically Algebraic Multigrid (AMG). By leveraging mixed precision techniques, the aim is to reduce data access overheads and to benefit from the faster arithmetic of low-precision formats. This approach is expected to significantly accelerate simulations and reduce latencies across a variety of hardware architectures, including edge devices, mainstream CPUs, accelerators, and emerging RISC-V-based systems. In particular, the algorithm addresses the current challenge of excessive data transfer between GPUs and CPUs. By minimising these transfers, this study anticipates a considerable increase in GPU utilisation, thereby enhancing overall computational efficiency on today's largest supercomputers.
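The core idea behind mixed precision solvers can be illustrated with classical iterative refinement: the expensive inner solve runs in low precision, while residuals and corrections are accumulated in high precision to recover full accuracy. The sketch below is a minimal, self-contained illustration of that principle only; the function name and the use of a direct solve as the inner step are illustrative stand-ins, not AceAMG's or Ginkgo's actual API (in AceAMG the inner step would be a low-precision AMG-preconditioned iteration).

```python
import numpy as np

def mixed_precision_refine(A, b, outer_iters=10, tol=1e-10):
    """Solve A x = b: inner solves in float32, residual correction in float64."""
    A32 = A.astype(np.float32)          # low-precision copy: half the memory traffic
    x = np.zeros_like(b)
    for _ in range(outer_iters):
        r = b - A @ x                   # residual evaluated in double precision
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        # inner "solve" in single precision (illustrative stand-in for a
        # low-precision AMG-preconditioned Krylov solve)
        d = np.linalg.solve(A32, r.astype(np.float32))
        x = x + d.astype(np.float64)    # accumulate the correction in double
    return x

# Usage: a small, well-conditioned SPD system
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x = mixed_precision_refine(A, b)
```

For well-conditioned systems, a handful of outer corrections recovers double-precision accuracy even though the bulk of the arithmetic and data movement happens in single precision — which is exactly where the bandwidth and throughput savings on GPUs come from.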
The second key feature of AceAMG is the implementation of synchronisation-eliminating techniques, such as asynchronous restricted additive Schwarz. These algorithms iterate over subdomains independently, removing the need for coordinated boundary information exchange. This asynchronous approach is capable of overcoming the scaling limitations of traditional bulk synchronous methods. It is anticipated that this feature will result in enhanced scalability, with parallel efficiencies exceeding 80% on configurations involving up to 80,000 GPUs. The asynchronous execution model reduces synchronisation overhead, thereby facilitating more efficient parallel computation and more effective resource utilisation.
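The restricted additive Schwarz idea described above can be sketched on a toy problem: overlapping subdomains each solve a local system using whatever neighbour data is currently available, and write back only their "owned" (non-overlapping) portion of the solution. The sketch below emulates the uncoordinated update order sequentially by picking subdomains at random, with no synchronised residual computation between them; all names and the 1D Poisson test problem are illustrative assumptions, not AceAMG's implementation.

```python
import numpy as np

def async_ras_1d(n=40, overlap=2, sweeps=300):
    """Toy restricted additive Schwarz on a 1D Poisson problem, with
    randomised (synchronisation-free) subdomain update order."""
    # 1D Poisson: tridiagonal stencil [-1, 2, -1], RHS of ones (illustrative)
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    # two overlapping subdomains ...
    doms = [np.arange(0, n // 2 + overlap), np.arange(n // 2 - overlap, n)]
    # ... but disjoint "ownership" for the restricted write-back
    owns = [np.arange(0, n // 2), np.arange(n // 2, n)]
    x = np.zeros(n)
    rng = np.random.default_rng(1)
    for _ in range(sweeps):
        # random choice mimics uncoordinated, asynchronous updates: each local
        # solve simply uses whatever neighbour values are in x right now
        i = rng.integers(2)
        d, o = doms[i], owns[i]
        r = b[d] - A[d] @ x                        # local residual, stale data allowed
        corr = np.linalg.solve(A[np.ix_(d, d)], r) # independent local solve
        mask = np.isin(d, o)
        x[d[mask]] += corr[mask]                   # restricted update: owned part only
    return A, b, x

A, b, x = async_ras_1d()
```

Because each subdomain only ever reads the current global state and writes its own rows, no barrier or coordinated halo exchange is required between updates — the property that, at scale, removes the bulk synchronous bottleneck.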
AceAMG will be implemented within the Ginkgo sparse linear algebra framework. Ginkgo provides a robust platform for developing high-performance solvers, rendering it an optimal choice for integrating our mixed-precision and asynchronous execution strategies. To demonstrate the effectiveness of AceAMG, it will be integrated into the Computational Fluid Dynamics (CFD) code nekRS. This combination will be employed to push beyond the current limits of atmospheric boundary layer (ABL) simulations, particularly by surpassing the scale and resolution of well-known benchmark cases such as GABLS1-3 and CUABL.
The capacity to manage larger and more intricate simulations will open novel avenues for research across a range of scientific disciplines. This study will not only advance the state of the art in high-performance computing (HPC) but also contribute to a broader understanding of atmospheric dynamics, thereby supporting scientific endeavours in climate science. In particular, the enhancements to ABL simulations will enable a deeper understanding of atmospheric phenomena, which is essential for the development of climate models and weather forecasting.