OpenACC pgfortran: substantial speedups and beyond for the O3 Condensation algorithm for determinants and estimation
Presentation posted on 07.12.2020, 21:58 by Damien Mather, Chris Scott
This milestone achievement report for NeSI project uoo02741, “Extracting D-efficient training samples..”, is also an informative case study in applying OpenACC (open accelerators) directives to MPI Fortran using PGI’s pgfortran compiler on NeSI’s Mahuika platform. We use Intel MPI libraries and up to four of Mahuika’s P100 GPUs per batch job, and show (a) how substantial speedups can be obtained by adding just four OpenACC compiler directives, and (b) how evolving the algorithm to improve data locality and reduce process blocking yields further substantial speedups. We also demonstrate that these gains can be achieved consistently in practice across a wide variety of computing platforms, from legacy CPUs and NVIDIA accelerator cards through to NeSI’s HPC platforms. The Condensation algorithm used in this demonstration offers superior scaling performance and immunity to ill-conditioning, both for calculating determinants and, by extension, for estimating large predictive analytic systems of linear and linearised equations, while retaining O(n³) computational complexity similar to that of widely used methods based on Gaussian elimination.
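To give a rough idea of how a condensation approach reaches O(n³), here is a minimal sketch that assumes the classical Chiò pivotal condensation, with each 2×2 minor divided by the pivot and partial pivoting added. The authors’ Condensation algorithm and its ill-conditioning safeguards may well differ, and the function name `condensation_det` is hypothetical. It is plain Python for illustration, not the OpenACC Fortran code described in this report.

```python
def condensation_det(a):
    """Determinant by repeated Chio-style pivotal condensation (a sketch, O(n^3))."""
    m = [[float(x) for x in row] for row in a]  # work on a float copy of the input
    n = len(m)
    if n == 0:
        return 1.0  # determinant of the empty matrix, by convention
    det = 1.0
    while n > 1:
        # Partial pivoting: bring the row with the largest |first-column entry| to the top.
        p = max(range(n), key=lambda i: abs(m[i][0]))
        if m[p][0] == 0.0:
            return 0.0  # the whole first column is zero, so the matrix is singular
        if p != 0:
            m[0], m[p] = m[p], m[0]
            det = -det  # a row swap flips the sign of the determinant
        pivot = m[0][0]
        det *= pivot
        # Condense: each entry of the (n-1)x(n-1) matrix is the 2x2 minor
        # [[pivot, m[0][j]], [m[i][0], m[i][j]]] divided by the pivot.
        m = [[m[i][j] - m[i][0] * m[0][j] / pivot for j in range(1, n)]
             for i in range(1, n)]
        n -= 1
    return det * m[0][0]
```

For example, `condensation_det([[1, 2], [3, 4]])` returns approximately -2.0. Each pass builds an (n-1)×(n-1) matrix, so the total work is roughly n³/3 multiply-adds, the same order as Gaussian elimination; the independent double loop inside each pass is the natural candidate for a GPU `parallel loop` directive in a port.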
ABOUT THE AUTHOR(S)
Damien Mather Bio: Damien is a keen runner and Senior Lecturer in the Department of Marketing in the University of Otago Business School. He teaches a postgraduate course in predictive analytics and has a long-standing interest in statistics and computer science, especially FORTRAN, C/C++ and SAS. He also has application development expertise in electromagnetic and electroacoustic dynamics, embedded systems design, modelling and visualisation, and text analytics for the social sciences.
Chris Scott Bio: Chris is a keen swimrunner and Research Software Engineer working in NeSI’s Consultancy Service. He has a background in materials modelling and has many years’ experience in scientific computing and HPC.