Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX, a distributed function as a service (FaaS) platform that enables flexible, scalable, and high-performance remote function execution. funcX’s endpoint software can transform existing clouds, clusters, and supercomputers into function serving systems, while funcX’s cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130 000 concurrent workers.
ABOUT THE AUTHOR(S)
Kyle Chard is a Research Assistant Professor in the Department of Computer Science at the University of Chicago and a joint appointee at Argonne National Laboratory. He received his Ph.D. in Computer Science from Victoria University of Wellington, New Zealand. Kyle received the IEEE TCHPC Award for Excellence for Early Career Researchers in HPC, was part of the Globus team that won an R&D100 award, and was awarded a New Zealand Top Achiever Doctoral Scholarship. He co-leads the Globus Labs research group, which focuses on a broad range of research problems in data-intensive computing and research data management. He leads NSF- and DOE-funded projects related to distributed and parallel computing, scientific reproducibility, research automation, and cost-aware use of cloud infrastructure.