The University of Auckland

Parallel Computing with Dask

presentation
posted on 2020-12-07, 21:54 authored by Wolfgang Hayek
Parallel computing has become a necessity for a wide range of modern scientific computing problems, including large-scale data-oriented computing, where it is needed to achieve reasonable processing times. Implementing parallel computing can be challenging and time-consuming: APIs such as the Message Passing Interface (MPI) are powerful but can be hard to learn and implement.

Dask is a popular toolkit for the Python programming language that addresses this issue. While requiring very little programming effort, it offers a variety of parallelisation paradigms, including work sharing via parallel function evaluation, task graphs, and direct integration with packages such as NumPy, pandas, and scikit-learn. Dask can be used interactively and as a batch processing tool. The Dask-MPI package adds MPI as a parallelisation backend, enabling scalability and high throughput on high-performance computing (HPC) systems.

In this presentation, I will introduce Dask, discuss some of its parallelisation mechanisms, and demonstrate how to use the MPI backend for batch processing.

[Note: This presentation should precede Maxime Rio's demo of using Dask with scikit-learn in Jupyter notebooks, as it covers the basics of Dask.]

ABOUT THE AUTHOR(S)

Wolfgang Hayek is a research software engineer at NeSI and NIWA, and group manager of NIWA's scientific programming group, with many years of experience in scientific computing and HPC.

Maxime Rio is a data scientist at NeSI and NIWA. He enjoys helping researchers to analyse their data, from visualisation to probabilistic modelling.

Chris Scott is a research software engineer at NeSI with a background in scientific computing and HPC.
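
As a rough illustration of the "parallel function evaluation" and "task graph" styles named in the abstract, the following minimal sketch uses `dask.delayed`. It is not taken from the presentation itself: the `square` and `build_graph` helpers are hypothetical names, and the choice of the threaded scheduler is an assumption made only to keep the example small.

```python
from dask import delayed


def square(x):
    # Hypothetical stand-in for an expensive, independent computation.
    return x * x


def build_graph(values):
    # Each delayed call records a task instead of running it immediately.
    squares = [delayed(square)(v) for v in values]
    # The final sum depends on every square, which completes the task graph.
    return delayed(sum)(squares)


if __name__ == "__main__":
    total = build_graph(range(5))
    # compute() hands the graph to a scheduler; the threaded scheduler is
    # an assumption chosen here purely for illustration.
    print(total.compute(scheduler="threads"))
```

Because the individual `square` tasks do not depend on each other, a scheduler is free to evaluate them in parallel; only the final `sum` has to wait for all of them.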

Publisher

New Zealand eScience Infrastructure
