Workshop: Machine Learning on HPC Systems (MLHPCS)

Distributed Deep Learning: Challenges & Opportunities

by Peter Labus from Fraunhofer ITWM

Abstract

In this talk, I will give an overview of the opportunities HPC provides to enhance Deep Learning research. In particular, I will discuss distribution strategies for training Deep Neural Networks and the challenges one faces when implementing them. I will conclude by outlining how we tackle these challenges within our own distributed Deep Learning framework, Tarantella, and show some recent benchmark results.
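As background for the talk, the most widely used distribution strategy is data parallelism: each device keeps a full replica of the model, processes a different shard of every batch, and gradients are averaged across replicas before the weight update. The minimal sketch below illustrates this idea using TensorFlow's built-in `MirroredStrategy`; it is not Tarantella's API, and the model and data are placeholders chosen purely for illustration.

```python
# Data-parallel training sketch with TensorFlow's MirroredStrategy.
# Note: this is NOT Tarantella's API; it only illustrates the general
# data-parallelism pattern discussed in the talk.
import tensorflow as tf

# Replicate the model across all locally visible GPUs (falls back to CPU).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created inside the strategy scope are mirrored,
    # so each replica holds an identical copy of the weights.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# Synthetic stand-in data; the global batch is split automatically
# across replicas, and gradients are all-reduced each step.
x = tf.random.normal((1024, 784))
y = tf.random.uniform((1024,), maxval=10, dtype=tf.int32)
model.fit(x, y, batch_size=256, epochs=2)
```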

Slides

About the Speaker

Peter Labus studied theoretical physics in Berlin, Vienna, and Munich. He received a Ph.D. in theoretical particle physics from the International School for Advanced Studies (SISSA) in Trieste, Italy, from which he also received a Master’s degree in High Performance Computing. In 2018, he joined the Competence Center for High Performance Computing of Fraunhofer ITWM as a research scientist. As the lead developer of the Tarantella framework, he is particularly interested in scalable distributed Deep Learning. Since 2019, he has headed the large-scale machine learning team.