Workshop: Machine Learning on HPC Systems (MLHPCS)

Impact of large-scale pre-training on intra- and inter-domain transfer learning in full and few-shot regimes

Abstract

Transfer learning aims to exploit models pre-trained on large amounts of source data for re-use on a wide range of target downstream tasks and datasets, and it has been successfully employed to enable training with small target data sizes. A recent line of work posits strong benefits for model generalization and transfer when model size, data size, and compute budget are increased during pre-training. However, it remains largely unclear how the transfer improvement observed with increasing scale also depends on the degree to which source and target datasets are related to each other. We will review recent evidence on the impact of large-scale pre-training on full and few-shot transfer learning in intra- and inter-domain scenarios, motivating the need for systematic experiments that may deliver scaling laws for transfer performance as a function of model size, data size, compute budget, composition of the large source dataset used for pre-training, and degree of alignment between source and target datasets. Such experiments require vast compute resources and proper utilization of supercomputing facilities. As an outlook, we will introduce the COVIDNetX initiative, which aims to study large-scale intra- and inter-domain transfer learning in a specific use case where relevant pattern detection is performed on target medical imaging datasets that are much smaller than the large source data used during pre-training.
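
For orientation, the scaling-law literature often describes pre-training loss with a joint power law in model and data size; the sketch below shows one such additive ansatz. The functional form and its symbols (fit parameters A, B, alpha, beta and irreducible loss E) are illustrative assumptions taken from that literature, not results from this talk; the transfer scaling laws sought here would additionally have to account for compute budget, source-data composition, and source-target alignment.

```latex
% Illustrative sketch (assumption, not a result from the talk): an additive
% power-law ansatz for loss L as a function of model size N and data size D,
% as commonly used in the scaling-law literature. A, B, \alpha, \beta are fit
% parameters and E is the irreducible loss.
\begin{equation}
  L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
\end{equation}
```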

Speaker

Jenia Jitsev, Juelich Supercomputing Center, Helmholtz AI, Research Center Juelich

Jenia is a senior researcher at the Juelich Supercomputing Center (JSC), leading a lab that works on large-scale transferable deep learning. His background is in machine learning, neuroscience, and computer science, with research at the intersection of machine learning and computational neuroscience. He did his PhD with Prof. von der Malsburg on unsupervised learning and self-organization in hierarchical recurrent networks of the visual cortex, with applications to face and object recognition. During his postdoc, he worked on models of reward-based reinforcement learning in the cortico-basal ganglia brain network, for which he received a Best Paper Award from IEEE and the International Neural Network Society. His current research focuses on large-scale neural network training to obtain generic models that can be efficiently transferred across different datasets, domains, and tasks. To enable such large-scale learning and transfer experiments, he also works on distributed training of deep learning models across multiple GPUs and other accelerators on supercomputers such as the JUWELS Booster at JSC.