Teaching Computers to Imagine with Deep Generative Models

Sergey Tulyakov, Stéphane Lathuilière

November 19 - 26, 2019

The University of Trento, Italy

Abstract

Many recent methods in computer vision can be roughly categorized as those that produce a decision given an input image or video: the number of objects in the input, their types (e.g., car or tree), and so on. In other words, they provide labelling capabilities; we term such methods discriminative. Another group of methods, termed generative models, instead models the distribution of the inputs. Such techniques offer generative capabilities: given some input, they can generate an image, video, audio or text. Moreover, these methods can be conditioned on user input, offering control over what is being generated. This control includes changing a particular attribute of an image while keeping other attributes unchanged, such as turning summer into winter, male into female, or a smiling face into a non-smiling one. For humans, changing such an attribute requires careful training and specialized software, and is time-consuming. These capabilities can therefore be considered a form of learned imagination. Due to this ability to “imagine”, generative techniques have been widely used in a variety of applications: image synthesis, style transfer, image-to-image translation, video synthesis and retargeting. Such models are also used to enhance discriminative techniques with unlabelled or synthetic data, and to learn 3D reconstruction when 3D labels are not available.

Focus: generative models in deep learning and their applications to image and video manipulation and translation, as well as methods that capitalize on such models to perform discriminative tasks.

Prerequisites: this course requires an understanding of the basic building blocks of convolutional neural networks and of machine learning theory, including standard deep learning architectures, cost functions, activations and learning paradigms. Therefore, for PhD students who have not already worked with deep neural networks, an introductory course such as Introduction to Deep Learning or Deep Learning for Image Processing is highly recommended.

November 19

9:30-11:30 Introduction to the course. Introduction to Deep Learning Slides
Generative Adversarial Networks Slides
14:00-16:00 Variational Autoencoders Slides
Derivations References
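
For reference, a minimal sketch of the two training objectives introduced on this day, written in standard notation (the exact formulations in the slides and derivation notes may differ). The GAN minimax game pits a generator G, which maps noise z ~ p(z) to samples, against a discriminator D, which scores how likely its input is to be real; the VAE is trained by maximizing the evidence lower bound (ELBO), where the encoder q_phi(z|x) approximates the posterior and the decoder p_theta(x|z) reconstructs the input:

    % GAN minimax objective
    \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
      + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]

    % VAE evidence lower bound (ELBO)
    \log p_\theta(x) \;\geq\;
      \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]
      - \mathrm{KL}\big(q_\phi(z|x) \,\|\, p(z)\big)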

November 20

10:00-12:00 Generative Adversarial Networks (continued). Image-to-image translation Slides
14:00-16:00 Practical session on GANs Colab
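
As a warm-up before the practical session, the sketch below shows the core of a GAN training loop in PyTorch. It is a minimal illustration, not the contents of the Colab notebook: the network sizes, learning rates and the random stand-in data are assumptions chosen only to make it self-contained.

    # Minimal GAN training sketch in PyTorch (illustrative; the Colab notebook may differ).
    import torch
    import torch.nn as nn

    latent_dim, data_dim = 64, 784  # e.g., flattened 28x28 images

    # Generator: noise vector -> data sample in [-1, 1]
    G = nn.Sequential(
        nn.Linear(latent_dim, 256), nn.ReLU(),
        nn.Linear(256, data_dim), nn.Tanh(),
    )
    # Discriminator: data sample -> real/fake logit
    D = nn.Sequential(
        nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1),
    )

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real):
        batch = real.size(0)
        # --- Discriminator step: push real -> 1, fake -> 0 ---
        z = torch.randn(batch, latent_dim)
        fake = G(z).detach()  # do not backprop into G here
        loss_d = bce(D(real), torch.ones(batch, 1)) + \
                 bce(D(fake), torch.zeros(batch, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # --- Generator step: fool D (non-saturating loss) ---
        z = torch.randn(batch, latent_dim)
        loss_g = bce(D(G(z)), torch.ones(batch, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()

    # Dummy usage with random "real" data; replace with a real data loader.
    for step in range(3):
        real = torch.rand(32, data_dim) * 2 - 1  # scaled to [-1, 1] to match Tanh
        print(train_step(real))

The detach() call in the discriminator step keeps gradients from flowing into the generator while updating D; the generator step then uses the non-saturating loss, which in practice trains more stably than the original minimax form.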

November 21

10:00-12:00 Pose-guided generation Slides
Video synthesis: generation, prediction, translation, retargeting Slides
Gradient-based style-transfer and adversarial examples Slides
14:00-16:00 Practical session on VAEs Colab
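
Similarly, before the VAE practical, here is a minimal VAE in PyTorch (again an illustrative sketch, not the notebook itself: the dimensions and random data are placeholder assumptions). It implements the ELBO from the November 19 session, with the reparameterization trick making the sampling step differentiable.

    # Minimal VAE sketch in PyTorch (illustrative; the Colab notebook may differ).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        def __init__(self, data_dim=784, latent_dim=16):
            super().__init__()
            self.enc = nn.Linear(data_dim, 256)
            self.mu = nn.Linear(256, latent_dim)      # posterior mean
            self.logvar = nn.Linear(256, latent_dim)  # posterior log-variance
            self.dec = nn.Sequential(
                nn.Linear(latent_dim, 256), nn.ReLU(),
                nn.Linear(256, data_dim),  # decoder outputs logits
            )

        def forward(self, x):
            h = F.relu(self.enc(x))
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            return self.dec(z), mu, logvar

    def elbo_loss(recon, x, mu, logvar):
        # Reconstruction term (Bernoulli likelihood on logits)
        rec = F.binary_cross_entropy_with_logits(recon, x, reduction='sum')
        # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kl  # negative ELBO, to be minimized

    # Dummy usage with random data in [0, 1]; replace with a real data loader.
    model = VAE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(32, 784)
    recon, mu, logvar = model(x)
    loss = elbo_loss(recon, x, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()
    print(loss.item())

The KL term has a closed form here because both the approximate posterior and the prior are Gaussian; minimizing the returned loss is equivalent to maximizing the ELBO.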

November 22

10:00-12:00 Deep Fakes Slides
Improving discriminative models Slides
Challenge: extending MoCoGAN Slides
Colab