Multimodal Deep Learning

Abstract

This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed where one modality is transformed into the other, as well as models in which one modality is utilized to enhance representation learning for the other. To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced. Finally, we also cover other modalities as well as general-purpose multi-modal models, which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art) eventually caps off this booklet.
