Novel-View Acoustic Synthesis

Abstract

We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach, the Visually-Guided Acoustic Synthesis (ViGAS) network, which learns to synthesize the sound at an arbitrary point in space by analyzing the input audio-visual cues. To benchmark this task, we collect two first-of-their-kind large-scale multi-view audio-visual datasets, one synthetic and one real. We show that our model successfully reasons about the spatial cues and synthesizes faithful audio on both datasets. To our knowledge, this work represents the very first formulation, dataset, and approach for the novel-view acoustic synthesis task, which has exciting potential applications ranging from AR/VR to art and design. We believe that, unlocked by this work, the future of novel-view synthesis lies in multi-modal learning from videos.