Synthesizing Audio with Generative Adversarial Networks

Abstract

While Generative Adversarial Networks (GANs) have seen wide success at the problem of synthesizing realistic images, they have seen little application to the problem of unsupervised audio generation. Unlike for images, a barrier to success is that the best discriminative representations for audio tend to be non-invertible, and thus cannot be used to synthesize listenable outputs. In this paper, we introduce WaveGAN, a first attempt at applying GANs to raw audio synthesis in an unsupervised setting. Our experiments on speech demonstrate that WaveGAN can produce intelligible words from a small vocabulary of human speech, as well as synthesize audio from other domains such as bird vocalizations, drums, and piano. Qualitatively, we find that human judges prefer the generated examples from WaveGAN over those from a method which naively applies GANs to image-like audio feature representations.
