Stable Audio Open

  • 2024-07-31 17:22:42
  • Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Abstract

Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.

 
