AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

  • 2025-11-03 16:38:43
  • Junan Zhang, Jing Yang, Zihao Fang, Yuancheng Wang, Zehua Zhang, Zhuo Wang, Fan Fan, Zhizheng Wu

Abstract

We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Built on a masked generative model, AnyEnhance supports a wide range of enhancement tasks, including denoising, dereverberation, declipping, super-resolution, and target speaker extraction, all handled simultaneously and without fine-tuning. AnyEnhance introduces a prompt-guidance mechanism for in-context learning, which allows the model to natively accept a reference speaker's timbre. This boosts enhancement performance when reference audio is available and enables the target speaker extraction task without altering the underlying architecture. We also introduce a self-critic mechanism into the generative process of masked generative models, which yields higher-quality outputs through iterative self-assessment and refinement. Extensive experiments on various enhancement tasks demonstrate that AnyEnhance outperforms existing methods on both objective metrics and subjective listening tests. Demo audio samples are publicly available at https://amphionspace.github.io/anyenhance. An open-source implementation is provided at https://github.com/viewfinder-annn/anyenhance-v1-ccf-aatc.
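The abstract describes the self-critic mechanism only at a high level. As a rough illustration of the general idea, the sketch below shows iterative masked generative decoding in which a critic re-scores every token after each pass and the lowest-scoring ones are re-masked for regeneration. It is a hypothetical toy, not AnyEnhance's implementation: `generator`, `critic`, the integer token IDs, and the MaskGIT-style cosine masking schedule are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Optional

MASK = None  # hypothetical marker for a masked token position


def cosine_mask_count(step: int, total_steps: int, length: int) -> int:
    """Tokens left masked after `step`, using a MaskGIT-style cosine schedule (assumed)."""
    ratio = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return int(length * ratio)  # reaches 0 on the final step


def iterative_decode(
    length: int,
    generator: Callable[[List[Optional[int]]], List[int]],
    critic: Callable[[List[int]], List[float]],
    total_steps: int = 8,
) -> List[int]:
    """Toy masked generative decoding with critic-driven re-masking (hypothetical)."""
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    tokens: List[Optional[int]] = [MASK] * length
    for step in range(total_steps):
        # 1. The generator proposes a token for every position, given the partial sequence.
        proposal = generator(tokens)
        # 2. Fill only the masked slots; tokens kept from earlier passes stay as they are.
        filled = [p if t is MASK else t for t, p in zip(tokens, proposal)]
        # 3. The critic scores every token, including earlier ones (higher = more plausible).
        scores = critic(filled)
        # 4. Re-mask the lowest-scoring tokens so the next pass can regenerate them.
        n_mask = cosine_mask_count(step, total_steps, length)
        worst_first = sorted(range(length), key=lambda i: scores[i])
        tokens = list(filled)
        for i in worst_first[:n_mask]:
            tokens[i] = MASK
    # The schedule masks nothing on the final step, so every position is filled.
    return filled


if __name__ == "__main__":
    target = [3, 1, 4, 1, 5, 9, 2, 6]
    first_pass = iter([[0] * len(target)])

    def flaky_generator(toks):
        # Toy generator: all zeros on the first pass, the target afterwards.
        return next(first_pass, list(target))

    def oracle_critic(toks):
        # Toy critic with access to the target; a real critic would be learned.
        return [1.0 if a == b else 0.0 for a, b in zip(toks, target)]

    print(iterative_decode(len(target), flaky_generator, oracle_critic, total_steps=4))
    # [3, 1, 4, 1, 5, 9, 2, 6]: the zero kept at position 7 is re-masked and fixed
```

Because the critic re-scores tokens kept from earlier passes, a wrong token committed early can be re-masked and regenerated, whereas plain confidence-based MaskGIT decoding never revisits committed tokens. The sketch keeps generator and critic as separate callables for clarity; per the abstract, AnyEnhance performs this assessment with the model itself ("self-assessment"), and the exact scoring is described in the full paper.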
