Abstract
We present a number of systems for the Voice Privacy Challenge, including voice conversion (VC) based systems, such as the kNN-VC method and the WavLM voice conversion method, and text-to-speech (TTS) based systems, including Whisper-VITS. We found that while voice conversion systems better preserve emotional content, they struggle to conceal speaker identity in semi-white-box attack scenarios; conversely, TTS methods perform better at anonymization and worse at emotion preservation. Finally, we propose a random admixture system which seeks to balance out the strengths and weaknesses of the two categories of systems, achieving a strong equal error rate (EER) of over 40% while maintaining unweighted average recall (UAR) at a respectable 47%.
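The abstract does not specify how the random admixture combines the two system families. The sketch below assumes one plausible reading: each utterance is routed independently to either a TTS-based or a VC-based anonymizer, with probability `p_tts` of choosing TTS. The function `random_admixture` and the parameter `p_tts` are hypothetical. The anonymizer callables are placeholders standing in for systems such as kNN-VC or Whisper-VITS, not their real APIs.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: an anonymizer maps a waveform (sequence of samples)
# to an anonymized waveform. Real systems (e.g. kNN-VC, Whisper-VITS) differ.
Anonymizer = Callable[[Sequence[float]], List[float]]


def random_admixture(
    utterances: Sequence[Sequence[float]],
    vc_anonymizer: Anonymizer,
    tts_anonymizer: Anonymizer,
    p_tts: float = 0.5,
    seed: int = 0,
) -> List[Tuple[str, List[float]]]:
    """Assumed admixture: pick TTS with probability p_tts, else VC, per utterance.

    Returns (system_label, anonymized_waveform) pairs so the routing can be inspected.
    """
    if not 0.0 <= p_tts <= 1.0:
        raise ValueError("p_tts must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible routing
    results: List[Tuple[str, List[float]]] = []
    for wav in utterances:
        # rng.random() is uniform on [0, 1), so p_tts=1.0 always selects TTS
        # and p_tts=0.0 always selects VC.
        if rng.random() < p_tts:
            results.append(("tts", tts_anonymizer(wav)))
        else:
            results.append(("vc", vc_anonymizer(wav)))
    return results
```

Under this assumption, `p_tts` is the knob that trades the TTS systems' stronger anonymization (higher EER) against the VC systems' better emotion preservation (higher UAR). Mixing at the utterance level also means an attacker cannot count on every utterance being produced by the same system.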