Synthetic Data Generation for Phrase Break Prediction with Large Language Model

Abstract

Current approaches to phrase break prediction address crucial prosodic aspects of text-to-speech (TTS) systems but rely heavily on large amounts of human annotation derived from audio or text, which incurs significant manual effort and cost. Inherent variability in the speech domain, driven by phonetic factors, further complicates the acquisition of consistent, high-quality data. Recently, large language models (LLMs) have shown success in addressing data challenges in NLP by generating tailored synthetic data while reducing the need for manual annotation. Motivated by this, we explore leveraging LLMs to generate synthetic phrase break annotations, addressing the challenges of both manual annotation and speech-related tasks. We compare these synthetic annotations with traditional annotations and assess their effectiveness across multiple languages. Our findings suggest that LLM-based synthetic data generation effectively mitigates data challenges in phrase break prediction, and they highlight the potential of LLMs as a viable solution for the speech domain.
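The abstract does not specify how the synthetic annotations are formatted or scored. The minimal sketch below assumes an LLM is prompted to mark phrase boundaries with a hypothetical `/` token. It converts one annotated sentence into word-level break labels, a common framing for phrase break prediction, and compares those labels against a reference annotation using break-class F1. The marker, `parse_break_annotation`, `break_f1`, and the choice of F1 are illustrative assumptions, not the authors' method or evaluation.

```python
from typing import List, Tuple

# Hypothetical marker an LLM might be prompted to insert at phrase boundaries;
# the paper's actual annotation format is not given in the abstract.
BREAK_MARKER = "/"


def parse_break_annotation(annotated: str, marker: str = BREAK_MARKER) -> Tuple[List[str], List[int]]:
    """Convert a marker-annotated sentence into words and word-level labels.

    Label 1 means a phrase break follows the word; 0 means no break.
    """
    words: List[str] = []
    labels: List[int] = []
    for token in annotated.split():
        if token == marker:
            # A standalone marker signals a break after the previous word.
            if words:
                labels[-1] = 1
            continue
        # Also accept a marker attached to the end of a word, e.g. "meeting/".
        has_break = token.endswith(marker)
        word = token.rstrip(marker)
        if not word:
            continue
        words.append(word)
        labels.append(1 if has_break else 0)
    return words, labels


def break_f1(pred: List[int], gold: List[int]) -> float:
    """F1 score of the break class (label 1) between two label sequences."""
    if len(pred) != len(gold):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    words, synthetic = parse_break_annotation("After the meeting / we went home / and slept")
    reference = [0, 0, 1, 0, 0, 0, 0, 1]  # hypothetical human annotation
    print(words)
    print(synthetic)                        # [0, 0, 1, 0, 0, 1, 0, 0]
    print(break_f1(synthetic, reference))   # 0.5
```

Accepting both standalone and word-attached markers makes the parser tolerant of minor formatting drift in LLM output. Any real pipeline would still need validation, for example checking that the words recovered from the LLM response match the input sentence.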
