Abstract
Grapheme-to-phoneme (G2P) conversion for Persian presents unique challenges due to its complex phonological features, particularly homographs and Ezafe, which occur in both formal and informal language contexts. This paper introduces an intermediate language specifically designed for Persian language processing that addresses these challenges through a multi-faceted approach. Our methodology combines two key components: Large Language Model (LLM) prompting techniques and a specialized sequence-to-sequence machine transliteration architecture. We develop and implement a systematic approach for constructing a comprehensive lexical database for disambiguating homographs with multiple pronunciations, often termed polyphones, utilizing formal concept analysis for semantic differentiation. We train our model using two distinct datasets: the LLM-generated dataset for formal and informal Persian and the B-Plus podcasts for informal language variants. The experimental results demonstrate superior performance compared to existing state-of-the-art approaches, particularly in handling the complexities of Persian phoneme conversion. Our model significantly improves Phoneme Error Rate (PER) metrics, establishing a new benchmark for Persian G2P conversion accuracy. This work contributes to the growing research in low-resource language processing, provides a robust solution for Persian text-to-speech systems, and demonstrates applicability beyond Persian. Specifically, the approach can extend to languages with rich homographic phenomena, such as Chinese and Arabic.