Abstract
Speaker attribution from speech transcripts is the task of identifying a speaker from the transcript of their speech based on patterns in their language use. This task is especially useful when the audio is unavailable (e.g. deleted) or unreliable (e.g. anonymized speech). Prior work in this area has primarily focused on the feasibility of attributing speakers using transcripts produced by human annotators. However, in real-world settings, one often has only the more errorful transcripts produced by automatic speech recognition (ASR) systems. In this paper, we conduct what is, to our knowledge, the first comprehensive study of the impact of automatic transcription on speaker attribution performance. In particular, we study the extent to which speaker attribution performance degrades in the face of transcription errors, as well as how properties of the ASR system impact attribution. We find that attribution is surprisingly resilient to word-level transcription errors and that the objective of recovering the true transcript is minimally correlated with attribution performance. Overall, our findings suggest that speaker attribution on more errorful transcripts produced by ASR is as good as, if not better than, attribution based on human-transcribed data, possibly because ASR transcription errors can capture speaker-specific features that reveal speaker identity.