Speech recognition for medical conversations

Abstract

In this work we explored building automatic speech recognition models for transcribing doctor-patient conversations. We collected a large-scale dataset of clinical conversations ($14,000$ hr), designed the task to represent the real-world scenario, and explored several alignment approaches to iteratively improve data quality. We explored both CTC and LAS systems for building speech recognition models. The LAS system was more resilient to noisy data, whereas CTC required more data cleanup. A detailed analysis is provided for understanding the performance on clinical tasks. Our analysis showed that the speech recognition models performed well on important medical utterances, while errors occurred in casual conversations. Overall, we believe the resulting models can provide reasonable quality in practice.
