Classification errors distort findings in automated speech processing: examples and solutions from child-development research

  • 2025-08-21 15:02:13
  • Lucas Gautheron, Evan Kidd, Anton Malko, Marvin Lavechin, Alejandrina Cristia

Abstract

With the advent of wearable recorders, scientists are increasingly turning to automated methods of analysis of audio and video data in order to measure children's experience, behavior, and outcomes, with a sizable literature employing long-form audio recordings to study language acquisition. While numerous articles report on the accuracy and reliability of the most popular automated classifiers, less has been written on the downstream effects of classification errors on measurements and statistical inferences (e.g., the estimate of correlations and effect sizes in regressions). This paper proposes a Bayesian approach to study the effects of algorithmic errors on key scientific questions, including the effect of siblings on children's language experience and the association between children's production and their input. For both the most commonly used system, LENA, and an open-source alternative (the Voice Type Classifier from the ACLEW system), we find that classification errors can significantly distort estimates. For instance, automated annotations underestimated the negative effect of siblings on adult input by 20–80%, potentially placing it below statistical significance thresholds. We further show that a Bayesian calibration approach for recovering unbiased estimates of effect sizes can be effective and insightful, but does not provide a foolproof solution. Both the issue reported and our solution may apply to any classifier involving event detection and classification with non-zero error rates.

 
