Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain

Abstract

Pretrained self-supervised speech models excel in speech tasks but do not reflect the hierarchy of human speech processing, as they encode rich semantics in middle layers and poor semantics in late layers. Recent work showed that brain-tuning (fine-tuning models using human brain recordings) improves speech models' semantic understanding. Here, we examine how well brain-tuned models further reflect the brain's intermediate stages of speech processing. We find that late layers of brain-tuned models substantially improve over pretrained models in their alignment with semantic language regions. Further layer-wise probing reveals that early layers remain dedicated to low-level acoustic features, while late layers become the best at complex high-level tasks. These findings show that brain-tuned models not only perform better but also exhibit a well-defined hierarchical processing going from acoustic to semantic representations, making them better model organisms for human speech processing.
