NUTSHELL: A Dataset for Abstract Generation from Scientific Talks

  • 2025-06-02 08:51:11
  • Maike Züfle, Sara Papi, Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Jan Niehues

Abstract

Scientific communication is receiving increasing attention in natural language processing, especially to help researchers access, summarize, and generate content. One emerging application in this area is Speech-to-Abstract Generation (SAG), which aims to automatically generate abstracts from recorded scientific presentations. SAG enables researchers to efficiently engage with conference talks, but progress has been limited by a lack of large-scale datasets. To address this gap, we introduce NUTSHELL, a novel multimodal dataset of *ACL conference talks paired with their corresponding abstracts. We establish strong baselines for SAG and evaluate the quality of generated abstracts using both automatic metrics and human judgments. Our results highlight the challenges of SAG and demonstrate the benefits of training on NUTSHELL. By releasing NUTSHELL under an open license (CC-BY 4.0), we aim to advance research in SAG and foster the development of improved models and evaluation methods.

 
