Abstract
With robotics rapidly advancing, more effective human-robot interaction is increasingly needed to realize the full potential of robots for society. While spoken language must be part of the solution, our ability to provide spoken language interaction capabilities is still very limited. The National Science Foundation accordingly convened a workshop, bringing together speech, language, and robotics researchers to discuss what needs to be done. The result is this report, in which we identify key scientific and engineering advances needed. Our recommendations broadly relate to eight general themes. First, meeting human needs requires addressing new challenges in speech technology and user experience design. Second, this requires better models of the social and interactive aspects of language use. Third, for robustness, robots need higher-bandwidth communication with users and better handling of uncertainty, including simultaneous consideration of multiple hypotheses and goals. Fourth, more powerful adaptation methods are needed, to enable robots to communicate in new environments, for new tasks, and with diverse user populations, without extensive re-engineering or the collection of massive training data. Fifth, since robots are embodied, speech should function together with other communication modalities, such as gaze, gesture, posture, and motion. Sixth, since robots operate in complex environments, speech components need access to rich yet efficient representations of what the robot knows about objects, locations, noise sources, the user, and other humans. Seventh, since robots operate in real time, their speech and language processing components must also. Eighth, in addition to more research, we need more work on infrastructure and resources, including shareable software modules and internal interfaces, inexpensive hardware, baseline systems, and diverse corpora.