Referring Expressions with Rational Speech Act Framework: A Probabilistic Approach

  • 2022-05-16 17:37:50
  • Hieu Le, Taufiq Daryanto, Fabian Zhafransyah, Derry Wijaya, Elizabeth Coppock, Sang Chin

Abstract

This paper focuses on a referring expression generation (REG) task in which the aim is to pick out an object in a complex visual scene. One common theoretical approach to this problem is to model the task as a two-agent cooperative scheme in which a 'speaker' agent generates the expression that best describes a targeted area and a 'listener' agent identifies the target. Several recent REG systems have used deep learning approaches to represent the speaker/listener agents. The Rational Speech Act framework (RSA), a Bayesian approach to pragmatics that can predict human linguistic behavior quite accurately, has been shown to generate high-quality and explainable expressions on toy datasets involving simple visual scenes. Its application to large-scale problems, however, remains largely unexplored. This paper applies a combination of the probabilistic RSA framework and deep learning approaches to larger datasets involving complex visual scenes in a multi-step process, with the aim of generating better-explained expressions. We carry out experiments on the RefCOCO and RefCOCO+ datasets and compare our approach with other end-to-end deep learning approaches as well as a variation of RSA to highlight our key contribution. Experimental results show that, while achieving lower accuracy than SOTA deep learning methods, our approach outperforms a similar RSA approach in human comprehension and has an advantage over end-to-end deep learning in limited-data scenarios. Lastly, we provide a detailed analysis of the expression generation process with concrete examples, thus providing a systematic view of error types and deficiencies in the generation process and identifying possible areas for future improvements.
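
To make the speaker/listener idea concrete, here is a minimal sketch of the textbook RSA recursion on a toy scene. It is not the paper's system: the objects, the four-word vocabulary, the uniform prior, the absence of utterance costs, and every function name below are assumptions chosen for illustration only.

```python
# Toy RSA sketch (illustrative assumptions only, not the paper's method).
# Hypothetical scene: three objects, each described by a color and a shape.
OBJECTS = ["blue_square", "blue_circle", "green_square"]
UTTERANCES = ["blue", "green", "square", "circle"]


def is_true(utterance, obj):
    """Literal semantics: a word is true of an object if it names its color or shape."""
    return utterance in obj.split("_")


def normalize(scores):
    """Turn non-negative scores into a probability distribution."""
    total = sum(scores.values())
    return {k: (v / total if total > 0 else 0.0) for k, v in scores.items()}


def literal_listener(utterance):
    """L0(o | u): uniform over the objects the utterance is literally true of."""
    return normalize({o: 1.0 if is_true(utterance, o) else 0.0 for o in OBJECTS})


def pragmatic_speaker(obj, alpha=1.0):
    """S1(u | o) proportional to L0(o | u) ** alpha (no utterance cost assumed)."""
    return normalize({u: literal_listener(u)[obj] ** alpha for u in UTTERANCES})


def pragmatic_listener(utterance, alpha=1.0):
    """L1(o | u) proportional to S1(u | o) under a uniform prior over objects."""
    return normalize({o: pragmatic_speaker(o, alpha)[utterance] for o in OBJECTS})


if __name__ == "__main__":
    speaker = pragmatic_speaker("blue_circle")
    print(max(speaker, key=speaker.get))   # the speaker's preferred word
    print(pragmatic_listener("blue"))      # the listener's belief after hearing "blue"
```

In this toy scene the pragmatic speaker prefers "circle" over "blue" for the blue circle, because "circle" picks out the target uniquely, and the pragmatic listener hearing "blue" leans toward the blue square, since a speaker who meant the blue circle would more likely have said "circle".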

 
