Image Captioning Based on a Hierarchical Attention Mechanism and Policy Gradient Optimization

  • 2019-01-11 03:31:31
  • Shiyang Yan, Yuan Xie, Fangyu Wu, Jeremy S. Smith, Wenjin Lu, Bailing Zhang

Abstract

Automatically generating a description of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence, which bridges the gap between computer vision and natural language processing. Building on successful deep learning models, especially CNNs and Long Short-Term Memory networks (LSTMs) with attention mechanisms, we propose a hierarchical attention model that uses both global CNN features and local object features for more effective feature representation and reasoning in image captioning. A generative adversarial network (GAN), together with a reinforcement learning (RL) algorithm, is applied to address the exposure bias problem in RNN-based supervised training for language problems. In addition, because the discriminator in the GAN framework automatically measures the consistency between the generated caption and the image content, and this measurement drives the RL optimization, the final generated sentences are more accurate and natural. Comprehensive experiments show the improved performance of the hierarchical attention mechanism and the effectiveness of our RL-based optimization method. Our model achieves state-of-the-art results on several important metrics on the MSCOCO dataset, using only greedy inference.

 
