Title: Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

Apr 11, 2018

Dianqi Li, Qiuyuan Huang, Xiaodong He, Lei Zhang, Ming-Ting Sun

Abstract: We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progress in the past several years, are expressed in a very monotonous and featureless format. While such captions are normally accurate, they often lack two important characteristics of human language: distinctiveness for each caption and diversity across different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely on one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human language and generates more discriminative captions. We show that our proposed network is capable of producing accurate and diverse captions across images.
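
The idea of scoring a caption by comparison with a set of other captions, rather than in isolation, can be illustrated with a small sketch. The abstract does not state the scoring function, so everything here is an assumption made for illustration: the function names `cosine` and `comparative_scores`, the temperature `gamma`, the use of cosine similarity followed by a softmax, and the toy 2-D embedding vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def comparative_scores(image_vec, caption_vecs, gamma=5.0):
    """Score each caption relative to the others in the set.

    Assumption: relevance is a softmax over scaled cosine similarities
    in a shared image-caption space. `gamma` is a hypothetical
    temperature, not a value from the paper.
    """
    sims = [gamma * cosine(image_vec, c) for c in caption_vecs]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (made up): one image and three candidate captions.
image = [1.0, 0.0]
captions = {
    "human-written": [0.9, 0.1],
    "generated": [0.7, 0.3],
    "image-mismatched": [0.0, 1.0],
}

scores = comparative_scores(image, list(captions.values()))
for name, score in zip(captions, scores):
    print(f"{name}: {score:.3f}")
```

Because the scores are normalized over the whole set, a caption's score drops when a better-matching caption is present, which is the sense in which the assessment is comparative. In this toy setup the image-mismatched caption receives the lowest score.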
