Towards Diverse and Natural Image Descriptions via a Conditional GAN
Refereed conference paper presented and published in conference proceedings

Researchers at The Chinese University of Hong Kong

Other Information
Abstract: Despite the substantial progress in recent years, image captioning techniques are still far from perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice, namely maximizing the likelihood of training samples. This principle encourages high resemblance to the "ground-truth" captions while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor such restrictive methods. In this paper, we explore an alternative approach, with the aim of improving naturalness and diversity -- two essential properties of human expression. Specifically, we propose a new framework based on Conditional Generative Adversarial Networks (CGAN), which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content. It is noteworthy that training a sequence generator is nontrivial. We overcome the difficulty by Policy Gradient, a strategy stemming from Reinforcement Learning, which allows the generator to receive early feedback along the way. We tested our method on two large datasets, where it performed competitively against real people in our user study and outperformed other methods on various tasks.
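The core training idea described in the abstract -- a generator that samples discrete tokens and an evaluator whose score is fed back via Policy Gradient (REINFORCE) -- can be illustrated with a toy sketch. Everything below is hypothetical and greatly simplified: the "generator" is a single softmax policy over a three-word vocabulary, and the "evaluator" is a fixed hand-written reward rather than a learned network; none of these names come from the paper.

```python
import numpy as np

# Toy REINFORCE sketch (not the paper's model): a softmax policy over a
# tiny vocabulary stands in for the caption generator, and a hand-coded
# reward stands in for the learned evaluator.

rng = np.random.default_rng(0)
VOCAB = ["cat", "dog", "car"]
logits = np.zeros(len(VOCAB))          # generator parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_caption(length=3):
    # Sample token indices i.i.d. from the current policy.
    p = softmax(logits)
    return [rng.choice(len(VOCAB), p=p) for _ in range(length)]

def evaluator(caption):
    # Stand-in for the learned evaluator: reward 1.0 iff "cat" appears.
    return 1.0 if any(VOCAB[t] == "cat" for t in caption) else 0.0

def reinforce_step(lr=0.5):
    caption = sample_caption()
    reward = evaluator(caption)
    p = softmax(logits)
    for t in caption:
        grad = -p
        grad[t] += 1.0                 # d log pi(t) / d logits
        logits[:] += lr * reward * grad  # ascend expected reward
    return reward

for _ in range(200):
    reinforce_step()

print(softmax(logits))  # policy mass should concentrate on "cat"
```

The key point mirrored here is that the reward is applied to sampled (discrete) outputs, so the gradient flows through the log-probability of each chosen token rather than through the evaluator, which is what makes Policy Gradient applicable where backpropagation through sampling is not.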
Authors: Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin
Conference Name: 16th IEEE International Conference on Computer Vision (ICCV)
Conference Start Date: 22.10.2017
Conference End Date: 29.10.2017
Conference Location: Venice
Conference Country/Region: Italy
Proceedings Title: 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
Year of Publication: 2017
Month: 10
Publisher: IEEE
Pages: 2989 - 2998
ISBN: 978-1-5386-1032-9
ISSN: 1550-5499
Language: American English

Last updated on 2021-19-01 at 01:00