Generating Multi-Sentence Lingual Descriptions of Indoor Scenes
Refereed conference paper presented and published in conference proceedings

Researchers at The Chinese University of Hong Kong

Other information
Abstract: This paper proposes a novel framework for generating lingual descriptions of indoor scenes. Whereas substantial efforts have been made to tackle this problem, previous approaches have focused primarily on generating a single sentence for each image, which is not sufficient for describing complex scenes. We attempt to go beyond this by generating coherent descriptions with multiple sentences. Our approach is distinguished from conventional ones in several aspects: (1) a 3D visual parsing system that jointly infers objects, attributes, and relations; (2) a generative grammar learned automatically from training text; and (3) a text generation algorithm that takes into account the coherence among sentences. Experiments on the augmented NYU-v2 dataset show that our framework can generate natural descriptions with substantially higher ROUGE scores than those produced by the baseline.
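Authors: LIN Dahua, FIDLER Sanja, CHEN Kong, URTASUN Raquel
Conference name: British Machine Vision Conference
Conference start date: 07.09.2015
Conference end date: 10.09.2015
Conference venue: Swansea
Conference country/region: United Kingdom
Proceedings title: British Machine Vision Conference
Year of publication: 2015
Month: 9
Pages: 13
Language: British English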

Last updated on 2018-23-01 at 03:17