Move Forward and Tell: A Progressive Generator of Video Descriptions
Refereed conference paper presented and published in conference proceedings

Other information
Abstract: We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips: they typically treat an entire video as a whole and generate the caption conditioned on a single embedding. In contrast, we consider videos with rich temporal structures and aim to generate paragraph descriptions that preserve the story flow while being coherent and concise. Towards this goal, we propose a new approach that produces a descriptive paragraph by assembling temporally localized descriptions. Given a video, it selects a sequence of distinctive clips and generates sentences thereon in a coherent manner. In particular, the selection of clips and the production of sentences are done jointly and progressively, driven by a recurrent network: what to describe next depends on what has been said before. Here, the recurrent network is learned via self-critical sequence training with both sentence-level and paragraph-level rewards. On the ActivityNet Captions dataset, our method demonstrates the capability of generating high-quality paragraph descriptions for videos. Compared to those by other methods, the descriptions produced by our method are often more relevant, more coherent, and more concise.
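
The training scheme named in the abstract can be made concrete with a small sketch. Below is a minimal PyTorch illustration of a self-critical sequence training (SCST) objective that blends sentence-level and paragraph-level rewards; the `mixed_reward` helper, its `mix_weight` parameter, and the random placeholder tensors are assumptions for illustration only, not the paper's actual implementation.

```python
import torch

def mixed_reward(sent_reward, para_reward, mix_weight=0.5):
    # Hypothetical blend of the two reward levels named in the abstract;
    # this particular weighting scheme is an assumption, not the paper's.
    return mix_weight * sent_reward + (1.0 - mix_weight) * para_reward

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    # REINFORCE with the greedy decode as baseline: sampled descriptions
    # are rewarded relative to what greedy decoding would have scored.
    advantage = (sample_reward - greedy_reward).unsqueeze(1)  # (batch, 1)
    return -(advantage.detach() * sample_logprobs).sum(dim=1).mean()

# Toy usage with random stand-ins for model outputs and metric scores.
batch, steps = 4, 20
logprobs = torch.randn(batch, steps, requires_grad=True).log_softmax(dim=1)
r_sample = mixed_reward(torch.rand(batch), torch.rand(batch))
r_greedy = mixed_reward(torch.rand(batch), torch.rand(batch))
loss = scst_loss(logprobs, r_sample, r_greedy)
loss.backward()  # in real training this follows a forward pass of the generator
```

Because the greedy-decode reward acts as a baseline, only samples that beat the model's own greedy output receive a positive advantage, which is what makes the training "self-critical".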
Date accepted by publisher: 12.07.2018
Authors: Yilei Xiong, Bo Dai, Dahua Lin
Conference name: 15th European Conference on Computer Vision, ECCV 2018
Conference start date: 08.09.2018
Conference end date: 14.09.2018
Conference location: Munich, Germany
Conference country/region: Germany
Proceedings title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Year of publication: 2018
Month: 9
Volume: 11215
Publisher: Springer
Pages: 489-505
ISBN: 978-3-030-01251-9
ISSN: 0302-9743
Language: American English

Last updated on 22.01.2021 at 01:54