Find and Focus: Retrieve and Localize Video Events with Natural Language Queries
Refereed conference paper presented and published in conference proceedings

香港中文大學研究人員
替代計量分析
.

其它資訊
摘要The thriving of video sharing services brings new challenges to video retrieval, e.g. the rapid growth in video duration and content diversity. Meeting such challenges calls for new techniques that can effectively retrieve videos with natural language queries. Existing methods along this line, which mostly rely on embedding videos as a whole, remain far from satisfactory for real-world applications due to the limited expressive power. In this work, we aim to move beyond this limitation by delving into the internal structures of both sides, the queries and the videos. Specifically, we propose a new framework called Find and Focus (FIFO), which not only performs top-level matching (paragraph vs. video), but also makes part-level associations, localizing a video clip for each sentence in the query with the help of a focusing guide. These levels are complementary – the top-level matching narrows the search while the part-level localization refines the results. On both ActivityNet Captions and modified LSMDC datasets, the proposed framework achieves remarkable performance gains (Project Page: https://ycxioooong.github.io/projects/fifo).
出版社接受日期12.07.2018
著者Dian Shao, Yu Xiong, Yue Zhao, Qingqiu Huang, Yu Qiao, Dahua Lin
會議名稱15th European Conference on Computer Vision, ECCV 2018
會議開始日08.09.2018
會議完結日14.09.2018
會議地點Munich, Germany
會議國家/地區德國
會議論文集題名Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
出版年份2018
月份9
卷號11213
出版社Springer
頁次202 - 218
國際標準書號978-303001239-7
國際標準期刊號03029743
語言美式英語

上次更新時間 2021-22-01 於 01:54