Trajectory Convolution for Action Recognition
Refereed conference paper presented and published in conference proceedings



Other Information
Abstract: How to leverage the temporal dimension is one major question in video analysis. Recent works [47, 36] suggest an efficient approach to video feature learning, i.e., factorizing 3D convolutions into separate components respectively for spatial and temporal convolutions. The temporal convolution, however, comes with an implicit assumption - the feature maps across time steps are well aligned so that the features at the same locations can be aggregated. This assumption can be overly strong in practical applications, especially in action recognition where the motion serves as a crucial cue. In this work, we propose a new CNN architecture TrajectoryNet, which incorporates trajectory convolution, a new operation for integrating features along the temporal dimension, to replace the existing temporal convolution. This operation explicitly takes into account the changes in contents caused by deformation or motion, allowing the visual features to be aggregated along the motion paths, i.e., trajectories. On two large-scale action recognition datasets, Something-Something V1 and Kinetics, the proposed network architecture achieves notable improvement over strong baselines.
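The core idea in the abstract is that a separable 3D convolution aggregates features at the same spatial location across frames, whereas trajectory convolution samples each neighboring frame at positions displaced along a motion path. The sketch below is a minimal illustration of that idea in PyTorch, not the authors' implementation: it assumes a precomputed motion field `offsets` (e.g., derived from optical flow or a learned offset branch, in the spirit of deformable convolution), and all names, shapes, and the depthwise kernel are illustrative assumptions.

```python
# Illustrative sketch of trajectory convolution: aggregate features along
# per-pixel motion paths rather than at fixed spatial locations.
import torch
import torch.nn.functional as F

def trajectory_conv1d(feat, offsets, weight):
    """feat:    (N, C, T, H, W) video feature maps
    offsets: (N, T, K, H, W, 2) pixel displacement (dx, dy) of each of the
             K temporal taps relative to the center frame (assumed given,
             e.g., from optical flow)
    weight:  (C, K) per-channel temporal kernel (depthwise, for brevity)
    """
    N, C, T, H, W = feat.shape
    K = weight.shape[1]
    # Base sampling grid in normalized [-1, 1] coordinates for grid_sample,
    # stored in (x, y) order as grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack((xs, ys), dim=-1)  # (H, W, 2)
    scale = torch.tensor([2.0 / max(W - 1, 1), 2.0 / max(H - 1, 1)],
                         device=feat.device)  # pixels -> normalized coords
    out = torch.zeros_like(feat)
    for k in range(K):
        dt = k - K // 2  # temporal displacement of this tap
        for t in range(T):
            src = min(max(t + dt, 0), T - 1)  # clamp at clip boundaries
            # Shift the grid by this tap's motion offsets.
            grid = base.to(feat) + offsets[:, t, k] * scale  # (N, H, W, 2)
            # Bilinearly sample the neighboring frame along the trajectory.
            sampled = F.grid_sample(feat[:, :, src], grid,
                                    align_corners=True)  # (N, C, H, W)
            out[:, :, t] += sampled * weight[:, k].view(1, C, 1, 1)
    return out
```

With all offsets set to zero, this reduces to an ordinary depthwise temporal convolution with replicate padding, which makes explicit what the trajectory adds: the temporal aggregation follows the motion instead of assuming the feature maps are aligned across time.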
Date accepted by publisher: 05.09.2018
Authors: Yue Zhao, Yuanjun Xiong, Dahua Lin
Conference name: 32nd Conference on Neural Information Processing Systems (NIPS)
Conference start date: 02.12.2018
Conference end date: 08.12.2018
Conference venue: Montreal
Conference country/region: Canada
Proceedings title: Advances in Neural Information Processing Systems
Year of publication: 2018
Month: 12
ISSN: 1049-5258
Language: American English

Last updated on 2021-01-21 at 02:20