Slicing convolutional neural network for crowd video understanding
Refereed conference paper presented and published in conference proceedings


Other information
Abstract: Learning and capturing both appearance and dynamic representations are pivotal for crowd video understanding. Convolutional Neural Networks (CNNs) have shown remarkable potential in learning appearance representations from images. However, the learning of dynamic representations, and how they can be effectively combined with appearance features for video analysis, remains an open problem. In this study, we propose a novel spatio-temporal CNN, named Slicing CNN (S-CNN), based on the decomposition of 3D feature maps into 2D spatial-slice and 2D temporal-slice representations. The decomposition brings unique advantages: (1) the model is capable of capturing the dynamics of different semantic units such as groups and objects, (2) it learns separate appearance and dynamic representations while keeping proper interactions between them, and (3) it exploits the selectiveness of spatial filters to discard irrelevant background clutter for crowd understanding. We demonstrate the effectiveness of the proposed S-CNN model on the WWW crowd video dataset for attribute recognition and observe significant performance improvements over state-of-the-art methods (62.55%, up from 51.84% [21]).
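Authors: Shao J., Loy C.C., Kang K., Wang X.
Conference name: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
Conference start date: 26.06.2016
Conference end date: 01.07.2016
Conference location: Las Vegas
Conference country/region: United States
Detailed description: organized by IEEE
Year of publication: 2016
Month: 1
Day: 1
Volume number: 2016-January
Pages: 5620-5628
ISBN: 9781467388511
ISSN: 1063-6919
Language: British English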
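
The decomposition mentioned in the abstract, splitting a 3D feature map into 2D spatial slices and 2D temporal slices, can be illustrated with a small sketch. This is not the authors' implementation, and this record does not describe the S-CNN architecture in detail. The function name `slice_feature_volume`, the (T, H, W) axis order, the slice labels "xy", "xt" and "yt", and the toy sizes are all assumptions made for illustration only.

```python
import numpy as np


def slice_feature_volume(volume):
    """Split a (T, H, W) feature volume into three stacks of 2D slices.

    Returns a dict (hypothetical layout, chosen for this sketch):
      "xy": T slices of shape (H, W), one spatial slice per time step
      "xt": H slices of shape (T, W), one temporal slice per image row
      "yt": W slices of shape (T, H), one temporal slice per image column
    """
    if volume.ndim != 3:
        raise ValueError("expected a 3D array shaped (T, H, W)")
    return {
        "xy": volume,                            # leading index runs over time
        "xt": np.transpose(volume, (1, 0, 2)),   # leading index runs over rows
        "yt": np.transpose(volume, (2, 0, 1)),   # leading index runs over columns
    }


# Toy volume: 4 time steps of 3x5 feature maps (sizes are arbitrary).
volume = np.arange(4 * 3 * 5, dtype=np.float32).reshape(4, 3, 5)
slices = slice_feature_volume(volume)

for name, stack in slices.items():
    print(name, stack.shape)
```

In a model along these lines, each stack of 2D slices could be passed to its own 2D convolutional branch, so that appearance (spatial slices) and dynamics (temporal slices) are learned separately before being combined. How the branches are built and fused is left open here, since the record does not specify it.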

Last updated: 2020-06-09 at 01:17