Action recognition with trajectory-pooled deep-convolutional descriptors
Refereed conference paper presented and published in conference proceedings


Other information
Abstract: Visual features are of vital importance for human action understanding in videos. This paper presents a new video representation, called trajectory-pooled deep-convolutional descriptor (TDD), which shares the merits of both hand-crafted features [31] and deep-learned features [24]. Specifically, we utilize deep architectures to learn discriminative convolutional feature maps, and conduct trajectory-constrained pooling to aggregate these convolutional features into effective descriptors. To enhance the robustness of TDDs, we design two normalization methods to transform convolutional feature maps, namely spatiotemporal normalization and channel normalization. The advantages of our features come from (i) TDDs are automatically learned and are highly discriminative compared with hand-crafted features; (ii) TDDs take into account the intrinsic characteristics of the temporal dimension and introduce the strategies of trajectory-constrained sampling and pooling for aggregating deep-learned features. We conduct experiments on two challenging datasets: HMDB51 and UCF101. Experimental results show that TDDs outperform previous hand-crafted features [31] and deep-learned features [24]. Our method also achieves superior performance to the state of the art on these datasets.
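The abstract's two normalization steps and trajectory-constrained pooling can be sketched as follows. This is a minimal illustration, not the authors' code: the feature-map shape (height, width, time, channels), the function names, and the sum-pooling choice are assumptions inferred from the abstract's description.

```python
import numpy as np

# Assumed layout: a convolutional feature map C of shape (H, W, L, N) --
# height, width, temporal length, and number of channels.

def spatiotemporal_normalize(C, eps=1e-8):
    """Divide each channel by its maximum over the whole spatiotemporal
    extent, so that every channel's responses lie in a comparable range."""
    max_per_channel = C.max(axis=(0, 1, 2), keepdims=True)  # shape (1,1,1,N)
    return C / (max_per_channel + eps)

def channel_normalize(C, eps=1e-8):
    """Divide each spatiotemporal position by its maximum across channels,
    balancing the feature vector at every position."""
    max_per_position = C.max(axis=3, keepdims=True)  # shape (H,W,L,1)
    return C / (max_per_position + eps)

def trajectory_pool(C_norm, trajectory):
    """Trajectory-constrained pooling (here: sum-pooling, an assumption):
    aggregate normalized responses at the tracked points (x, y, t) of one
    trajectory into a single N-dimensional descriptor."""
    return sum(C_norm[y, x, t] for (x, y, t) in trajectory)

# Toy usage on a random feature map.
rng = np.random.default_rng(0)
C = rng.random((8, 8, 5, 16))
st = spatiotemporal_normalize(C)
ch = channel_normalize(C)
traj = [(1, 2, 0), (2, 2, 1), (3, 3, 2)]
tdd = trajectory_pool(st, traj)
print(tdd.shape)  # one 16-dim descriptor per trajectory
```

In the paper's pipeline, one such descriptor is extracted per trajectory and per convolutional layer, and the descriptors are then encoded (e.g. with Fisher vectors) into a video-level representation.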
Authors: Wang L., Qiao Y., Tang X.
Conference name: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Conference start date: 07.06.2015
Conference end date: 12.06.2015
Conference venue: Boston
Conference country/region: United States
Detailed description: organized by IEEE
Publication year: 2015
Month: 10
Day: 14
Volume: 07-12-June-2015
Pages: 4305-4314
ISBN: 9781467369640
ISSN: 1063-6919
Language: British English

Last updated on 2020-07-31 at 23:10