MoFAP: A Multi-level Representation for Action Recognition
Publication in refereed journal


Other information
Abstract: This paper proposes a multi-level video representation built by stacking the activations of motion features, atoms, and phrases (MoFAP). Motion features refer to low-level local descriptors, while motion atoms and phrases can be viewed as mid-level “temporal parts”. A motion atom is defined as an atomic part of an action and captures the motion information of a video over a short temporal scale. A motion phrase is a temporal composite of multiple motion atoms defined with an AND/OR structure; it further enhances the discriminative capacity of motion atoms by incorporating temporal structure over a longer temporal scale. Specifically, we first design a discriminative clustering method to automatically discover a set of representative motion atoms. Then, we mine effective motion phrases with high discriminative and representative capacity in a bottom-up manner. Based on these basic units of motion features, atoms, and phrases, we construct a MoFAP network by stacking them layer by layer. This MoFAP network enables us to extract effective representations of video data at different levels and scales. The separate representations from motion features, motion atoms, and motion phrases are concatenated into a single representation, called the Activation of MoFAP. The effectiveness of this representation is demonstrated on four challenging datasets: Olympic Sports, UCF50, HMDB51, and UCF101. Experimental results show that our representation achieves state-of-the-art performance on these datasets.
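The two ideas the abstract describes most concretely are the AND/OR motion phrase (each OR node pools an atom's response within a temporal window, and the AND node combines the OR nodes) and the final concatenation of per-level activations. The minimal sketch below illustrates both under stated assumptions: the function and variable names are illustrative, max-pooling for OR and averaging for AND are one plausible reading of the AND/OR structure, and the paper's actual encodings of the three levels are not reproduced.

```python
import numpy as np

def phrase_response(atom_scores, or_windows):
    """Illustrative score of one motion phrase on one video.

    atom_scores : (n_atoms, n_segments) array of per-segment atom responses.
    or_windows  : list of (atom_index, segment_slice) pairs; each OR node
                  takes the max response of its atom within a temporal
                  window, and the AND node averages the OR responses.
    """
    or_vals = [atom_scores[a, s].max() for a, s in or_windows]
    return float(np.mean(or_vals))

def mofap_activation(feature_act, atom_act, phrase_act):
    """Concatenate the per-level activations into one MoFAP vector."""
    return np.concatenate([feature_act, atom_act, phrase_act])

# Toy example: 2 atoms scored over 3 temporal segments.
atom_scores = np.array([[0.1, 0.9, 0.2],
                        [0.4, 0.3, 0.8]])
or_windows = [(0, slice(0, 2)),   # atom 0 in segments 0-1 -> max 0.9
              (1, slice(1, 3))]   # atom 1 in segments 1-2 -> max 0.8
score = phrase_response(atom_scores, or_windows)  # (0.9 + 0.8) / 2 = 0.85

# Stacking levels of (made-up) dimensionality 4, 3, and 2 gives a
# 9-dimensional Activation of MoFAP.
v = mofap_activation(np.ones(4), np.zeros(3), np.full(2, 0.5))
```

In the paper the per-level representations are far higher-dimensional, but the point of the concatenation is the same: a classifier sees low-level, atom-level, and phrase-level evidence jointly in one vector.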
Date accepted by publisher: 21.09.2015
Authors: Limin Wang, Yu Qiao, Xiaoou Tang
Journal name: International Journal of Computer Vision
Year of publication: 2016
Month: 9
Volume: 119
Issue: 3
Publisher: Springer
ISSN: 0920-5691
eISSN: 1573-1405
Language: American English
Keywords: Action recognition, Motion Feature, Motion Atom, Motion Phrase
Pages: 254 - 271

Last updated on 2020-04-08 at 03:04