An Effective Discriminative Learning Approach for Emotion-Specific Features Using Deep Neural Networks
Refereed conference paper presented and published in conference proceedings


Other information
Abstract: Speech contains rich yet entangled information, ranging from phonetic to emotional components. These components are always mixed together, which hinders certain tasks from achieving better performance. Automatically learning a good representation that disentangles these components is therefore non-trivial. In this paper, we propose a hierarchical method to extract utterance-level features from frame-level acoustic features using deep neural networks (DNNs). Moreover, inspired by recent progress in face recognition, we introduce centre loss as a complementary supervision signal to the traditional softmax loss to promote the intra-class compactness of the learned features. With the joint supervision of these two loss functions, we can train the DNNs to obtain separable and discriminative emotion-specific features. Experiments on the CASIA corpus, the Emo-DB corpus, and the SAVEE database show results comparable to those of state-of-the-art approaches.
Date accepted by publisher: 02.08.2018
Authors: Shuiyang MAO, Pak Chung CHING
Conference name: 25th International Conference on Neural Information Processing ICONIP 2018
Conference start date: 13.12.2018
Conference end date: 16.12.2018
Conference location: Siem Reap
Conference country/region: Cambodia
Proceedings title: Proceedings ICONIP 2018
Series title: Springer’s series of Lecture Notes in Computer Science
Publication year: 2018
Month: 12
Day: 13
Volume number: Part IV
Publisher: Springer Nature Switzerland
Place of publication: Switzerland
Article number: LNCS 11304
Pages: 50 - 61
ISBN: 978-3-030-04212-7
Language: American English
Keywords: speech emotion recognition, deep neural network, hierarchical method, centre loss
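
The joint supervision described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the record gives no formulas or hyperparameters, so the centre loss below follows the commonly used form from the face recognition literature (half the squared distance between each feature and its class centre), and the function names, the batch averaging, and the weight `lam` are all assumptions made for illustration.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def centre_loss(features, labels, centres):
    """Half the squared distance from each feature to its class centre,
    averaged over the batch (the averaging is an assumption)."""
    diffs = features - centres[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()


def joint_loss(logits, features, labels, centres, lam=0.1):
    """Softmax loss plus centre loss weighted by `lam` (hypothetical value)."""
    return (softmax_cross_entropy(logits, labels)
            + lam * centre_loss(features, labels, centres))


# Toy batch: 2 utterance-level feature vectors, 3 emotion classes.
features = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
centres = np.zeros((3, 2))   # one centre per class, here all at the origin
logits = np.zeros((2, 3))    # uniform class scores
print(joint_loss(logits, features, labels, centres))
```

In this sketch the softmax term keeps the classes separable, while the centre term pulls features of the same emotion toward a shared centre, which is the intra-class compactness the abstract refers to. In training, the centres would also be updated alongside the network, a step this sketch leaves out.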

Last updated on 2020-12-10 at 00:49