An Effective Discriminative Learning Approach for Emotion-Specific Features Using Deep Neural Networks
Refereed conference paper presented and published in conference proceedings

Other information
Abstract: Speech contains rich yet entangled information, ranging from phonetic to emotional components. These different components are always mixed together, which hinders certain tasks from achieving better performance. Therefore, automatically learning a good representation that disentangles these components is non-trivial. In this paper, we propose a hierarchical method to extract utterance-level features from frame-level acoustic features using deep neural networks (DNNs). Moreover, inspired by recent progress in face recognition, we introduce centre loss as a complementary supervision signal to the traditional softmax loss to facilitate the intra-class compactness of the learned features. With the joint supervision of these two loss functions, we can train the DNNs to obtain separable and discriminative emotion-specific features. Experiments on the CASIA corpus, the Emo-DB corpus and the SAVEE database show results comparable to those of state-of-the-art approaches.
Acceptance Date: 02/08/2018
All Author(s) List: Shuiyang MAO, Pak Chung CHING
Name of Conference: 25th International Conference on Neural Information Processing ICONIP 2018
Start Date of Conference: 13/12/2018
End Date of Conference: 16/12/2018
Place of Conference: Siem Reap
Country/Region of Conference: Cambodia
Proceedings Title: Proceedings ICONIP 2018
Series Title: Springer’s series of Lecture Notes in Computer Science
Volume Number: Part IV
Publisher: Springer Nature Switzerland
Place of Publication: Switzerland
Article number: LNCS 11304
Pages: 50 - 61
Languages: English-United States
Keywords: speech emotion recognition, deep neural network, hierarchical method, centre loss
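
The joint supervision described in the abstract, softmax loss plus centre loss, can be illustrated with a minimal sketch. It is not the authors' implementation. It uses the commonly cited form of centre loss (half the squared distance between a feature and the centre of its class), averages over the batch, and omits how the centres are updated. The function names, the weighting value `lam` and the toy data are all assumptions made for illustration.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample's softmax output against its true label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def centre_loss(feature, centre):
    """Half the squared Euclidean distance from a feature to its class centre."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, centre))


def joint_loss(features, logits, labels, centres, lam=0.1):
    """Batch mean of softmax loss plus lam times centre loss.

    lam is a hypothetical weighting value, not one taken from the paper.
    """
    total = 0.0
    for x, z, y in zip(features, logits, labels):
        total += softmax_cross_entropy(z, y) + lam * centre_loss(x, centres[y])
    return total / len(labels)


# Toy data (assumed): two utterance-level features, two emotion classes.
features = [[1.0, 0.0], [0.0, 1.0]]
logits = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
centres = [[1.0, 0.0], [0.0, 0.0]]

loss = joint_loss(features, logits, labels, centres)
```

In this toy batch the first feature sits exactly on its class centre and adds no centre-loss penalty, while the second lies one unit from its centre and does. The centre-loss term thus pulls same-class features together, which is the intra-class compactness the abstract refers to.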

Last updated on 2021-18-09 at 23:56