Deep Neural Network Based Acoustic-to-articulatory Inversion Using Phone Sequence Information
Refereed conference paper presented and published in conference proceedings



Abstract: In recent years, neural network based acoustic-to-articulatory inversion approaches have achieved state-of-the-art performance. One major issue associated with these approaches is the lack of phone sequence information during inversion. To address this issue, this paper proposes an improved architecture that hierarchically concatenates phone classification and articulatory inversion component DNNs to improve articulatory movement generation. On a Mandarin Chinese speech inversion task, the proposed technique consistently outperformed a range of baseline DNN and RNN inversion systems constructed using no phone sequence information, a mixture density parameter output layer, additional phone features at the input layer, or multi-task learning with additional monophone output layer target labels, measured in terms of electromagnetic articulography (EMA) root mean square error (RMSE) and correlation. Further improvements were obtained by using bottleneck features extracted from the proposed hierarchical articulatory inversion systems as auxiliary features in generalized variable parameter HMM (GVP-HMM) based inversion systems.
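The hierarchical concatenation described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, weights, and data are hypothetical, and the two component networks are reduced to untrained one-hidden-layer MLPs. The key structural idea shown is that the phone classifier's posterior outputs are concatenated with the acoustic features to form the input of the inversion network, whose output is scored with EMA RMSE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 40-dim acoustic frames,
# 60 monophone classes, 12 EMA articulatory coordinates.
N_FRAMES, ACOUSTIC_DIM, N_PHONES, EMA_DIM, HIDDEN = 5, 40, 60, 12, 64


def mlp(x, w1, b1, w2, b2):
    """One-hidden-layer network: tanh hidden layer, linear output."""
    return np.tanh(x @ w1 + b1) @ w2 + b2


# Component 1: phone classification DNN (softmax over monophones).
w1 = rng.standard_normal((ACOUSTIC_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
w2 = rng.standard_normal((HIDDEN, N_PHONES)) * 0.1
b2 = np.zeros(N_PHONES)

acoustic = rng.standard_normal((N_FRAMES, ACOUSTIC_DIM))
logits = mlp(acoustic, w1, b1, w2, b2)
phone_post = np.exp(logits - logits.max(axis=1, keepdims=True))
phone_post /= phone_post.sum(axis=1, keepdims=True)

# Hierarchical concatenation: acoustic features + phone posteriors
# together form the input of the articulatory inversion DNN.
inv_in = np.concatenate([acoustic, phone_post], axis=1)
w3 = rng.standard_normal((ACOUSTIC_DIM + N_PHONES, HIDDEN)) * 0.1
b3 = np.zeros(HIDDEN)
w4 = rng.standard_normal((HIDDEN, EMA_DIM)) * 0.1
b4 = np.zeros(EMA_DIM)
ema_pred = mlp(inv_in, w3, b3, w4, b4)

# EMA RMSE against (here random) reference articulatory trajectories.
ema_ref = rng.standard_normal((N_FRAMES, EMA_DIM))
rmse = np.sqrt(np.mean((ema_pred - ema_ref) ** 2))
print(ema_pred.shape, float(rmse))
```

In the paper's actual systems the components are trained DNNs, and the bottleneck features of this hierarchical stack are further reused as auxiliary features in GVP-HMM based inversion; the snippet only illustrates the data flow of the concatenated architecture.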
All Author(s) List: Xurong Xie, Xunying Liu, Lan Wang
Name of Conference: ISCA Interspeech 2016
Start Date of Conference: 08/09/2016
End Date of Conference: 12/09/2016
Place of Conference: San Francisco, CA, USA
Country/Region of Conference: United States of America
Proceedings Title: Interspeech
Pages: 1497 - 1501
Languages: English (United States)

Last updated on 2021-09-24 at 23:44