Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis
Refereed conference paper presented and published in conference proceedings



Other information
Abstract: Recurrent neural networks (RNNs) and their bidirectional long short-term memory (BLSTM) variants are powerful sequence modelling approaches. Their inherently strong ability to capture long-range temporal dependencies allows BLSTM-RNN speech synthesis systems to produce higher-quality and smoother speech trajectories than conventional deep neural networks (DNNs). In this paper, we improve the conventional BLSTM-RNN based approach by introducing a multi-task learned structured output layer in which spectral parameter targets are conditioned on the pitch parameter predictions. Both objective and subjective experimental results demonstrate the effectiveness of the proposed technique.
All Author(s) List: Runnan Li, Zhiyong Wu, Xunying Liu, Helen Meng, Lianhong Cai
Name of Conference: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Start Date of Conference: 05/03/2017
End Date of Conference: 09/03/2017
Place of Conference: New Orleans, LA, USA
Country/Region of Conference: United States of America
Proceedings Title: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Year: 2017
Month: 6
Pages: 5510-5514
ISBN: 978-1-5090-4116-9
eISBN: 978-1-5090-4117-6
eISSN: 2379-190X
Languages: English-United States
Keywords: Speech, Hidden Markov models, Acoustics, Predictive models, Recurrent neural networks, Speech synthesis, Trajectory
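
The abstract above describes a BLSTM acoustic model with a multi-task structured output layer, where spectral parameters are predicted conditioned on the pitch prediction. The following is a minimal PyTorch sketch of that general idea, not the authors' implementation: the layer sizes, feature dimensions (e.g. 355-dimensional linguistic inputs, 4-dimensional pitch and 180-dimensional spectral targets) and the loss weight alpha are illustrative assumptions rather than values from the paper.

# Minimal sketch (illustrative, not the paper's implementation) of a BLSTM
# acoustic model with a structured output layer: pitch parameters are
# predicted from the shared hidden states, and the spectral output head is
# conditioned on that pitch prediction; training uses a multi-task loss.
import torch
import torch.nn as nn


class StructuredOutputBLSTM(nn.Module):
    def __init__(self, in_dim=355, hidden_dim=256, pitch_dim=4, spec_dim=180):
        super().__init__()
        # Shared bidirectional LSTM encoder over frame-level linguistic features.
        self.blstm = nn.LSTM(in_dim, hidden_dim, num_layers=2,
                             batch_first=True, bidirectional=True)
        # Pitch head predicted directly from the shared hidden states.
        self.pitch_head = nn.Linear(2 * hidden_dim, pitch_dim)
        # Spectral head conditioned on both hidden states and the pitch prediction.
        self.spec_head = nn.Linear(2 * hidden_dim + pitch_dim, spec_dim)

    def forward(self, x):
        h, _ = self.blstm(x)                                    # (B, T, 2*hidden_dim)
        pitch = self.pitch_head(h)                              # (B, T, pitch_dim)
        spec = self.spec_head(torch.cat([h, pitch], dim=-1))    # (B, T, spec_dim)
        return pitch, spec


def multitask_loss(pitch_pred, spec_pred, pitch_tgt, spec_tgt, alpha=0.5):
    # Weighted sum of per-task MSE losses; alpha=0.5 is an assumed weighting.
    mse = nn.functional.mse_loss
    return alpha * mse(pitch_pred, pitch_tgt) + (1 - alpha) * mse(spec_pred, spec_tgt)


if __name__ == "__main__":
    model = StructuredOutputBLSTM()
    x = torch.randn(8, 100, 355)                                # dummy linguistic features
    pitch_tgt = torch.randn(8, 100, 4)
    spec_tgt = torch.randn(8, 100, 180)
    pitch_pred, spec_pred = model(x)
    loss = multitask_loss(pitch_pred, spec_pred, pitch_tgt, spec_tgt)
    loss.backward()
    print(loss.item())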
