Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer
Refereed conference paper presented and published in conference proceedings


Other information
Abstract
Prosodic structure generation from text plays an important role in Chinese text-to-speech (TTS) synthesis and greatly influences the naturalness and intelligibility of the synthesized speech. This paper proposes a multi-task learning method for prosodic structure generation using a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) with a structured output layer (SOL). Unlike traditional methods, where prerequisites such as lexical words or even a syntactic tree are usually required as input, the proposed method predicts prosodic boundary labels directly from Chinese characters. The BLSTM RNN captures the bidirectional contextual dependencies of prosodic boundary labels. The SOL further models correlations among prosodic structures, lexical words, and part-of-speech (POS) tags, so that the prediction of prosodic boundary labels is conditioned on word tokenization and POS tagging results. Experimental results demonstrate the effectiveness of the proposed method.
Index Terms: prosodic structure generation, structured output layer (SOL), bidirectional long short-term memory (BLSTM)
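The abstract describes a structured output layer in which the main task (prosodic boundary prediction) is conditioned on the outputs of two auxiliary tasks (word tokenization and POS tagging) computed from a shared BLSTM representation. The following is a minimal NumPy sketch of that cascaded conditioning only; the BLSTM itself is replaced by random stand-in hidden states, and all layer names, tag-set sizes, and weight shapes are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, H = 6, 8                      # sequence length (characters), hidden size
N_SEG, N_POS, N_PB = 4, 10, 3    # hypothetical tag-set sizes

# Stand-in for the BLSTM output: one shared hidden vector per character.
h = rng.normal(size=(T, H))

# Auxiliary task 1: word tokenization (e.g. character-level BMES tags).
W_seg = rng.normal(size=(H, N_SEG))
p_seg = softmax(h @ W_seg)

# Auxiliary task 2: POS tagging, conditioned on the segmentation output.
W_pos = rng.normal(size=(H + N_SEG, N_POS))
p_pos = softmax(np.concatenate([h, p_seg], axis=1) @ W_pos)

# Main task: prosodic boundary labels, conditioned on both auxiliary
# predictions -- the structured-output-layer idea from the abstract.
W_pb = rng.normal(size=(H + N_SEG + N_POS, N_PB))
p_pb = softmax(np.concatenate([h, p_seg, p_pos], axis=1) @ W_pb)

boundaries = p_pb.argmax(axis=1)  # one boundary label per character
```

In training, each head would carry its own cross-entropy loss and the three losses would be summed, so the shared representation is shaped by all tasks while the boundary head still sees the auxiliary predictions directly.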
All Author(s) List: Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai
Name of Conference: Annual Conference of the International Speech Communication Association (INTERSPEECH 2017)
Start Date of Conference: 20/08/2017
End Date of Conference: 24/08/2017
Place of Conference: Stockholm
Country/Region of Conference: Sweden
Proceedings Title: Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017)
Publisher: International Speech Communication Association (ISCA)
Pages: 779 - 783
Languages: English-United States

Last updated on 2020-07-13 at 01:14