Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer
Refereed conference paper presented and published in conference proceedings


Abstract: Prosodic structure generation from text plays an important role in Chinese text-to-speech (TTS) synthesis and greatly influences the naturalness and intelligibility of the synthesized speech. This paper proposes a multi-task learning method for prosodic structure generation using a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) with a structured output layer (SOL). Unlike traditional methods, where prerequisites such as lexical words or even a syntactic tree are usually required as input, the proposed method predicts prosodic boundary labels directly from Chinese characters. The BLSTM RNN captures the bidirectional contextual dependencies of prosodic boundary labels. The SOL further models correlations among prosodic structures, lexical words, and part-of-speech (POS) tags, where the prediction of prosodic boundary labels is conditioned on word tokenization and POS tagging results. Experimental results demonstrate the effectiveness of the proposed method.
Index Terms: prosodic structure generation, structured output layer (SOL), bidirectional long short-term memory (BLSTM)
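The abstract describes a structured output layer in which prosodic boundary prediction is conditioned on the word-tokenization and POS-tagging decisions. A minimal numpy sketch of that cascaded conditioning is given below; all dimensions, label-set sizes, and weight matrices are illustrative assumptions rather than values from the paper, and the BLSTM encoder is replaced by a random stand-in vector for one character.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions (not taken from the paper)
H = 16                        # BLSTM hidden size per character
N_SEG, N_POS, N_PB = 4, 8, 3  # word-seg / POS / prosodic-boundary label counts
E = 4                         # label embedding size

# h: stand-in for the BLSTM output at one character position
h = rng.standard_normal(H)

# Task 1: word tokenization label predicted from h alone
W_seg = rng.standard_normal((N_SEG, H))
p_seg = softmax(W_seg @ h)
seg_emb = rng.standard_normal((N_SEG, E))[p_seg.argmax()]

# Task 2: POS tag, conditioned on h and the tokenization decision
W_pos = rng.standard_normal((N_POS, H + E))
p_pos = softmax(W_pos @ np.concatenate([h, seg_emb]))
pos_emb = rng.standard_normal((N_POS, E))[p_pos.argmax()]

# Task 3: prosodic boundary label, conditioned on h plus both
# earlier decisions -- the structured-output-layer cascade
W_pb = rng.standard_normal((N_PB, H + 2 * E))
p_pb = softmax(W_pb @ np.concatenate([h, seg_emb, pos_emb]))
```

In training, each of the three softmax outputs would receive its own loss term, so the shared BLSTM encoder is optimized jointly across the tasks (the multi-task aspect), while the cascade order fixes the conditioning structure.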
Authors: Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai
Conference: Annual Conference of the International Speech Communication Association (INTERSPEECH 2017)
Proceedings: Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017)
Publisher: International Speech Communication Association (ISCA)
Pages: 779–783

Last updated: 2020-06-08 at 03:17