A Study on Acoustic Modeling for Child Speech Based on Multi-Task Learning
Refereed conference paper presented and published in conference proceedings

Other information
Abstract: This paper describes a study on acoustic modeling of child speech for large-vocabulary speech recognition of Cantonese. This study is driven and enabled by a new speech corpus recently collected for developing acoustic assessment systems for speech sound disorders in Cantonese-speaking children. The speech corpus, named CUChild127, contains 127 Chinese words spoken by 1,500 pre-school children in Hong Kong. A small amount of manually transcribed child speech is used to initialize a GMM-HMM-based speech recognition system, which is subsequently used to generate speech transcriptions for a large amount of training data. A multi-task learning approach is adopted to train a conventional DNN model and a time-delay neural network (TDNN) model. The primary and secondary tasks are context-dependent phone modeling for child speech and adult speech, respectively. The training data of adult speech are obtained from an existing phonetically rich speech corpus. Experimental results show that the TDNN-based acoustic model significantly outperforms the DNN and GMM-HMM systems. Multi-task learning leads to further performance improvement of the TDNN model. The best syllable error rate attained in our experiments is 8.96%, with the weights of the primary and secondary tasks being 0.8 and 0.2.
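As an illustration of the task weighting mentioned in the abstract, the following is a minimal sketch of a weighted two-task objective. It assumes the weights scale per-task cross-entropy losses that are then summed; the abstract does not state how the weights are applied, so this is one common formulation and not necessarily the authors' implementation. The function names and the toy probability vectors are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target])


def multitask_loss(child_probs, child_target, adult_probs, adult_target,
                   w_primary=0.8, w_secondary=0.2):
    """Weighted sum of the primary (child speech) and secondary (adult speech) losses.

    Assumption: the task weights multiply the per-task cross-entropy terms.
    The defaults 0.8 and 0.2 are the weights reported in the abstract.
    """
    primary = cross_entropy(child_probs, child_target)
    secondary = cross_entropy(adult_probs, adult_target)
    return w_primary * primary + w_secondary * secondary


# Toy example: two-class posteriors standing in for each task's output layer.
loss = multitask_loss([0.5, 0.5], 0, [0.25, 0.75], 1)
print(loss)
```

In this formulation, a larger primary weight biases the shared layers toward the child-speech task while still letting the adult-speech data regularize them.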
All Author(s) List: Jiarui Wang, Si Ioi Ng, Dehua Tao, Wing Yee Ng, Tan Lee
Name of Conference: 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Start Date of Conference: 26/11/2018
End Date of Conference: 29/11/2018
Place of Conference: Taipei
Country/Region of Conference: Taiwan
Proceedings Title: Proceedings of ISCSLP 2018
Place of Publication: Taipei
Pages: 389 - 393
Languages: English-United States
Keywords: acoustic modeling, Cantonese speaking child speech, multi-task learning

Last updated on 2020-28-03 at 02:40