CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment
Publication in refereed journal



Other information
Abstract: This paper describes the design and development of CUCHILD, a large-scale Cantonese corpus of child speech. The corpus contains spoken words collected from 1,986 child speakers aged 3 to 6 years. The speech materials comprise 130 words, each 1 to 4 syllables long. The speakers include both typically developing (TD) children and children with speech disorders. The corpus is intended to support scientific and clinical research, as well as technology development related to child speech assessment. The design of the corpus, including word selection, participant recruitment, the data acquisition process, and data pre-processing, is described in detail. Results of acoustic analysis are presented to illustrate the properties of child speech. Potential applications of the corpus in automatic speech recognition, phonological error detection, and speaker diarization are also discussed.
Acceptance Date: 07/08/2020
All Author(s) List: Si-Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee, Kathy Yuet-Sheung Lee, Michael Chi-Fai Tong
Country/Region of Conference: China
Journal name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Year: 2020
Month: 10
Day: 25
Volume Number: 2020
Pages: 424-428
ISSN: 1990-9772
Languages: English-United States
Keywords: speech corpus, child speech, Cantonese, speech sound disorder
