Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition
Refereed conference paper presented and published in conference proceedings
Officially Accepted for Publication


Other information
Abstract: Articulatory features (AFs) are inherently invariant to acoustic signal distortion. Their practical application to atypical domains such as elderly, disordered speech across languages is limited by data scarcity. This paper presents a cross-domain and cross-lingual Acoustic-to-Articulatory (A2A) inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model training before being adapted to three datasets: the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech corpora; and the English TORGO dysarthric speech data, to produce UTI based AFs. Experiments suggest incorporating the generated AFs consistently outperforms the baseline TDNN/Conformer ASR systems using acoustic features only by statistically significant word/character error rate reductions up to 4.75%, 2.59% and 2.07% absolute (14.69%, 10.64% and 22.72% relative) after data augmentation, speaker adaptation and cross system multi-pass decoding.
Acceptance Date: 18/05/2023
All Author(s) List: Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Helen Meng, Xunying Liu
Name of Conference: ISCA Interspeech 2023
Start Date of Conference: 20/08/2023
End Date of Conference: 24/08/2023
Place of Conference: Dublin
Country/Region of Conference: Ireland
Year: 2023
Languages: English-United States

Last updated on 2023-01-06 at 11:47