Exploration of phase and vocal excitation modulation features for speaker recognition
Refereed conference paper presented and published in conference proceedings


Abstract: Mel-frequency cepstral coefficients (MFCCs) are known to be closely related to the linguistic content of speech. Besides cepstral features, other resources in speech, e.g., the phase and the excitation source, are believed to contain useful properties for speaker discrimination. Moreover, magnitude-based features are insufficient to provide satisfactory and robust speaker recognition accuracy in real-world applications when large variations exist between the development and application scenarios. The AM-FM signal modeling technique offers an effective approach to characterizing and analyzing speech properties. This work is therefore motivated to capture the relevant phase-related and vocal-excitation-related modulation features to complement MFCCs. In the context of multi-band demodulation analysis, we present a novel parameterization of the speech and vocal excitation signals. A pertinent representation of the most dominant primary frequencies present in the speech signal is first built. It is then applied to frames of the speech signal to derive effective speaker-discriminative features. The source-related amplitude and phase quantities are also parameterized into feature vectors. The features are assessed in the context of a standard speaker identification and verification system. The complementarity between MFCCs and the modulation features is revealed by system fusion at the score level. © 2012 Springer-Verlag.
Authors: Wang N., Ching P.C., Lee T.
Conference name: 7th Chinese Conference on Biometric Recognition, CCBR 2012
Detailed description: ed. by Wei-Shi Zheng, Zhenan Sun, Yunhong Wang, Xilin Chen, Pong C. Yuen and Jianhuang Lai.
Volume: 7701 LNCS
Publisher: Springer Verlag
Pages: 251 - 259
Keywords: excitation modulation features, phase information, speaker recognition

Last updated: 2020-10-14 at 02:09