Jointly Trained Conversion Model and WaveNet Vocoder for Non-parallel Voice Conversion using Mel-spectrograms and Phonetic Posteriorgrams
Refereed conference paper presented and published in conference proceedings



Other information
Abstract
The N10 system in the Voice Conversion Challenge 2018 (VCC 2018) has achieved high voice conversion (VC) performance in terms of speech naturalness and speaker similarity. We believe that further improvements can be gained from joint optimization (instead of separate optimization) of the conversion model and WaveNet vocoder, as well as from leveraging information in the acoustic representation of the speech waveform, e.g. Mel-spectrograms. In this paper, we propose a VC architecture to jointly train a conversion model that maps phonetic posteriorgrams (PPGs) to Mel-spectrograms and a WaveNet vocoder. The conversion model has a bottleneck layer, whose outputs are concatenated with PPGs before being fed into the WaveNet vocoder as local conditioning. A weighted sum of a Mel-spectrogram prediction loss and a WaveNet loss is used as the objective function to jointly optimize the parameters of the conversion model and the WaveNet vocoder. Objective and subjective evaluation results show that the proposed approach achieves significantly improved naturalness and speaker similarity of the converted speech, for both cross-gender and intra-gender conversions.
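The weighted-sum objective described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function names, tensor shapes, and the weight `alpha` are assumptions for exposition. The Mel term is taken to be a mean-squared error over predicted frames, and the WaveNet term a cross-entropy over quantized waveform classes.

```python
import numpy as np

def mel_prediction_loss(pred_mel, target_mel):
    # MSE between predicted and target Mel-spectrogram frames
    # (frames x mel-bins); the paper's exact loss form may differ.
    return float(np.mean((pred_mel - target_mel) ** 2))

def wavenet_loss(logits, target_ids):
    # Cross-entropy of the WaveNet softmax output over quantized
    # (e.g. mu-law) waveform sample classes.
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(target_ids)), target_ids]))

def joint_loss(pred_mel, target_mel, logits, target_ids, alpha=1.0):
    # Weighted sum of the two terms; both the conversion model and the
    # vocoder receive gradients through this single objective.
    # alpha is a hypothetical weighting hyperparameter.
    return alpha * mel_prediction_loss(pred_mel, target_mel) \
        + wavenet_loss(logits, target_ids)
```

Minimizing this single scalar end-to-end is what distinguishes joint training from first fitting the PPG-to-Mel model and then training the vocoder on its outputs separately.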
Acceptance Date: 17/06/2019
All Author(s) List: Songxiang Liu, Yuewen Cao, Xixin Wu, Lifa Sun, Xunying Liu, Helen Meng
Name of Conference: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Start Date of Conference: 15/09/2019
End Date of Conference: 19/09/2019
Place of Conference: Graz
Country/Region of Conference: Austria
Proceedings Title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Year: 2019
Month: 9
Pages: 714-718
ISSN: 2308-457X
Languages: English (United States)

Last updated on 2020-03-30 at 00:33