An Adaptive-Learning-Based Generative Adversarial Network for One-to-One Voice Conversion
Article Type
Research Article
Publication Title
IEEE Transactions on Artificial Intelligence
Abstract
Voice conversion (VC) has emerged in recent years as a significant research domain in speech synthesis owing to its growing applications in voice-assistive technologies, such as automated movie dubbing and speech-to-singing conversion, to name a few. VC converts the vocal style of one speaker to that of another speaker while keeping the linguistic content unchanged. Nowadays, generative adversarial network (GAN) models are widely used for speech feature mapping from the source speaker to the target speaker. In this article, we propose an adaptive-learning-based GAN model, called ALGAN-VC, to improve one-to-one VC between speakers. Our ALGAN-VC framework combines several approaches to improve the speech quality and voice similarity between the source and target speakers. We incorporate a dense residual network architecture into the generator network for efficient speech feature learning between source and target speakers. Our framework also includes an adaptive learning mechanism to compute the loss function for the proposed model. Moreover, a boosted learning rate approach is incorporated to enhance the learning capability of the proposed model. The proposed model is tested on the Voice Conversion Challenge 2016, 2018, and 2020 datasets, along with our self-prepared Indian regional-language-based speech dataset. In addition, an emotional speech dataset is considered for evaluating the model's performance. The objective and subjective evaluations of the generated speech samples indicate that the proposed model performs the voice conversion task effectively, achieving high speaker similarity and good speech quality.
First Page
92
Last Page
106
DOI
https://doi.org/10.1109/TAI.2022.3149858
Publication Date
2-1-2023
Recommended Citation
Dhar, Sandipan; Jana, Nanda Dulal; and Das, Swagatam, "An Adaptive-Learning-Based Generative Adversarial Network for One-to-One Voice Conversion" (2023). Journal Articles. 3857.
https://digitalcommons.isical.ac.in/journal-articles/3857
Comments
Open Access, Green