Voice Conversion Using Feature Specific Loss Function Based Self-Attentive Generative Adversarial Network
Document Type
Conference Article
Publication Title
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Abstract
Voice conversion (VC) is the process of converting the vocal texture of a source speaker to resemble that of a target speaker without altering the content of the source speaker's speech. With the ongoing development of deep generative models, generative adversarial networks (GANs) have emerged as a better alternative to conventional statistical models for VC. However, speech samples generated by existing VC models remain substantially dissimilar from their corresponding natural human speech. Therefore, this work proposes a GAN-based VC model that incorporates a self-attention (SA) mechanism in the generator network to capture the formant distribution of the target mel-spectrogram efficiently. Moreover, the modulation spectra distance (MSD) is incorporated as a feature-specific loss to achieve high speaker similarity. The proposed model has been evaluated on the CMU Arctic and VCC 2018 datasets. Based on objective and subjective evaluations, we observe that the proposed feature-specific loss-based self-attentive GAN (FLSGAN-VC) model performs significantly better than the state-of-the-art (SOTA) MelGAN-VC model.
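To illustrate how a feature-specific loss of the kind described in the abstract might look in practice, the following is a minimal PyTorch sketch of a modulation spectra distance between mel-spectrograms. The function name, tensor shapes, the log-magnitude/L1 formulation, and the weighting hyperparameter are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def modulation_spectra_distance(converted_mel, target_mel, eps=1e-8):
    """Hypothetical modulation spectra distance (MSD) term.

    Both inputs are mel-spectrograms shaped (batch, n_mels, n_frames).
    The modulation spectrum of each mel band is taken here as the
    magnitude of the FFT of that band's trajectory over time; the loss
    is the mean absolute difference between the two log-modulation spectra.
    """
    # FFT along the time (frame) axis of each mel band
    conv_ms = torch.abs(torch.fft.rfft(converted_mel, dim=-1))
    targ_ms = torch.abs(torch.fft.rfft(target_mel, dim=-1))
    # Compare in the log domain for numerical stability
    return F.l1_loss(torch.log(conv_ms + eps), torch.log(targ_ms + eps))

# Illustrative use inside a generator objective, with lambda_msd as a
# placeholder weight (not a value reported in the paper):
# g_loss = adversarial_loss + lambda_msd * modulation_spectra_distance(fake_mel, real_mel)
```

In such a setup the MSD term would be added to the generator's adversarial objective, encouraging the converted mel-spectrogram's temporal modulation characteristics to match those of the target speaker.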
DOI
10.1109/ICASSP49357.2023.10095069
Publication Date
1-1-2023
Recommended Citation
Dhar, Sandipan; Banerjee, Padmanabha; Jana, Nanda Dulal; and Das, Swagatam, "Voice Conversion Using Feature Specific Loss Function Based Self-Attentive Generative Adversarial Network" (2023). Conference Articles. 552.
https://digitalcommons.isical.ac.in/conf-articles/552