Conference Articles

FID-RPRGAN-VC: Fréchet Inception Distance Loss based Region-wise Position Normalized Relativistic GAN for Non-Parallel Voice Conversion

Sandipan Dhar, National Institute of Technology, Durgapur
Tousin Akhter, National Institute of Technology, Durgapur
Padmanabha Banerjee, Jalpaiguri Government Engineering College
Nanda Dulal Jana, National Institute of Technology, Durgapur
Swagatam Das, Indian Statistical Institute, Kolkata

Document Type

Conference Article

Publication Title

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

Abstract

Voice conversion (VC) is a speech-to-speech (STS) synthesis process that converts the vocal identity of a source speaker to that of a target speaker while keeping the linguistic content unaltered. In recent years, VC has been widely studied using generative adversarial network (GAN) models. However, a substantial gap in naturalness remains between real speech and speech samples generated by state-of-the-art (SOTA) VC models. This work proposes an improved GAN model for non-parallel VC to enhance the naturalness of the generated speech samples. The model integrates a region-wise positional normalization technique in the generator, a relativistic mechanism-based discriminator, and a Fréchet inception distance (FID) based loss function. We tested the proposed model on VCC 2018, CMU Arctic, and a dysarthric speech dataset. The experimental results show that the proposed FID-RPRGAN-VC model outperforms the SOTA MaskCycleGAN-VC model.
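The abstract does not give the form of the FID-based loss, so the following is only a minimal sketch of the quantity it builds on: the Fréchet distance between two Gaussians fitted to feature vectors of real and generated samples. Treating the covariances as diagonal is a simplifying assumption made here to keep the example dependency-free, and the names `feature_stats` and `frechet_distance_diag` are hypothetical; the feature extractor and the exact loss used in the paper are not specified in this record.

```python
import math


def feature_stats(features):
    """Per-dimension mean and (population) variance of a list of feature vectors."""
    n = len(features)
    dims = len(features[0])
    mu = [sum(f[d] for f in features) / n for d in range(dims)]
    var = [sum((f[d] - mu[d]) ** 2 for f in features) / n for d in range(dims)]
    return mu, var


def frechet_distance_diag(mu_a, var_a, mu_b, var_b):
    """Squared Fréchet distance between two Gaussians with diagonal covariances.

    General form: ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)).
    With diagonal covariances (an assumption of this sketch) the trace term
    reduces to a per-dimension sum.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    cov_term = sum(
        va + vb - 2.0 * math.sqrt(va * vb) for va, vb in zip(var_a, var_b)
    )
    return mean_term + cov_term


# Toy feature vectors standing in for embeddings of real and generated speech.
real_feats = [[0.0, 2.0], [2.0, 2.0]]
fake_feats = [[1.0, 1.0], [1.0, 3.0]]

mu_r, var_r = feature_stats(real_feats)
mu_f, var_f = feature_stats(fake_feats)
distance = frechet_distance_diag(mu_r, var_r, mu_f, var_f)
```

A smaller distance indicates that the generated feature statistics are closer to the real ones, which is the intuition behind using such a term as a training signal. A full-covariance version would need a matrix square root and is omitted here.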

First Page

350

Last Page

356

DOI

10.1109/APSIPAASC58517.2023.10317438

Publication Date

1-1-2023

Recommended Citation

Dhar, Sandipan; Akhter, Tousin; Banerjee, Padmanabha; Jana, Nanda Dulal; and Das, Swagatam, "FID-RPRGAN-VC: Fréchet Inception Distance Loss based Region-wise Position Normalized Relativistic GAN for Non-Parallel Voice Conversion" (2023). Conference Articles. 542.
https://digitalcommons.isical.ac.in/conf-articles/542

This document is currently not available here.
