Messages

This page presents audio samples for the paper "NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation".

Experiments were based on the CMU-ARCTIC and LJ Speech datasets. We also trained a multi-speaker vocoder on the LibriTTS dataset. None of the test samples or test texts appear in the training or validation sets.

Copy-synthesis refers to waveform generation given natural acoustic features. Text-to-speech refers to waveform generation given acoustic features predicted from the text input, and the acoustic model we used is Tacotron2.
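The two evaluation settings defined above differ only in where the acoustic features come from. The following is a minimal sketch of that difference, not the paper's implementation: every function name, the `HOP` constant, and the toy feature extractor, acoustic model, and vocoder are hypothetical stand-ins so that the example runs on its own.

```python
from typing import Callable, List

Features = List[List[float]]  # frames x feature dimensions (hypothetical layout)
Waveform = List[float]

HOP = 4  # hypothetical number of waveform samples per feature frame


def copy_synthesis(natural_wave: Waveform,
                   extract_features: Callable[[Waveform], Features],
                   vocoder: Callable[[Features], Waveform]) -> Waveform:
    """Generate a waveform from features extracted from natural speech."""
    features = extract_features(natural_wave)
    return vocoder(features)


def text_to_speech(text: str,
                   acoustic_model: Callable[[str], Features],
                   vocoder: Callable[[Features], Waveform]) -> Waveform:
    """Generate a waveform from features predicted from text."""
    features = acoustic_model(text)
    return vocoder(features)


# Toy stand-ins below: they only mimic the shapes of the real components.
def toy_extract_features(wave: Waveform) -> Features:
    # One 1-dimensional "feature" per HOP samples: mean absolute amplitude.
    return [[sum(abs(x) for x in wave[i:i + HOP]) / HOP]
            for i in range(0, len(wave) - HOP + 1, HOP)]


def toy_acoustic_model(text: str) -> Features:
    # One frame per character; a real system would use a model such as Tacotron2.
    return [[(ord(c) % 32) / 32.0] for c in text]


def toy_vocoder(features: Features) -> Waveform:
    # Hold each frame value for HOP samples; a real vocoder is a neural network.
    return [frame[0] for frame in features for _ in range(HOP)]


natural = [0.1] * 16
copy_wave = copy_synthesis(natural, toy_extract_features, toy_vocoder)
tts_wave = text_to_speech("hi", toy_acoustic_model, toy_vocoder)
```

In both paths the vocoder is the same component. Copy-synthesis isolates vocoder quality, while text-to-speech also exposes it to prediction errors from the acoustic model.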


1: Comparison between NDPS and Some Existing Vocoders (on CMU-ARCTIC dataset)

Natural | WORLD | WaveNet | WaveGAN | NDPS-B | NDPS-MBE
Sample 1 (speaker slt)
Sample 2 (speaker slt)
Sample 3 (speaker slt)
Sample 4 (speaker bdl)
Sample 5 (speaker bdl)
Sample 6 (speaker bdl)

2: Comparison between NDPS and Some Existing Vocoders (Copy-synthesis and Text-to-speech with Tacotron2 on LJ Speech dataset)

Natural | WORLD | WaveNet | WaveGAN | NDPS-B | NDPS-MBE
Sample 1 (Copy-synthesis)
Sample 1 (Text-to-speech)
Sample 2 (Copy-synthesis)
Sample 2 (Text-to-speech)
Sample 3 (Copy-synthesis)
Sample 3 (Text-to-speech)

3: Comparison between NDPS and NSF

NDPS-B | NSF | NDPS-MBE
Sample 1
Sample 2
Sample 3

4: Ablation Test on NDPS-MBE

NDPS-MBE | NDPS-MBE-woGAN | NDPS-MBE-woD | NDPS-MBE-woS | NDPS-MBE-woVUV | NDPS-MBE-4
Sample 1
Sample 2
Sample 3

5: Noise Control (based on NDPS-MBE)

Natural | No Denoise | Denoise on Full Band | Denoise on Low Band Only
Sample 1
Sample 2
Sample 3

6: Multi-speaker NDPS-MBE Vocoder (on LibriTTS dataset; 10 speakers are shown below)

spk1 | spk2 | spk3 | spk4 | spk5 | spk6 | spk7 | spk8 | spk9 | spk10
Natural
NDPS-MBE