NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation

Messages

This page shows the samples in the paper "NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation".

Experiments were conducted on the CMU-ARCTIC and LJ Speech datasets. We also trained a multi-speaker vocoder on the LibriTTS dataset. None of the test samples or test texts appear in the training or validation sets.

Copy-synthesis refers to waveform generation from natural acoustic features, i.e. features extracted from recorded speech. Text-to-speech refers to waveform generation from acoustic features predicted from the text input; the acoustic model we used is Tacotron2.
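The two conditions differ only in where the vocoder's input features come from. The sketch below illustrates that data flow. All function names and the toy stand-ins (a frame-energy extractor, a character-based "acoustic model", a sample-and-hold "vocoder") are hypothetical placeholders for illustration; they are not the feature extractor, Tacotron2, or the NDPS vocoder used in the paper.

```python
from typing import Callable, List

Features = List[List[float]]  # one feature vector per frame
Waveform = List[float]


def copy_synthesis(natural_wave: Waveform,
                   extract_features: Callable[[Waveform], Features],
                   vocoder: Callable[[Features], Waveform]) -> Waveform:
    """Vocoder input = features analysed from recorded (natural) speech."""
    return vocoder(extract_features(natural_wave))


def text_to_speech(text: str,
                   acoustic_model: Callable[[str], Features],
                   vocoder: Callable[[Features], Waveform]) -> Waveform:
    """Vocoder input = features predicted from text by an acoustic model."""
    return vocoder(acoustic_model(text))


# --- Toy stand-ins (assumptions, for illustration only) ---
HOP = 4  # hypothetical number of samples per frame


def toy_extract(wave: Waveform) -> Features:
    """Mean absolute amplitude per frame, as a 1-dimensional feature."""
    frames = [wave[i:i + HOP] for i in range(0, len(wave), HOP)]
    return [[sum(abs(x) for x in f) / len(f)] for f in frames]


def toy_acoustic_model(text: str) -> Features:
    """Map each character to one 1-dimensional frame in [0, 1)."""
    return [[(ord(c) % 32) / 32.0] for c in text]


def toy_vocoder(features: Features) -> Waveform:
    """Hold each frame's value for HOP samples."""
    out: Waveform = []
    for frame in features:
        out.extend([frame[0]] * HOP)
    return out


if __name__ == "__main__":
    recorded = [0.0, 1.0, -1.0, 0.0, 0.5, 0.5, 0.5, 0.5]
    print(len(copy_synthesis(recorded, toy_extract, toy_vocoder)))
    print(len(text_to_speech("hi", toy_acoustic_model, toy_vocoder)))
```

In both functions the vocoder is the same object, which is why the two conditions isolate the effect of predicted versus natural features on the generated waveform.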

1: Comparison between NDPS and Some Existing Vocoders (on CMU-ARCTIC dataset)

	Natural	WORLD	WaveNet	WaveGAN	NDPS-B	NDPS-MBE
Sample 1 (speaker slt)
Sample 2 (speaker slt)
Sample 3 (speaker slt)
Sample 4 (speaker bdl)
Sample 5 (speaker bdl)
Sample 6 (speaker bdl)

2: Comparison between NDPS and Some Existing Vocoders (Copy-synthesis and Tacotron2+Vocoder on LJ Speech dataset)

	Natural	WORLD	WaveNet	WaveGAN	NDPS-B	NDPS-MBE
Sample 1 (Copy-synthesis)
Sample 1 (Text-to-speech)
Sample 2 (Copy-synthesis)
Sample 2 (Text-to-speech)
Sample 3 (Copy-synthesis)
Sample 3 (Text-to-speech)