VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration

In Proceedings of INTERSPEECH 2022

Authors

Haohe Liu, Xubo Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang

Abstract

Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus on a single type of distortion, such as speech denoising or dereverberation. However, speech signals can be degraded by several different distortions simultaneously in the real world. It is thus important to extend speech restoration models to deal with multiple distortions. In this paper, we introduce VoiceFixer, a unified framework for high-fidelity speech restoration. VoiceFixer restores speech from multiple distortions (e.g., noise, reverberation, clipping) and can expand degraded speech (e.g., noisy speech) with a low bandwidth to 44.1 kHz full-bandwidth high-fidelity speech. We design VoiceFixer based on (1) an analysis stage that predicts intermediate-level features from the degraded speech, and (2) a synthesis stage that generates the waveform using a neural vocoder. Both objective and subjective evaluations show that VoiceFixer is effective on severely degraded speech, such as real-world historical speech recordings.
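The two-stage design described in the abstract can be pictured as a simple composition of an analysis function and a synthesis function. The sketch below is illustrative only and is not the VoiceFixer implementation: the class name TwoStageRestorer and the toy_analysis and toy_synthesis placeholders are hypothetical, and the toy stages merely split and rejoin the samples, where the real system would use a trained feature predictor and a neural vocoder.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]        # mono samples
Features = List[List[float]]  # one feature vector per frame


@dataclass
class TwoStageRestorer:
    """Hypothetical wrapper: analysis stage followed by synthesis stage."""
    analysis: Callable[[Waveform], Features]   # degraded speech -> intermediate features
    synthesis: Callable[[Features], Waveform]  # intermediate features -> waveform

    def restore(self, degraded: Waveform) -> Waveform:
        features = self.analysis(degraded)
        return self.synthesis(features)


def toy_analysis(wave: Waveform, frame: int = 4) -> Features:
    # Placeholder: cut the signal into fixed-size frames.
    # A real analysis stage would predict restored features with a trained model.
    return [wave[i:i + frame] for i in range(0, len(wave), frame)]


def toy_synthesis(features: Features) -> Waveform:
    # Placeholder: concatenate the frames.
    # A real synthesis stage would be a neural vocoder.
    return [sample for frame in features for sample in frame]


degraded = [0.0, 0.1, -0.2, 0.3, 0.05, -0.05]
restorer = TwoStageRestorer(analysis=toy_analysis, synthesis=toy_synthesis)
restored = restorer.restore(degraded)
```

Keeping the two stages behind separate callables mirrors the separation the abstract describes: the analysis stage can be swapped or retrained without touching the vocoder, and vice versa.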

Demo

High-fidelity Speech Restoration (on the HiFi-Res test set)

Speaker info	Unprocessed	Baseline_UNet	VoiceFixer	Oracle	Target
33_simulated
Spectrogram
127_simulated
Spectrogram

Speech Enhancement (on the VCTK-Demand test set)

Speaker info	Unprocessed	Enh_UNet	Baseline_UNet	VoiceFixer	Oracle	Target
p232_005
Spectrogram
p257_008
Spectrogram

Speech Dereverberation (DEREV)

Speaker info	Unprocessed	Derev_UNet	GSR_UNet	VF_UNet	Oracle	Target
p361_001
Spectrogram
p363_004
Spectrogram

Speech Declipping (DECLI)

Clipping threshold 0.1

Speaker info	Unprocessed	Declip_UNet	SSPADE	GSR_UNet	VF_UNet	Oracle	Target
p360_001
Spectrogram

Clipping threshold 0.25

Speaker info	Unprocessed	Declip_UNet	SSPADE	Baseline_UNet	VF_UNet	Oracle	Target
p360_001
Spectrogram
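
To make the two clipping thresholds concrete, the sketch below applies hard clipping to a synthetic tone. It assumes the waveform is normalized to the range [-1, 1] and that clipping means limiting each sample to [-threshold, threshold]; the page does not state how its clipped examples were generated, and the helper names hard_clip and clipped_fraction are hypothetical.

```python
import math


def hard_clip(samples, threshold):
    """Limit every sample to the range [-threshold, threshold]."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return [max(-threshold, min(threshold, s)) for s in samples]


def clipped_fraction(samples, threshold):
    """Fraction of samples whose magnitude reaches or exceeds the threshold."""
    return sum(1 for s in samples if abs(s) >= threshold) / len(samples)


# Assumed test signal: 0.1 s of a 440 Hz tone with peak amplitude 0.5 at 44.1 kHz.
sample_rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate // 10)]

for threshold in (0.1, 0.25):
    clipped = hard_clip(tone, threshold)
    peak = max(abs(s) for s in clipped)
    print(f"threshold={threshold}: peak={peak:.2f}, "
          f"clipped fraction={clipped_fraction(tone, threshold):.2f}")
```

Under these assumptions, the lower threshold (0.1) flattens a larger share of the samples than 0.25, so it is the more severe distortion of the two.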

Comparison on the restoration of real-world recordings

Speaker info	Unprocessed	Baseline_UNet	VoiceFixer
A recording of my voice.
Spectrogram
TV news interviews.
Spectrogram
Chinese YouTuber.
Spectrogram

More historical speech restoration demos

Speaker info	Before	After	Speaker info	Before	After
Speech by Bruce Lee (1940-1973)			Speech by Amelia Earhart (1897-1937)
Spectrogram			Spectrogram
Documentary film			Speech by Hu Shi (1891-1962)
Spectrogram			Spectrogram

In Takaaki’s self-supervised speech restoration paper, the authors built a demo page to compare their model with VoiceFixer. Their page is available here.
