VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration

In Proceedings of INTERSPEECH 2022

Authors

Haohe Liu, Xubo Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang

Abstract

Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus on a single type of distortion, such as speech denoising or dereverberation. However, speech signals can be degraded by several different distortions simultaneously in the real world. It is thus important to extend speech restoration models to deal with multiple distortions. In this paper, we introduce VoiceFixer, a unified framework for high-fidelity speech restoration. VoiceFixer restores speech from multiple distortions (e.g., noise, reverberation, clipping) and can expand degraded speech (e.g., noisy speech) with a low bandwidth to 44.1 kHz full-bandwidth high-fidelity speech. We design VoiceFixer based on (1) an analysis stage that predicts intermediate-level features from the degraded speech, and (2) a synthesis stage that generates the waveform using a neural vocoder. Both objective and subjective evaluations show that VoiceFixer is effective on severely degraded speech, such as real-world historical speech recordings.
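The two-stage design described in the abstract can be pictured as a simple composition of an analysis function and a synthesis function. The sketch below is only an illustration of that data flow: `restore`, `toy_analysis`, `toy_synthesis`, and `FRAME` are hypothetical names, and the toy stages are trivial stand-ins, not the learned analysis model or the neural vocoder used by VoiceFixer.

```python
from typing import Callable, List

Waveform = List[float]
Features = List[float]  # toy features: one value per frame (an assumption for this sketch)

FRAME = 4  # hypothetical frame length, chosen only for illustration


def toy_analysis(degraded: Waveform) -> Features:
    """Stand-in for the analysis stage: mean absolute amplitude per frame."""
    frames = [degraded[i:i + FRAME] for i in range(0, len(degraded), FRAME)]
    return [sum(abs(s) for s in frame) / len(frame) for frame in frames]


def toy_synthesis(features: Features) -> Waveform:
    """Stand-in for the vocoder: hold each frame value for FRAME samples."""
    return [value for value in features for _ in range(FRAME)]


def restore(
    degraded: Waveform,
    analysis: Callable[[Waveform], Features],
    synthesis: Callable[[Features], Waveform],
) -> Waveform:
    """Run stage (1), then stage (2), on a degraded waveform."""
    features = analysis(degraded)   # (1) predict intermediate-level features
    return synthesis(features)      # (2) generate a waveform from the features


degraded = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
restored = restore(degraded, toy_analysis, toy_synthesis)
```

The point of the separation is that the synthesis stage only ever sees the intermediate features, so any model that produces them could in principle be paired with any vocoder that consumes them.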

Demo

High-fidelity Speech Restoration (on the HiFi-Res test set)

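Speaker info | Unprocessed | Baseline_UNet | VoiceFixer | Oracle | Target
33_simulated | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram]
127_simulated | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram]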

Speech Enhancement (on the VCTK-Demand test set)

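Speaker info | Unprocessed | Enh_UNet | Baseline_UNet | VoiceFixer | Oracle | Target
p232_005 | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram]
p257_008 | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram]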

Speech Dereverberation (DEREV)

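Speaker info | Unprocessed | Derev_UNet | GSR_UNet | VF_UNet | Oracle | Target
p361_001 | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram]
p363_004 | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram]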

Speech Declipping (DECLI)

Clipping threshold 0.1

Speaker info | Unprocessed | Declip_UNet | SSPADE | GSR_UNet | VF_UNet | Oracle | Target
p360_001 | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram]

Clipping threshold 0.25

Speaker info | Unprocessed | Declip_UNet | SSPADE | Baseline_UNet | VF_UNet | Oracle | Target
p360_001 | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram] | [spectrogram]
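For readers unfamiliar with the clipping thresholds above, the following is a minimal sketch of hard clipping. It assumes samples normalized to [-1, 1] and assumes the threshold is the maximum absolute amplitude that survives; the page does not state how the clipped demo inputs were actually generated, and `hard_clip` and `clipped_fraction` are hypothetical helper names.

```python
from typing import List


def hard_clip(signal: List[float], threshold: float) -> List[float]:
    """Limit every sample to the range [-threshold, threshold]."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return [max(-threshold, min(threshold, s)) for s in signal]


def clipped_fraction(signal: List[float], threshold: float) -> float:
    """Fraction of samples whose magnitude exceeds the threshold."""
    return sum(1 for s in signal if abs(s) > threshold) / len(signal)


# Toy samples for illustration only (not taken from the demo audio).
samples = [0.05, -0.3, 0.2, -0.08, 1.0]

clipped_010 = hard_clip(samples, 0.1)    # the more severe setting
clipped_025 = hard_clip(samples, 0.25)   # the milder setting
```

Under these assumptions, a lower threshold flattens more of the waveform, which is why the 0.1 setting is the harder declipping case of the two.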

Comparison on the restoration of real-world recordings

Speaker info | Unprocessed | Baseline_UNet | VoiceFixer
A recording of my voice. | [spectrogram] | [spectrogram] | [spectrogram]
TV news interviews. | [spectrogram] | [spectrogram] | [spectrogram]
Chinese Youtuber. | [spectrogram] | [spectrogram] | [spectrogram]

More historical speech restoration demos

Speaker info | Before | After
Speech by Bruce Lee (1940-1973) | [spectrogram] | [spectrogram]
Speech by Amelia Earhart (1897-1937) | [spectrogram] | [spectrogram]
Documentary film | [spectrogram] | [spectrogram]
Speech by Hu Shi (1891-1962) | [spectrogram] | [spectrogram]

In Takaaki’s self-supervised speech restoration paper, the authors built a demo page to compare their model with VoiceFixer. Their page is available here.