Haohe Liu 刘濠赫

📧 Email: haoheliu AT gmail dot com

At the Pont de Bir-Hakeim, Paris (August 2024)

I’m a Research Scientist at Meta FAIR (Seattle, WA, USA). I received my PhD from the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, UK, where I was fortunate to be supervised by Prof. Mark D. Plumbley and co-supervised by Prof. Wenwu Wang. I was also lucky to be jointly funded by BBC R&D and the Doctoral College.

Research highlights

My research covers audio generative models, source separation, audio quality enhancement, and audio recognition, and has appeared in journals and conferences such as TPAMI, TASLP, JSTSP, ICML, AAAI, NeurIPS, INTERSPEECH, and ICASSP.

I’m the first author of papers such as AudioLDM, AudioLDM 2, NaturalSpeech, VoiceFixer, SemantiCodec, MusicLDM, and AudioSR, with around 4,500 citations. Most of my research is open-sourced, and my open-source projects and checkpoints on GitHub have received over 10,000 stars.

Highlighted research as first author:

Text-to-audio generation models: AudioLDM and AudioLDM 2.
Ultra-low bitrate audio codec: SemantiCodec.
Audio super-resolution model for any audio type and any sampling rate: AudioSR.
First text-to-speech model to achieve CMOS on par with human recordings: NaturalSpeech.
Restoring the quality of human speech regardless of how the signal is degraded: VoiceFixer.
Music source separation system that achieved leading performance in the Music Demixing Challenge 2021: CWS-PResUNet.
Speech super-resolution model: NVSR.
A module that makes the temporal resolution of the spectrogram differentiable for efficient audio classification: DiffRes.
Few-shot bioacoustic detection: the second-place system in DCASE 2022 Challenge Task 5.

Please refer to my Google Scholar page for the full publication list.

Education

Centre for Vision, Speech and Signal Processing @ University of Surrey, UK, 01/2022 - 06/2025
– PhD in Vision, Speech and Signal Processing; Main supervisor: Prof. Mark D. Plumbley
– With a studentship from the CVSSP and the EPSRC Grant EP/T019751/1 AI for Sound
– 2024 Postgraduate Researcher of the Year Award - University of Surrey, CSEE

School of Computer Science @ Northwestern Polytechnical University, China, 09/2016 - 07/2020
– Bachelor of Engineering in Computer Science and Technology (Outstanding Graduate); Supervisor: Prof. Lei Xie
– GPA: 3.8/4.0 (Top 5%)

Community Services

Workshop/Challenge Organizations

Co-organizer of the special session “Generative AI for Media Generation” at the 2024 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), London, UK.
Co-organizer of the IEEE ICME 2024 Grand Challenge “Semi-supervised Acoustic Scene Classification under Domain Shift”.

Talks

2024: UK Acoustic Network (6 March), MIT CSAIL Spoken Language Systems Group (16 May), Spotify (20 June), Télécom Paris Listen Lab (17 Oct), Shanghai Jiao Tong University (9 Nov)
2023: NetEase (17 Feb), TikTok (22 Feb), ByteDance (24 Feb), Mila - Quebec AI Institute (26 Feb), Chinese Academy of Sciences (14 April), University of Cambridge (11 May), Renmin University of China (4 Oct), Huawei Helsinki R&D (11 Oct), Meta FAIR (1 Nov)

Conference Oral Presentations

2024 Conference on Neural Information Processing Systems
2024 IEEE International Conference on Acoustics, Speech, and Signal Processing
2023 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events
2021 ISMIR Music Demixing Workshop: A Tutorial on Music Source Separation, Alexandre Défossez, Woosung Choi, Haohe Liu (12 Nov)

Reviewer Services

I serve as a regular reviewer for the following journals:

IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Multimedia
IEEE Signal Processing Letters
IEEE Transactions on Circuits and Systems for Video Technology
IEEE Transactions on Neural Networks and Learning Systems
International Journal on Information Fusion

I also serve as a reviewer for the following conferences:

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024, 2025
International Conference on Computer Vision (ICCV) 2025
AAAI Conference on Artificial Intelligence 2026
INTERSPEECH 2023, 2024, 2025
International Conference on Learning Representations (ICLR) 2025, 2026
Conference on Neural Information Processing Systems (NeurIPS) 2024
ACM MultiMedia 2023, 2024
IEEE International Joint Conference on Neural Networks (IJCNN) 2025
IEEE International Conference on Multimedia & Expo (ICME) 2024, 2025
IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025

Competitions

First place, DCASE Challenge Task 7: Foley Sound Synthesis, 2023
Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Mark D. Plumbley, Wenwu Wang
[Paper] [Demo] [Code] [leaderboard]

Second place, DCASE Challenge Task 5: Few-shot Bioacoustic Event Detection, 2022
Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley
[Paper] [Code] [leaderboard]

Second place for vocals and fifth place overall, ISMIR Music Demixing Challenge, 2021
Haohe Liu, Qiuqiang Kong, Jiafeng Liu
[Paper] [Code] [Challenge details] [leaderboard]

Second place, DCASE Challenge Task 6B: Language-Based Audio Retrieval, 2022
Xinhao Mei, Xubo Liu, Haohe Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang
[Report] [Challenge details]

Third place, DCASE Challenge Task 6A: Automated Audio Captioning, 2022
Xinhao Mei, Xubo Liu, Haohe Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang
[Report] [Challenge details]

Teaching

Guest Lecturer, EEEM068 Applied Machine Learning, University of Surrey, 2024
Demonstrator, EEE3008 Fundamentals of DSP, University of Surrey, 2022/23 Semester 1
Demonstrator, EEE1033 Computer and Digital Logic, University of Surrey, 2022/23 Semester 1
Demonstrator, EEEM068 Applied Machine Learning, University of Surrey, 2022/23 Semester 2
