Haohe Liu 刘濠赫
Email: haohe.liu AT surrey dot ac dot uk
I’m a final-year PhD student at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey. I am fortunate to be supervised by Prof. Mark D. Plumbley and co-supervised by Prof. Wenwu Wang, and to be jointly funded by BBC R&D and the Doctoral College. I’m a team member of the EPSRC AI for Sound Project (EP/T019751/1) and a UK Global Talent Visa holder, endorsed by the Royal Academy of Engineering.
Research highlights
My research covers audio generative models, source separation, quality enhancement, and recognition, and has appeared in journals and conferences such as TPAMI, TASLP, JSTSP, ICML, AAAI, NeurIPS, INTERSPEECH, and ICASSP.
I’m the first author of papers such as AudioLDM, AudioLDM 2, NaturalSpeech, VoiceFixer, SemantiCodec, MusicLDM, and AudioSR, with around 50 research publications and 2100 citations. Most of my research is open-sourced. My open-source projects and checkpoints on GitHub have received over 8800 stars and have been downloaded more than 200,000 times.
Highlighted research performed as the first author:
- Text-to-audio generation model: AudioLDM and AudioLDM 2.
- Ultra-low bitrate audio codec: SemantiCodec.
- Audio super-resolution model on any audio type and any sampling rate: AudioSR.
- First text-to-speech model that achieves a CMOS on par with human recordings: NaturalSpeech.
- Speech restoration model that restores the quality of human speech regardless of how the signal is degraded: VoiceFixer.
- Music source separation system that achieved leading performance in the Music Demixing Challenge 2021: CWS-PResUNet.
- Speech super-resolution model: NVSR.
- A module that makes the temporal-resolution of the spectrogram differentiable for efficient audio classification: DiffRes.
- Few-shot bioacoustic detection: the second-ranked system in the DCASE 2022 Challenge Task 5.
Please refer to my Google Scholar Page for the full publication list.
Recent News
- 2025-01-01 📣 FlowSep is accepted by the IEEE International Conference on Acoustics Speech and Signal Processing!
- 2024-12-14 👤 Oral Presentation at the NeurIPS 2024 Audio Imagination Workshop - Topic - AudioSetCaps.
- 2024-12-06 📣 Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing - Separate Anything You Describe - Congrats to Xubo Liu et al.!
- 2024-12-04 📣 Accepted by Reviews in Aquaculture - Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture - A Comprehensive Survey - Congrats to Meng Cui et al.!
- 2024-11-25 📣 Accepted by IEEE Transactions on Automation Science and Engineering - Multimodal Fish Feeding Intensity Assessment in Aquaculture - Congrats to Meng Cui et al.!
- 2024-11-23 🏆 Deeply honored to receive the Post Graduate Researcher of the Year Award for CSEE! [LinkedIn Post]
- 2024-11-14 👤 Visited NWPU (my undergrad university) and gave a talk.
- 2024-11-09 👤 Presentation at the Shanghai Jiao Tong University X-LANCE lab [Recording].
- 2024-10-30 📣 The SemantiCodec is accepted by the IEEE Journal of Selected Topics in Signal Processing.
- 2024-10-17 👤 Visited Telecom Paris and gave a talk - Latent Diffusion Model as a Versatile Coarse-to-Fine Audio Decoder.
- 2024-06-21 📣 One paper got accepted by ACM MM 2024 - FlashSpeech - Efficient Zero-Shot Speech Synthesis.
- 2024-06-19 👤 Talk at Spotify - Introduced SemantiCodec - [Slides].
- 2024-05-16 👤 Talk at MIT CSAIL Spoken Language Systems group. Topic - Learning Audio Pattern with Latent Diffusion Model - [Slides].
- 2024-05-10 👤 Guest Lecture at EEEM068 Applied Machine Learning, University of Surrey, Topic - Introduction to Audio Artificial Intelligence [Slides].
- 2024-04-25 📣 Accepted by the Journal TASLP - IEEE Transactions on Audio Speech and Language Processing - AudioLDM 2
- 2024-04-12 🛩 Attended ICASSP 2024 (Seoul, Korea)! Gave an oral presentation on the AudioSR paper.
- 2024-03-31 👤 Organizing a special session "Generative AI for Media Generation" at 2024 IEEE International Workshop on Machine Learning for Signal Processing (MLSP) [link to IEEE MLSP 2024]
- 2024-03-11 📣 Accepted by ICLR 2024 Workshop LLMAgents - WavCraft - "Audio Editing and Generation with Large Language Models".
- 2024-03-06 👤 Invited talk at UK Acoustic Network - "Recent Progress and Applications of Audio Artificial Intelligence Technologies".
- 2024-02-05 🧗 The IEEE ICME grand challenge we are organizing is officially launched - "Semi-supervised Acoustic Scene Classification under Domain Shift" - [challenge website], [official baseline], [paper]
- 2024-02-05 📣 Accepted by IEEE Open Journal of Signal Processing - "Attention-Based End-to-End Differentiable Particle Filter for Audio Speaker Tracking".
- 2024-01-31 👤 Presented at the Surrey Open Research Culture Event 2024 [link to YouTube]
- 2024-01-13 📣 The NaturalSpeech paper is accepted by the Journal - IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
- 2023-12-13 📣 Four papers were accepted by ICASSP 2024 this year. Many thanks to the collaborators and the feedback from reviewers!
- 2023-12-12 Shortlisted for the Open Research Award of the University of Surrey (only four proposals were shortlisted!).
- 2023-12-09 📣 Accepted by AAAI 2024 - DiffRes (Learning the Spectrogram Temporal Resolution for Audio Classification).
- 2023-12-09 📣 Accepted by NeurIPS 2023 Workshop on Machine Learning for Audio - Composing and Validating Large-Scale Datasets for Training Open Foundation Models for Audio.
- 2023-11-01 👤 Remotely presented my research at Meta.
- 2023-10-11 👤 Presented recent research at Huawei Future Device Technology Summit, Helsinki, Finland.
- 2023-09-20 🛩 Attended DCASE 2023 (Tampere, Finland)! Gave a spotlight presentation and received the Judges' Award from the DCASE committee.
- 2023-09-15 🌟 Open-sourced AudioSR, a versatile audio super resolution system. The paper is under review.
- 2023-08-10 🌟 Open-sourced AudioLDM 2, an improved version of AudioLDM. The paper is under review.
- 2023-06-10 🛩 Attended ICASSP 2023 (Rhodes, Greece)!
- 2023-06-02 📣 Ranked 1st in DCASE Challenge 2023 Task 7 - Foley Sound Synthesis.
- 2023-05-18 📣 Three papers were accepted by INTERSPEECH 2023!
- 2023-05-11 👤 Visited the Department of Engineering, University of Cambridge, UK, for a presentation and discussions.
- 2023-04-25 📣 AudioLDM is accepted by ICML, International Conference on Machine Learning.
- 2023-04-14 👤 Remotely presented my recent research to the Institute of Acoustics, Chinese Academy of Sciences (中科院声学所).
- 2023-04-10 👤 Remotely presented my recent research to the Gaoling School of Artificial Intelligence, Renmin University of China (中国人民大学高瓴人工智能学院).
- 2023-03-08 👤 Gave a remote presentation about AudioLDM to Mila, University of Montreal [link to recording]
- 2023-02-28 🌟 YouTube coverage of AudioLDM [link]. The comment section is very interesting. Thanks MattVidPro AI!
- 2023-02-26 🌟 GitHub repos reached 2000 stars in total!
- 2023-02-24 👤 Gave a remote presentation about AudioLDM to SAMI, ByteDance, China. Thanks Qiuqiang for the invitation!
- 2023-02-23 🌟 AudioLDM ranked among the top 25 most-liked Spaces (top 0.01%) on Hugging Face [link].
- 2023-02-22 👤 Gave a presentation about AudioLDM at TikTok, London. Thanks Janne for organizing this event!
- 2023-02-17 👤 Gave a remote presentation about AudioLDM to NetEase, China. Thanks Pengcheng for the invitation!
- 2023-02-15 👤 Live-streamed a presentation of AudioLDM on WeChat (in Chinese), with 4000+ viewers! [slides]
- 2023-02-15 📣 A paper was accepted by ICASSP 2023.
- 2023-02-13 📣 A major Chinese media outlet (机器之心) reported on our AudioLDM [link]
- 2023-02-03 🌟 AI album [website], generated by our proposed text-to-audio generation model AudioLDM!
- 2022-11-03 🛩 Attended the DCASE 2022 Workshop (Nancy, France)!
- 2022-09-18 🛩 Attended INTERSPEECH 2022 (Incheon, Korea) remotely and presented two papers!
- 2022-09-16 📣 A paper was accepted by NeurIPS 2022.
- 2022-09-15 📣 A paper was accepted by the DCASE Workshop 2022.
- 2022-07-01 📣 Great results in the DCASE 2022 Challenge - 2nd in Task 5; 2nd in Task 6B; 3rd in Task 6A.
- 2022-06-01 📣 Four papers were accepted by INTERSPEECH 2022.
- 2022-05-15 📣 A paper was accepted by EUSIPCO 2022.
- 2021-11-12 👤 Presented our winning model, CWS-PResUNet, at the 2021 ISMIR MDX workshop.
- 2021-11-12 👤 Gave a tutorial talk (slides) on music source separation with Alexandre Defossez and Woosung Choi at the 2021 ISMIR MDX satellite event!
- 2021-09-30 👤 Gave a talk at VENTURE 将门创投 (in Chinese) about VoiceFixer, which I developed recently! [link]
- 2021-08-19 📃 Accepted the Ph.D. offer from the CVSSP, University of Surrey, with tuition fee waiver and stipend!
- 2021-07-31 📣 Great results in the 2021 MDX Challenge (41 teams and 609 participants in total) - 2nd in vocal separation score; 5th in overall score.
- 2021-07-09 📣 A paper was accepted by ISMIR 2021.
- 2021-06-02 📣 A paper was accepted by INTERSPEECH 2021.
- 2020-07-24 📣 A paper was accepted by INTERSPEECH 2020.
- 2020-07-12 🎓 Graduated from Northwestern Polytechnical University with a bachelor's degree and outstanding graduate award!
- 2020-01-21 📃 Accepted a Ph.D. offer from the Ohio State University!
Education
Centre for Vision, Speech and Signal Processing @ University of Surrey, UK, 01/2022 - 06/2025
– PhD in Vision, Speech and Signal Processing; Main supervisor: Prof. Mark D. Plumbley
– With a studentship from the CVSSP and the EPSRC Grant EP/T019751/1 AI for Sound
– 2024 Postgraduate Researcher of the Year Award - University of Surrey, CSEE
School of Computer Science @ Northwestern Polytechnical University, China, 09/2016 - 07/2020
– Bachelor of Engineering, Outstanding graduate, Computer Science and Technology; Supervisor: Prof. Lei Xie
– GPA: 3.8/4.0 (Top 5%)
Work/Internship Experience
Research Scientist Intern, Meta, Paris, 06/2024 - 10/2024
Consultant, Riffusion, Remote, 10/2023 - 05/2024
Research Intern, Microsoft, Beijing, 10/2021 - 04/2022
Research Intern, ByteDance, Shanghai, 07/2020 - 09/2021
Community Services
Workshop/Challenge Organizations
- Co-organizer of the special session: “Generative AI for Media Generation” at 2024 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), London, UK.
- Co-organizer of the IEEE 2024 ICME Grand Challenge “Semi-supervised Acoustic Scene Classification under Domain Shift”
Talks
- 2024: MIT CSAIL Spoken Language Systems group (16 May), UK Acoustic Network (6 March), Spotify (20 June), Télécom Paris Listen Lab (17 Oct), Shanghai Jiao Tong University (9 Nov)
- 2023: NetEase (17 Feb), TikTok (22 Feb), ByteDance (24 Feb), Mila - Quebec AI Institute (26 Feb), Renmin University of China (10 April), Chinese Academy of Sciences (14 April), University of Cambridge (11 May), Huawei Helsinki R&D (11 Oct), Meta FAIR (1 Nov)
Conference Oral Presentations
- 2024 Conference on Neural Information Processing Systems
- 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing
- 2023 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events
- 2021: A Tutorial on Music Source Separation, ISMIR Music Demixing Workshop, Alexandre Défossez, Woosung Choi, Haohe Liu (12 Nov)
Reviewer Services
I serve as a regular reviewer for the following journals:
- IEEE/ACM Transactions on Audio Speech and Language Processing
- IEEE Transactions on Multimedia
- IEEE Transactions on Circuits and Systems for Video Technology
- IEEE Transactions on Neural Networks and Learning Systems
- International Journal on Information Fusion
- Computer Speech & Language
I also serve as a reviewer for the following conferences:
- IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024, 2025
- INTERSPEECH 2023, 2024, 2025
- International Conference on Learning Representations (ICLR) 2025
- Conference on Neural Information Processing Systems (NeurIPS) 2024
- ACM MultiMedia 2023, 2024
- IEEE International Joint Conference on Neural Networks (IJCNN) 2025
- IEEE International Conference on Multimedia & Expo (ICME) 2024, 2025
- IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024
Competitions
- First place, DCASE Challenge Task 7: Foley Sound Synthesis, 2023
Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Mark D. Plumbley, Wenwu Wang
[Paper] [Demo] [Code] [leaderboard]
- Second place, DCASE Challenge Task 5: Few-shot Bioacoustic Event Detection, 2022
Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley
[Paper] [Code] [leaderboard]
- Second place on vocal score and fifth place on overall score, ISMIR Music Demixing Challenge, 2021
Haohe Liu, Qiuqiang Kong, Jiafeng Liu
[Paper] [Code] [Challenge details] [leaderboard]
- Second place, DCASE Challenge Task 6B: Language-Based Audio Retrieval, 2022
Xinhao Mei, Xubo Liu, Haohe Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang
[Report] [Challenge details]
- Third place, DCASE Challenge Task 6A: Automated Audio Captioning, 2022
Xinhao Mei, Xubo Liu, Haohe Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang
[Report] [Challenge details]
Honors & Awards
Scholarships
- 2024 Vice-Chancellor’s Award for Teaching and Research Excellence - School Post Graduate Researcher of the Year - University of Surrey
- Judges’ Award - DCASE 2023, Tampere, Finland
- Doctoral College Scholarship, tuition waiver with stipend, University of Surrey
- Outstanding Graduate Students, Northwestern Polytechnical University, 2020-2021
- WuYaJun third class scholarship, 2019
- National Scholarship, Northwestern Polytechnical University (top 1.5%), 2017 and 2018, two consecutive years
- First-class Scholarship, Northwestern Polytechnical University (top 14.7%), 2016 to 2019, three consecutive years
- Gratitude to Scientists of Modern Times National Scholarship (top 0.5%), 2018
Teaching
- Guest Lecturer, EEEM068 Applied Machine Learning, University of Surrey, 2024
- Demonstrator, EEE3008 Fundamentals of DSP, University of Surrey, 2022/23 Semester 1
- Demonstrator, EEE1033 Computer and Digital Logic, University of Surrey, 2022/23 Semester 1
- Demonstrator, EEEM068 Applied Machine Learning, University of Surrey, 2022/23 Semester 2