Haohe Liu 刘濠赫 (Leo)
Email: haohe.liu AT surrey dot ac dot uk
I’m Haohe Liu, a final-year PhD student at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey. I’m the first author of papers such as AudioLDM, AudioLDM 2, NaturalSpeech, VoiceFixer, MusicLDM, AudioSR, etc., with 50+ research publications and over 1800 citations. My open-source projects/checkpoints on GitHub have received over 8300 stars and have been downloaded more than 150000 times.
My research covers topics related to speech, music, and general audio. I am fortunate to be advised by Prof. Mark D. Plumbley and co-supervised by Prof. Wenwu Wang, and lucky to be jointly funded by BBC R&D and the Doctoral College. I’m a team member of the EPSRC AI for Sound Project (EP/T019751/1). Most of my studies are open-sourced.
Research highlights
My research covers audio generative models, source separation, quality enhancement, and recognition, and has appeared in journals and conferences such as TPAMI, TASLP, ICML, AAAI, NeurIPS, INTERSPEECH, and ICASSP.
Highlighted research performed as the first author:
- Text-to-audio generation model: AudioLDM and AudioLDM 2.
- Ultra-low bitrate audio codec: SemantiCodec.
- Audio super-resolution model on any audio type and any sampling rate: AudioSR.
- First text-to-speech model that achieves a CMOS on par with human recordings: NaturalSpeech.
- Restoring the quality of human speech regardless of how the signal is degraded: VoiceFixer.
- The music source separation system that achieves leading performance on Music Demixing Challenge 2021: CWS-PResUNet.
- Speech super-resolution model: NVSR.
- A module that makes the temporal resolution of the spectrogram differentiable for efficient audio classification: DiffRes.
- Few-shot bioacoustic detection: the second-ranked system in the DCASE 2022 Challenge Task 5.
Please refer to my Google Scholar page for the full publication list.
Recent News
- 2024-10-30 📣 SemantiCodec was accepted by the IEEE Journal of Selected Topics in Signal Processing
- 2024-10-17 👤 Visited Telecom Paris and gave a talk - Latent Diffusion Model as a Versatile Coarse-to-Fine Audio Decoder
- 2024-06-21 📣 One paper got accepted by ACM MM 2024 - FlashSpeech - Efficient Zero-Shot Speech Synthesis
- 2024-06-19 👤 Talk at Spotify - Introduced SemantiCodec - [Slides].
- 2024-05-16 👤 Talk at MIT CSAIL Spoken Language Systems group. Topic - Learning Audio Pattern with Latent Diffusion Model - [Slides].
- 2024-05-10 👤 Guest Lecture at EEEM068 Applied Machine Learning, University of Surrey, Topic - Introduction to Audio Artificial Intelligence [Slides].
- 2024-04-25 📣 Accepted by the Journal TASLP - IEEE Transactions on Audio Speech and Language Processing - AudioLDM 2
- 2024-04-12 🛩 Attended ICASSP 2024 (Seoul, Korea)! Gave an oral presentation there on the AudioSR paper.
- 2024-03-31 👤 Organizing a special session "Generative AI for Media Generation" at 2024 IEEE International Workshop on Machine Learning for Signal Processing (MLSP) [link to IEEE MLSP 2024]
- 2024-03-11 📣 Accepted by ICLR 2024 Workshop LLMAgents - WavCraft - "Audio Editing and Generation with Large Language Models".
- 2024-03-06 👤 Invited talk at the UK Acoustics Network - "Recent Progress and Applications of Audio Artificial Intelligence Technologies".
- 2024-02-05 🧗 The IEEE ICME grand challenge we are organizing is officially launched - "Semi-supervised Acoustic Scene Classification under Domain Shift" - [challenge website], [official baseline], [paper]
- 2024-02-05 📣 Accepted by IEEE Open Journal of Signal Processing - "Attention-Based End-to-End Differentiable Particle Filter for Audio Speaker Tracking".
- 2024-01-31 👤 Presented at the Surrey Open Research Culture Event 2024 [link to YouTube]
- 2024-01-13 📣 The NaturalSpeech paper is accepted by the Journal - IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
- 2023-12-13 📣 Four papers were accepted by ICASSP 2024 this year. Many thanks to the collaborators and the feedback from reviewers!
- 2023-12-12 Shortlisted for the Open Research Award of the University of Surrey (only four proposals were shortlisted!).
- 2023-12-09 📣 Accepted by AAAI 2024 - DiffRes (Learning the Spectrogram Temporal Resolution for Audio Classification).
- 2023-12-09 📣 Accepted by NeurIPS 2023 Workshop on Machine Learning for Audio - Composing and Validating Large-Scale Datasets for Training Open Foundation Models for Audio. ...
- 2023-11-01 👤 Remotely presented my research at Meta.
- 2023-10-11 👤 Presented recent research at Huawei Future Device Technology Summit, Helsinki, Finland.
- 2023-09-20 🛩 Attended DCASE 2023 (Tampere, Finland)! Gave a spotlight presentation. Received the Judges' Award from the DCASE committee.
- 2023-09-15 🌟 Open-sourced AudioSR, a versatile audio super resolution system. The paper is under review.
- 2023-08-10 🌟 Open-sourced AudioLDM 2, an improved version of AudioLDM. The paper is under review.
- 2023-06-10 🛩 Attended ICASSP 2023 (Rhodes, Greece)!
- 2023-06-02 📣 Ranked 1st in DCASE Challenge 2023 Task 7 - Foley Sound Synthesis.
- 2023-05-18 📣 Three papers were accepted by INTERSPEECH 2023!
- 2023-05-11 👤 Visited the Department of Engineering, University of Cambridge, UK, for a presentation and discussions.
- 2023-04-25 📣 AudioLDM is accepted by ICML, International Conference on Machine Learning.
- 2023-04-14 👤 Remotely presented my recent research to the Institute of Acoustics, Chinese Academy of Sciences (中科院声学所)
- 2023-04-10 👤 Remotely presented my recent research to the Gaoling School of Artificial Intelligence, Renmin University of China (中国人民大学高瓴人工智能学院)
- 2023-03-08 👤 Gave a remote presentation about AudioLDM to Mila, University of Montreal [link to recording]
- 2023-02-28 🌟 YouTube coverage of AudioLDM [link]. The comment section is very interesting. Thanks MattVidPro AI!
- 2023-02-26 🌟 GitHub repos reached 2000 stars in total!
- 2023-02-24 👤 Gave a remote presentation about AudioLDM to SAMI, ByteDance, China. Thanks Qiuqiang for the invitation!
- 2023-02-23 🌟 AudioLDM ranked among the top 25 most-liked Spaces (top 0.01%) on Hugging Face [link].
- 2023-02-22 👤 Gave a presentation about AudioLDM at TikTok, London. Thanks Janne for organizing this event!
- 2023-02-17 👤 Gave a remote presentation about AudioLDM to NetEase, China. Thanks Pengcheng for the invitation!
- 2023-02-15 👤 Live-streamed a presentation of AudioLDM on WeChat (in Chinese), with 4000+ viewers! [slides]
- 2023-02-15 📣 A paper was accepted in ICASSP 2023.
- 2023-02-13 📣 A major Chinese media outlet (机器之心) reported on our AudioLDM [link]
- 2023-02-03 🌟 AI album [website], generated by our proposed text-to-audio generation model AudioLDM!
- 2022-11-03 🛩 Attended the DCASE 2022 Workshop (Nancy, France)!
- 2022-09-18 🛩 Attended INTERSPEECH 2022 (Incheon, Korea) remotely and presented two papers!
- 2022-09-16 📣 A paper was accepted in NeurIPS 2022.
- 2022-09-15 📣 A paper was accepted in DCASE Workshop 2022.
- 2022-07-01 📣 Great result in DCASE 2022 Challenge - 2nd in Task 5; 2nd in Task 6B; 3rd in Task 6A.
- 2022-06-01 📣 Four papers were accepted in INTERSPEECH 2022.
- 2022-05-15 📣 A paper was accepted in EUSIPCO 2022.
- 2021-11-12 👤 Presented our winning model CWS-PResUNet at the 2021 ISMIR MDX workshop.
- 2021-11-12 👤 Gave a tutorial talk (slides) on music source separation with Alexandre Defossez and Woosung Choi at the 2021 ISMIR MDX satellite event!
- 2021-09-30 👤 Gave a talk at VENTURE 将门创投 (in Chinese) about VoiceFixer, which I developed recently! [link]
- 2021-08-19 📃 Accepted the Ph.D. offer from CVSSP, University of Surrey, with a tuition fee waiver and stipend!
- 2021-07-31 📣 Great result in the 2021 MDX Challenge (41 teams and 609 participants in total) - 2nd in vocal separation score; 5th in overall score.
- 2021-07-09 📣 A paper was accepted in ISMIR 2021.
- 2021-06-02 📣 A paper was accepted in INTERSPEECH 2021.
- 2020-07-24 📣 A paper was accepted in INTERSPEECH 2020.
- 2020-07-12 🎓 Graduated from Northwestern Polytechnical University with a bachelor's degree and outstanding graduate award!
- 2020-01-21 📃 Accepted a Ph.D. offer from the Ohio State University!
Education Experience
Centre for Vision, Speech and Signal Processing @ University of Surrey, UK, 01/2022 - 01/2025
– PhD in Vision, Speech and Signal Processing; Main advisor: Prof. Mark D. Plumbley
– With a studentship from the CVSSP and the EPSRC Grant EP/T019751/1 AI for Sound
School of Computer Science @ Northwestern Polytechnical University, China, 09/2016 - 07/2020
– Bachelor of Engineering, Outstanding graduate, Computer Science and Technology; Advisor: Prof. Lei Xie
– GPA: 3.8/4.0 (Top 5%)
Competitions
- First place, DCASE Challenge Task 7: Foley Sound Synthesis, 2023
Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Mark D. Plumbley, Wenwu Wang
[Paper] [Demo] [Code] [leaderboard]
- Second place, DCASE Challenge Task 5: Few-shot Bioacoustic Event Detection, 2022
Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley
[Paper] [Code] [leaderboard]
- Second place on vocal score and fifth place on overall score, ISMIR Music Demixing Challenge, 2021
Haohe Liu, Qiuqiang Kong, Jiafeng Liu
[Paper] [Code] [Challenge details] [leaderboard]
- Second place, DCASE Challenge Task 6B: Language-Based Audio Retrieval, 2022
Xinhao Mei, Xubo Liu, Haohe Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang
[Report] [Challenge details]
- Third place, DCASE Challenge Task 6A: Automated Audio Captioning, 2022
Xinhao Mei, Xubo Liu, Haohe Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang
[Report] [Challenge details]
Honors & Awards
Scholarships
- Judges’ Award - DCASE 2023, Tampere, Finland
- Doctoral College Scholarship, tuition waiver with stipend, University of Surrey
- Outstanding Graduate Students, Northwestern Polytechnical University, 2020-2021
- WuYaJun third class scholarship, 2019
- National Scholarship, Northwestern Polytechnical University (top 1.5%), 2017 and 2018, two consecutive years
- First-class Scholarship, Northwestern Polytechnical University (top 14.7%), 2016 to 2019, three consecutive years
- Gratitude to Scientists of Modern Times National Scholarship (top 0.5%), 2018
Teaching
- Guest Lecturer, EEEM068 Applied Machine Learning, University of Surrey, 2024
- Demonstrator, EEE3008 Fundamentals of DSP, University of Surrey, 2022/23 Semester 1
- Demonstrator, EEE1033 Computer and Digital Logic, University of Surrey, 2022/23 Semester 1
- Demonstrator, EEEM068 Applied Machine Learning, University of Surrey, 2022/23 Semester 2