Official demos for the paper:

Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music


Table of Contents

Abstract

Link to our paper

Link to our code

Demos - Best performance with a larger training set

Model:

MDN_K=4 (0.27 million parameters)

Unet-5_K=4 (13.3 million parameters)

Demos - The experiments in our paper

Genre:

Pop, Rap, Rock, Electronic, Folk, Others


Abstract

This paper presents a new input format, channel-wise subband input (CWS), for convolutional neural network (CNN) based music source separation (MSS) models in the frequency domain. We aim to address the major issues in CNN-based high-resolution MSS models: high computational cost and weight sharing between distinctly different bands. Specifically, we decompose the input mixture spectra into several bands and concatenate them channel-wise as the model input. The proposed approach enables effective weight sharing within each subband and introduces more flexibility between channels. For comparison purposes, we perform voice and accompaniment separation (VAS) on models with different scales, architectures, and CWS settings. The results show that CWS input is beneficial in many respects. Across all our experiments, it gives models a 6.9% performance gain on average. Even with fewer parameters, much less training data, and a shorter training time, our MDenseNet with 8-band CWS input still surpasses the original MMDenseNet by a large margin. CWS also substantially reduces computational cost and training time, which can considerably speed up experimentation.
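The core CWS operation described above (split the mixture spectrogram into subbands, then concatenate them along the channel axis) can be illustrated with a short NumPy sketch. It assumes a spectrogram laid out as (channels, frequency, time), K equal-width subbands, and a frequency-bin count divisible by K. The band-major channel ordering and the function names `to_cws` and `from_cws` are hypothetical choices for illustration, not taken from the authors' code.

```python
import numpy as np


def to_cws(spec: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: split a (channels, freq, time) spectrogram into k
    equal frequency subbands and stack them along the channel axis.

    Returns shape (channels * k, freq // k, time).
    Assumption: freq is divisible by k (pad or crop beforehand otherwise).
    """
    c, f, t = spec.shape
    if f % k != 0:
        raise ValueError(f"frequency bins ({f}) must be divisible by k ({k})")
    # (c, f, t) -> (c, k, f//k, t): band b holds bins [b*f//k, (b+1)*f//k)
    bands = spec.reshape(c, k, f // k, t)
    # (c, k, f//k, t) -> (k, c, f//k, t) -> (k*c, f//k, t)
    # Assumed band-major ordering: output channel index = band * c + ch
    return bands.transpose(1, 0, 2, 3).reshape(k * c, f // k, t)


def from_cws(x: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical inverse of to_cws: put the subbands back in frequency order."""
    kc, fk, t = x.shape
    c = kc // k
    # (k*c, f//k, t) -> (k, c, f//k, t) -> (c, k, f//k, t) -> (c, f, t)
    return x.reshape(k, c, fk, t).transpose(1, 0, 2, 3).reshape(c, k * fk, t)


if __name__ == "__main__":
    # Example: a stereo magnitude spectrogram with 1024 bins and 256 frames
    spec = np.random.rand(2, 1024, 256).astype(np.float32)
    x = to_cws(spec, k=4)
    print(x.shape)  # (8, 256, 256)
    assert np.array_equal(from_cws(x, k=4), spec)
```

With K=4, a stereo input of 1024 bins becomes 8 channels of 256 bins each, so any layers after the input work on a feature map four times shorter along the frequency axis. This is one way to see why the abstract reports lower computational cost, although the exact savings depend on each architecture's channel widths.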

For more details, please refer to our paper.

Code

Our code is open source and available on GitHub!



Demos - Best performance with a larger training set

Compared with training on MUSDB only, we trained our models on additional data: 35.18 hours of pure vocals and 279.87 hours of pure music. Each of these experiments takes approximately five days on a single GTX 1080Ti GPU. The results are shown below.

Due to copyright restrictions, we cannot present the full length of each mixture, but we provide a link to each song. Click the button next to each song to access the mixture.

MDN_K=4
Song Accompaniments Vocal
小幸运 (A Little Happiness)
学猫叫 (Learn to Meow)
Unet-5_K=4
Song Accompaniments Vocal
小幸运 (A Little Happiness)
学猫叫 (Learn to Meow)

Demos - The experiments in this paper

Click on a song you'd like to listen to, and the full list of experiment results will be shown.

Pop

Note that I am not a music professional; I categorised these songs by instinct.

Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8

Rap

Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8

Rock

Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8

Electronic

Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8

Folk

Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8

Others

Facebook's Demucs also separated the following three songs in its demo; their results are presented on this page.

Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8
Model Accompaniments Vocal
UNET-5
UNET-5_K=2
UNET-5_K=4
UNET-5_K=8
MMDN
MDN
MDN_K=2
MDN_K=4
MDN_K=8
UNET-6
UNET-6_K=2
UNET-6_K=4
UNET-6_K=8
BD-UNET-6
BD-UNET-6_K=2
BD-UNET-6_K=4
BD-UNET-6_K=8