
References

  1. Avendano, C. (2003). Frequency-domain source identification and manipulation in stereo mixes for enhancement, suppression and re-panning applications. In 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No. 03TH8684), pages 55–58. IEEE. DOI: 10.1109/ASPAA.2003.1285818
  2. Brossier, P., Tintamar, Muller, E., Philippsen, N., Seaver, T., Fritz, H., cyclopsian, Alexander, S., Williams, J., Cowgill, J., and Cruz, A. (2019). aubio/aubio: 0.4.9. DOI: 10.5281/zenodo.2578765
  3. Chen, Z., Luo, Y., and Mesgarani, N. (2017). Deep attractor network for single-microphone speaker separation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 246–250. DOI: 10.1109/ICASSP.2017.7952155
  4. Defferrard, M., Benzi, K., Vandergheynst, P., and Bresson, X. (2016). FMA: A dataset for music analysis. arXiv preprint arXiv:1612.01840.
  5. Dubey, H., Gopal, V., Cutler, R., Aazami, A., Matusevych, S., Braun, S., Eskimez, S. E., Thakker, M., Yoshioka, T., Gamper, H., et al. (2022). ICASSP 2022 Deep Noise Suppression Challenge. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 9271–9275. DOI: 10.1109/ICASSP43922.2022.9747230
  6. Fabbro, G., Uhlich, S., Lai, C.-H., Choi, W., Martínez-Ramírez, M., Liao, W., Gadelha, I., Ramos, G., Hsu, E., Rodrigues, H., Stöter, F.-R., Défossez, A., Luo, Y., Yu, J., Chakraborty, D., Mohanty, S., Solovyev, R., Stempkovskiy, A., Habruseva, T., Goswami, N., Harada, T., Kim, M., Lee, J. H., Dong, Y., Zhang, X., Liu, J., and Mitsufuji, Y. (2024). The Sound Demixing Challenge 2023 – Music Demixing Track. Transactions of the International Society for Music Information Retrieval, 7(1):63–84. DOI: 10.5334/tismir.171
  7. Fonseca, E., Favory, X., Pons, J., Font, F., and Serra, X. (2021). FSD50K: An open dataset of human-labeled sound events. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:829–852. DOI: 10.1109/TASLP.2021.3133208
  8. Geiger, J. T., Grosche, P., and Parodi, Y. L. (2015). Dialogue enhancement of stereo sound. In 23rd European Signal Processing Conference (EUSIPCO), pages 869–873. DOI: 10.1109/EUSIPCO.2015.7362507
  9. Grais, E. M., Sen, M. U., and Erdogan, H. (2014). Deep neural networks for single channel source separation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3734–3738. DOI: 10.1109/ICASSP.2014.6854299
  10. Hershey, J. R., Chen, Z., Le Roux, J., and Watanabe, S. (2016). Deep clustering: Discriminative embeddings for segmentation and separation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 31–35. DOI: 10.1109/ICASSP.2016.7471631
  11. Huang, P.-S., Chen, S. D., Smaragdis, P., and Hasegawa-Johnson, M. (2012). Singing-voice separation from monaural recordings using robust principal component analysis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 57–60. DOI: 10.1109/ICASSP.2012.6287816
  12. International Telecommunication Union (2015). ITU-R BS.1770-4: Algorithms to measure audio programme loudness and true-peak audio level. https://www.itu.int/rec/R-REC-BS.1770.
  13. Kim, M., Choi, W., Chung, J., Lee, D., and Jung, S. (2021). KUIELab-MDX-Net: A two-stream neural network for music demixing. arXiv preprint arXiv:2111.12203.
  14. Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  15. Le Roux, J., Wisdom, S., Erdogan, H., and Hershey, J. R. (2019). SDR – Half-baked or well done? In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 626–630. DOI: 10.1109/ICASSP.2019.8683855
  16. Luo, Y. and Yu, J. (2023). Music source separation with band-split RNN. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:1893–1901. DOI: 10.1109/TASLP.2023.3271145
  17. Martínez-Ramírez, M. A., Liao, W.-H., Fabbro, G., Uhlich, S., Nagashima, C., and Mitsufuji, Y. (2022). Automatic music mixing with deep learning and out-of-domain data. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR).
  18. Masri, P. (1996). Computer Modelling of Sound for Transformation and Synthesis of Musical Signals. PhD thesis, University of Bristol.
  19. Mitsufuji, Y., Fabbro, G., Uhlich, S., Stöter, F.-R., Défossez, A., Kim, M., Choi, W., Yu, C.-Y., and Cheuk, K.-W. (2022). Music Demixing Challenge 2021. Frontiers in Signal Processing, 1:1–18. DOI: 10.3389/frsip.2021.808395
  20. Panayotov, V., Chen, G., Povey, D., and Khudanpur, S. (2015). Librispeech: An ASR corpus based on public domain audio books. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5206–5210. DOI: 10.1109/ICASSP.2015.7178964
  21. Paulus, J., Torcoli, M., Uhle, C., Herre, J., Disch, S., and Fuchs, H. (2019). Source separation for enabling dialogue enhancement in object-based broadcast with MPEG-H. Journal of the Audio Engineering Society, 67(7/8):510–521. DOI: 10.17743/jaes.2019.0032
  22. Petermann, D., Wichern, G., Subramanian, A. S., Wang, Z.-Q., and Le Roux, J. (2023). Tackling the cocktail fork problem for separation and transcription of real-world soundtracks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31. DOI: 10.1109/TASLP.2023.3290428
  23. Petermann, D., Wichern, G., Wang, Z.-Q., and Le Roux, J. (2022). The cocktail fork problem: Three-stem audio separation for real-world soundtracks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 526–530. DOI: 10.1109/ICASSP43922.2022.9746005
  24. Rafii, Z., Liutkus, A., Stöter, F.-R., Mimilakis, S. I., and Bittner, R. (2019). MUSDB18-HQ – an uncompressed version of MUSDB18. DOI: 10.5281/zenodo.3338373
  25. Rouard, S., Massa, F., and Défossez, A. (2023). Hybrid transformers for music source separation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP49357.2023.10096956
  26. Sawata, R., Takahashi, N., Uhlich, S., Takahashi, S., and Mitsufuji, Y. (2023). The whole is greater than the sum of its parts: Improving DNN-based music source separation. arXiv preprint arXiv:2305.07855.
  27. Sawata, R., Uhlich, S., Takahashi, S., and Mitsufuji, Y. (2021). All for one and one for all: Improving music separation by bridging networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 51–55. DOI: 10.1109/ICASSP39728.2021.9414044
  28. Solovyev, R., Stempkovskiy, A., and Habruseva, T. (2023). Benchmarks and leaderboards for sound demixing tasks. arXiv preprint arXiv:2305.07489.
  29. Sound Effects Wiki (2024). Godzilla roar. https://soundeffects.fandom.com/wiki/Godzilla_Roar [Accessed: 2024-01-15].
  30. Steinmetz, C. J. and Reiss, J. (2021). pyloudnorm: A simple yet flexible loudness meter in Python. In Audio Engineering Society Convention 150. Audio Engineering Society.
  31. Stöter, F.-R., Liutkus, A., and Ito, N. (2018). The 2018 Signal Separation Evaluation Campaign. In Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), pages 293–305. Springer. DOI: 10.1007/978-3-319-93764-9_28
  32. Stöter, F.-R., Uhlich, S., Liutkus, A., and Mitsufuji, Y. (2019). Open-Unmix – A reference implementation for music source separation. Journal of Open Source Software, 4(41):1667. DOI: 10.21105/joss.01667
  33. Torcoli, M., Simon, C., Paulus, J., Straninger, D., Riedel, A., Koch, V., Wits, S., Rieger, D., Fuchs, H., Uhle, C., et al. (2021). Dialog+ in broadcasting: First field tests using deep-learning-based dialogue enhancement. arXiv preprint arXiv:2112.09494.
  34. Tzanetakis, G., Jones, R., and McNally, K. (2007). Stereo panning features for classifying recording production style. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), pages 441–444.
  35. Uhle, C., Hellmuth, O., and Weigel, J. (2008). Speech enhancement of movie sound. In Audio Engineering Society Convention 125. Audio Engineering Society.
  36. Uhlich, S., Giron, F., and Mitsufuji, Y. (2015). Deep neural network based instrument extraction from music. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2135–2139. DOI: 10.1109/ICASSP.2015.7178348
  37. Uhlich, S., Porcu, M., Giron, F., Enenkl, M., Kemp, T., Takahashi, N., and Mitsufuji, Y. (2017). Improving music source separation based on deep neural networks through data augmentation and network blending. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 261–265. DOI: 10.1109/ICASSP.2017.7952158
  38. Vincent, E., Sawada, H., Bofill, P., Makino, S., and Rosca, J. P. (2007). First stereo audio source separation evaluation campaign: Data, algorithms and results. In Proceedings of the International Conference on Independent Component Analysis and Signal Separation, pages 552–559. Springer. DOI: 10.1007/978-3-540-74494-8_69
  39. Watcharasupat, K. N., Wu, C.-W., Ding, Y., Orife, I., Hipple, A. J., Williams, P. A., Kramer, S., Lerch, A., and Wolcott, W. (2023). A generalized bandsplit neural network for cinematic audio source separation. IEEE Open Journal of Signal Processing. DOI: 10.1109/OJSP.2023.3339428
  40. Wisdom, S., Hershey, J. R., Wilson, K., Thorpe, J., Chinen, M., Patton, B., and Saurous, R. A. (2019). Differentiable consistency constraints for improved deep speech enhancement. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 900–904. DOI: 10.1109/ICASSP.2019.8682783
  41. Yu, D., Kolbæk, M., Tan, Z.-H., and Jensen, J. (2017). Permutation invariant training of deep models for speaker-independent multi-talker speech separation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 241–245. DOI: 10.1109/ICASSP.2017.7952154
  42. Yu, J., Chen, H., Luo, Y., Gu, R., Li, W., and Weng, C. (2023). TSpeech-AI system description to the 5th Deep Noise Suppression (DNS) Challenge. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP49357.2023.10097210
  43. Yu, J., Luo, Y., Chen, H., Gu, R., and Weng, C. (2022). High fidelity speech enhancement with band-split RNN. arXiv preprint arXiv:2212.00406. DOI: 10.21437/Interspeech.2023-1433
DOI: https://doi.org/10.5334/tismir.172
Submitted on: Aug 22, 2023
Accepted on: Feb 13, 2024
Published on: Apr 17, 2024
Published by: Ubiquity Press

© 2024 Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.