Towards an ‘Everything Corpus’: A Framework and Guidelines for the Curation of More Comprehensive Multimodal Music Data

Open Access | May 2025

Abstract

Music information retrieval (MIR) is increasingly concerned with properly managing the complexity of musical data and with the curation of high-quality multimodal datasets for use in a variety of computational tasks. This article presents (1) a conceptual framework for how practitioners interested in MIR, from musicians to scientists, can understand the multitude of modalities that constitute musical data and (2) a set of proposed guidelines for MIR researchers to consider when setting out to curate comprehensive, well-targeted, durable, and ethically sourced multimodal datasets. For (1), we identify 12 different themes of musical data divided into three sequential phases, which are further subdivided into five narrow focus areas: (i) ‘before’ the music (leading to), (ii) the ‘actual’ music (itself and around it), and (iii) ‘after’ the music (uses of and responses to). For (2), we identify 17 specific quantitative, qualitative, and ethical criteria, informed by this conceptual framework and practices observed in existing multimodal datasets, for the eventual construction of an ‘Everything Corpus’ for MIR research.

DOI: https://doi.org/10.5334/tismir.228 | Journal eISSN: 2514-3298
Language: English
Submitted on: Sep 30, 2024
Accepted on: Feb 25, 2025
Published on: May 5, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Mark Gotham, Brian Bemman, Igor Vatolkin, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.