MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion

Open Access | Jul 2025

Abstract

Music question–answering (MQA) is a machine learning task in which a computational system analyzes and answers questions about music‑related data. Traditional methods prioritize audio, overlooking the visual and embodied aspects that are crucial to understanding music performance. We introduce MusiQAl, a multimodal dataset of 310 music performance videos and 11,793 human‑annotated question–answer pairs, spanning diverse musical traditions and styles. Grounded in musicology and music psychology, MusiQAl emphasizes multimodal reasoning, causal inference, and cross‑cultural understanding of performer–music interaction. We benchmark the AVST and LAVISH architectures on MusiQAl, revealing their strengths and limitations and underscoring the importance of integrating multimodal learning and domain expertise to advance MQA and music information retrieval.

DOI: https://doi.org/10.5334/tismir.222 | Journal eISSN: 2514-3298
Language: English
Submitted on: Sep 1, 2024
Accepted on: May 30, 2025
Published on: Jul 31, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Anna-Maria Christodoulou, Kyrre Glette, Olivier Lartillot, Alexander Refsum Jensenius, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.