Abstract
Musical instrument classification is a key task in music information retrieval, supporting applications such as automatic transcription and music recommendation. Research on this topic for traditional Persian music has been limited, largely due to the lack of complete and consistent datasets. This study introduces a reproducible baseline framework that combines supervised contrastive learning with stacked slice-level aggregation, a late-fusion approach that combines predictions from short one-second segments, to classify 15 traditional Persian instruments, including Ney, Setar, Tar, Santur, and Kamancheh. To support this work, we present the Persian Classical Instrument Dataset, a curated collection of publicly available solo-instrument recordings covering all 15 instruments. Experiments in three settings (a five-instrument subset, the full 15-instrument dataset, and an existing baseline dataset) show that the proposed method achieves up to 99% accuracy on the smaller subsets and 98% accuracy on the full dataset with 30-second inputs. The same framework also generalizes well to the Dastgah detection task, outperforming previous methods and indicating that timbre-based representations can transfer effectively to higher-level modal recognition. Overall, this work provides both a publicly available dataset and a transparent, general-purpose baseline to advance research on Persian and other non-Western musical traditions.
