
KoVox Dataset—A Relational Database of Korean Classical Vocal Performance Ephemera
Abstract
The KoVox Dataset contains structured data on promotional materials for 1,319 Korean classical vocal performances listed on the KOPIS platform between 2016 and 2025. These digital performance ephemera capture artistic intent, program structure, and performer participation, yet they are often not machine-readable because they are published as images. To transform these materials into structured data, we applied a hybrid OCR workflow combining Apple Live Text with ChatGPT-assisted extraction, followed by entity disambiguation using MusicBrainz identifiers. The resulting text was organized into a five-table relational database: performance, work, person, program, and participation. Archived on Zenodo as CSV files together with an SQLite database and SQL schema, KoVox functions as a living, extensible archive that supports comparative and longitudinal studies of South Korea’s evolving vocal music performance culture.
© 2026 Minji Kim, Eunsoo Lee, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.
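To make the five-table structure named in the abstract more concrete, the following is a minimal sketch of how such a schema could be laid out in SQLite. Only the five table names (performance, work, person, program, participation) come from the abstract. Every column name, the key layout, the helper functions `build_demo_db` and `works_per_performance`, and the placeholder rows are assumptions made for illustration; the SQL schema archived on Zenodo is the authoritative definition and may differ.

```python
import sqlite3

# Illustrative schema only: the five table names come from the abstract,
# while all column names and constraints below are assumptions.
SCHEMA = """
CREATE TABLE performance (
    performance_id INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    start_date     TEXT
);
CREATE TABLE work (
    work_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    mbid    TEXT              -- assumed slot for a MusicBrainz identifier
);
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    mbid      TEXT            -- assumed slot for a MusicBrainz identifier
);
-- program: which works were performed at which performance, in order
CREATE TABLE program (
    performance_id INTEGER NOT NULL REFERENCES performance(performance_id),
    work_id        INTEGER NOT NULL REFERENCES work(work_id),
    sequence_no    INTEGER NOT NULL,
    PRIMARY KEY (performance_id, sequence_no)
);
-- participation: which people took part in which performance, and how
CREATE TABLE participation (
    performance_id INTEGER NOT NULL REFERENCES performance(performance_id),
    person_id      INTEGER NOT NULL REFERENCES person(person_id),
    role           TEXT NOT NULL,
    PRIMARY KEY (performance_id, person_id, role)
);
"""


def build_demo_db():
    """Create an in-memory database with the assumed schema and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Placeholder rows, not taken from the dataset.
    conn.execute("INSERT INTO performance VALUES (1, 'Example Recital', '2020-01-01')")
    conn.executemany(
        "INSERT INTO work VALUES (?, ?, NULL)",
        [(1, "Example Song A"), (2, "Example Song B")],
    )
    conn.execute("INSERT INTO person VALUES (1, 'Singer A', NULL)")
    conn.executemany("INSERT INTO program VALUES (1, ?, ?)", [(1, 1), (2, 2)])
    conn.execute("INSERT INTO participation VALUES (1, 1, 'soprano')")
    conn.commit()
    return conn


def works_per_performance(conn):
    """Return (performance title, number of programmed works) pairs."""
    query = """
        SELECT p.title, COUNT(*)
        FROM performance AS p
        JOIN program AS g ON g.performance_id = p.performance_id
        GROUP BY p.performance_id
        ORDER BY p.performance_id
    """
    return conn.execute(query).fetchall()
```

In this sketch, program and participation act as junction tables, so a single work or person can be linked to many performances without duplicating rows. That design is one plausible reading of the table names rather than a confirmed description of the published schema.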