1 Introduction
“The language of our nation differs from [that of] China, and matches not with their writing system ….” So wrote King Sejong when he promulgated Hangul in 1446, introducing a unique Korean character system of his own invention. This declaration emphasizes the critical importance of developing appropriate notation systems tailored to each language’s distinct characteristics—a principle that can be extended to musical notation.
Despite the many recent successes of symbolic music generation (Huang et al., 2017; Wang et al., 2024) in the field of music information retrieval (MIR), most such systems (and the encoding schemes used) focus on Western music and use domain‑specific assumptions such as a hard‑coded basis in “quarter note” measurements of time and binary divisions thereof (with the 16th note as an atomic unit). These encoding schemes for symbolic music reflect aspects of Western notation and can be suboptimal for encoding musical works from different cultural and notational traditions. Combined with a paucity of machine‑readable datasets, this makes symbolic music research on non‑Western music challenging.
In this article, we introduce an MIR project for reviving 15th‑century Korean court music melodies into music for an ensemble of multiple instruments, as shown in Figure 1, which was commissioned by the National Gugak1 Center (NGC). To train models on limited data and generate music of concert‑level quality, we developed a comprehensive framework that addresses a wide range of MIR tasks: (1) constructing a symbolic dataset of jeong‑ak (정악, 正樂, Korean court music)2 acquired through bespoke optical music recognition (OMR) for reading Jeongganbo scores, (2) designing a domain‑specific encoding scheme for Jeongganbo notation, and (3) developing two types of transformer‑based generative models (a Bidirectional Encoder Representations from Transformers [BERT]‑like masked language model [MLM] and an encoder–decoder model) trained on these data.

Figure 1
Overview of the research framework.
Here, we describe not only the technical contribution but also the full process from the initial commission to the public performance by the NGC. We consider this an instructive case study in the practical application of MIR technologies to cultural heritage preservation and reinterpretation. In the final section, we discuss the broader implications of artificial intelligence (AI)‑generated music within traditional music communities.
2 On the Commissioning of this Project
2.1 Historical context and initial impetus
The impetus for this research stems from a 2023 initiative by the NGC, Korea’s primary institution for traditional music, focused on the “restoration” of early Joseon Dynasty (15th century) music. At the heart of this initiative lie three historically significant compositions: Yeo‑min‑lak (여민락, 與民樂), Chi‑hwa‑pyeong (치화평, 致和平), and Chwi‑pung‑hyeong (취풍형, 醉豊亨). These pieces share a profound historical connection, all composed by, or under the direction of, Sejong the Great to accompany lyrics from Yongbieocheonga (용비어천가, 龍飛御天歌), the first work ever written in the Korean alphabet, Hangul, created and promulgated by King Sejong in 1446. The original scores for these compositions are preserved in the “Veritable Records of Sejong” (Sejong Sillok), representing some of the oldest surviving musical notations in Korea (Choi, 2021). However, Yeominlak alone has been transmitted, with various modifications, across the centuries and is still performed today, while the other two pieces are preserved only in 15th‑century scores.
Given their shared genesis—composed simultaneously by the same individuals and set to the same lyrical texts (Lim, 2012)—the NGC hypothesized that the contemporary performance traditions of the extant Yeominlak might offer valuable stylistic and structural insights for imagining how Chihwapyeong and Chwipunghyeong might have evolved had their performance lineages continued uninterrupted. Consequently, a central objective of this commissioned project was to emulate discernible characteristics of the modern Yeominlak performance tradition in creating new, performable versions of the two lost pieces.
2.2 Evolving objectives
Initially, one of the NGC’s primary motivations for employing AI was the prospect of achieving a form of “objective reconstruction.” The hope was that data‑driven AI models could generate modern interpretations of these ancient scores relatively free from the subjective biases and interpretations of individual human researchers. The idea was to leverage data to bridge the historical gap and realize the old notations in a contemporary performance style without specific individual musical judgments overly influencing the outcome.
However, the historical and musicological realities of early Joseon music presented immediate complexities. The surviving notational sources, while invaluable, are often sparse and open to multiple interpretations regarding rhythm, ornamentation, and even precise pitch in some contexts. This inherent ambiguity means that, even within academic circles, diverse theories and interpretations coexist (Chun, 2017; Condit, 1977; Lee, 1987). There has also been a wide‑ranging debate over the evolution of 15th‑century Yeominlak from Sejong Sillok to the present day through various remaining intermediate sources (Moon, 2007; Song, 2007). Recognizing these challenges, the NGC convened two advisory committee meetings during the project’s planning phase. These discussions led to a crucial understanding: a purely “objective” reconstruction was neither feasible nor necessarily desirable. Instead, a series of informed musicological choices was required to establish a clear framework for the AI’s generative task. The resulting foundational principles, established in consultation with the NGC, were as follows:
Rhythmic interpretation
The eight jeonggan (정간, a notational unit) of the 15th‑century Sejong Sillok scores would be interpreted as equivalent to one jangdan (장단, rhythmic cycle or measure) in contemporary performance. This aligns with the metrical organization found in Lee Hye‑gu’s influential research on the historical transformation of Yeominlak (Lee, 2004).
Formal and stylistic model
The generated pieces would emulate the formal structure and instrumentation of contemporary Yeominlak. This specifically entailed:
A six‑instrument court ensemble (Figure 1) comprising:
– Gayageum and Geomungo (plucked string),
– Haegeum and Ajaeng (bowed string), and
– Daegeum and Piri (woodwind).
An instrumental, nonvocal texture.
The use of 20‑beat and 10‑beat jangdan cycles.
The pitch hwangjong (黃鐘) serving as the tonal center (tonic), tuned to E, with the hyangpiri (a type of piri) as the lead melodic instrument.
These foundational decisions and resources provided clear starting points and targets for our generative models.
3 Related works
3.1 Computational approaches to non‑Western music
While most research in MIR has focused on Western music, there has been significantly increased interest in non‑Western music since the 2010s. In particular, the CompMusic project (Serra, 2017) played a key role in consolidating research on several non‑Western art music traditions, including Hindustani and Carnatic music, Turkish makam, Arab‑Andalusian nawba, and Chinese jingju. By integrating corpus creation, signal analysis, and cultural knowledge, it provided a common reference point for subsequent MIR studies that aimed to approach music through culturally specific frameworks.
Following the CompMusic project, MIR corpora for other traditional music traditions have continued to expand, with projects on Iranian (Nikzat and Repetto, 2022), Greek (Papaioannou et al., 2022), and Chinese (Zhou et al., 2025) repertoires, among others. Methodologically, while earlier works in computational ethnomusicology mainly relied on signal processing or rule‑based algorithms (Ganguli et al., 2016; Repetto et al., 2015), deep learning approaches have also been applied to various MIR tasks on traditional music (Clayton et al., 2022; Plaja‑Roglans et al., 2023). A study by Han et al. (2023) has also applied deep learning to musicological questions, investigating modes of Korean folk songs. This methodological shift, combined with increasingly large and diverse corpora, has enabled new research directions in cross‑cultural MIR, such as investigating the transferability of models trained with different musical cultures (Papaioannou et al., 2023) or building a single foundation model for diverse traditional music (Kanatas et al., 2025; Papaioannou et al., 2025).
3.2 Generative AI emulating historical styles
Recent advances in neural network‑based music generation have resulted in a wide range of artistic output. Among these, there have been a number of projects explicitly designed to realize a brief that is delimited by historical style. One advantage of such projects is that, while evaluation is a perennial challenge for generative AI, it is much more focused in these cases than for “free” composition to no specific brief (Lerch et al., 2025).
We briefly note some important recent examples. Since 2020, the AI Music Generation Challenge (Sturm, 2023) has been held annually, focusing on generating songs in the style of Irish and Swedish folk music. This event has enabled the exploration of new methods for generating and evaluating traditional music with deep learning models. We see some similarities between that task and ours given certain aspects of the musical material, such as an emphasis on heterophonic variants of melodic lines.
The Beethoven X project (Gotham et al., 2022) used neural networks to learn aspects of Beethoven’s compositional style and create a realization of his plans for the 10th Symphony. This work resembles ours in various ways, spanning technical (e.g., sparse starting material relative to the target), cultural (approaching revered historical objects), and contextual (culmination in a live, high‑profile performance) aspects.
Finally, attempts at automatic generation have been made for traditional music from beyond the West. In the symbolic domain, research has explored Persian music (Ebrahimi et al., 2019) and Chinese music (Luo et al., 2020), while recent work in audio generation has fine‑tuned models on Indian art music and Turkish makam music (Mehta et al., 2025). For Korean traditional music, approaches using Gated Recurrent Unit (GRU)‑based methods (Park et al., 2023) or topological data analysis (Tran et al., 2024) have been explored, though these works are limited to modeling monophonic melody from a single piece and provide no qualitative or quantitative evaluation of the generated outcomes. Progress in such areas has often been limited by the comparatively little attention these traditions have received, despite the distinctiveness of their musical systems, which naturally demands bespoke methods grounded in deep cultural understanding. This growing but still nascent interest in applying generative AI to non‑Western traditions highlights both the potential and the challenges of developing culturally informed approaches to music generation across diverse musical heritages.
4 Preliminary Approach
4.1 Yeominlak chronological score
Following the discussions described in Section 2.2, the NGC provided a measure‑aligned chronological dataset of Yeominlak based on the reconstructions published by early Korean musicologist Lee Hye‑gu, whose analytical framework has been regarded as a foundational reference in gugak studies (Lee, 1987). The dataset was delivered in MusicXML format and includes eight versions converted into Western staff notation.
This measure‑aligned chronological Yeominlak contains only the first five of seven “chapters” (i.e., broadly equivalent to “movements” in Western classical music), and only the geomungo part (which can be regarded as the lowest bass line), without the other instruments.
Figure 2 shows the first page of the chronological Yeominlak. Each staff represents the Yeominlak of a specific era, sorted in chronological order. The early Yeominlak recorded in the Sejong Sillok musical notation exhibits extremely sparse notation, with individual measures frequently consisting of only one or two notes, or a single note extending across multiple measures. As time goes on, from 1454 CE to the present, the number of notes in corresponding measures of the same melody increases significantly. The melodic structure expands from 8 beats to 10 beats, and new notes are inserted between existing ones, demonstrating a clear tendency toward greater note density.

Figure 2
Measure‑level alignment between eight different versions of Yeominlak, in chronological order (the earliest is at the top), provided by NGC.
4.2 Model design and results
In our preliminary experiment, we trained an era‑transformation model based on the Seq2Seq architecture (Sutskever et al., 2014) with simple attention, which takes a four‑measure melody from a specific era and generates the corresponding four‑measure melody of the next era, using the chronological Yeominlak dataset. Following Jeong (2023), we used an encoding scheme that includes pitch, duration, beat offset, beat strength, and measure index.
Using this era‑transformation model, we converted the music of Chihwapyeong and Chwipunghyeong, which is a monophonic melody converted from 15th‑century Jeongganbo, to an era‑transformed version. We repeatedly fed the output melody of the model back to the model to transform the melody into the style of the next era in a cascading manner. After seven steps of repetitive transformation, the 15th‑century melody evolved into the geomungo part of contemporary court music. Based on this geomungo part, we generated a corresponding accompaniment track for each instrument in the ensemble. This orchestral part generation model is also based on Seq2Seq, trained with a contemporary Yeominlak score in Western staff notation. Examples of final generations are presented in Figures 3 and 4, which were submitted as the official final outcome of the commissioned project.
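The cascading transformation described above can be sketched as a simple feedback loop. This is a minimal illustration, not the project’s implementation; `model` stands for any function mapping a melody in one era’s style to the next era’s style.

```python
def cascade_transform(model, melody, n_steps=7):
    """Repeatedly feed the model's output back as its input,
    advancing the melody one era per step (seven steps in this project)."""
    for _ in range(n_steps):
        melody = model(melody)
    return melody
```

With a real era‑transformation model in place of `model`, seven iterations carry a 15th‑century melody forward to the contemporary style.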

Figure 3
An example output generated by a model trained using the Yeominlak chronological score (Chwipunghyeong).

Figure 4
An example orchestration output generated by a model trained using the contemporary Yeominlak score (Chwipunghyeong, second measure).
4.3 Critical limitations
There were clear limitations to the approach outlined above. Most notably, as we used only Yeominlak for the training set, applying the model to Chihwapyeong and Chwipunghyeong naturally yielded results too similar to Yeominlak. Furthermore, the form and characteristics of the original 15th‑century melody tended to fade gradually with each era’s transformation, to the point of nonexistence in the final version. Unlike the 15th‑century Yeominlak, Chihwapyeong and Chwipunghyeong have more clearly defined melodies, as the latter two set the mixed Korean–Chinese text of Yongbieocheonga, while Yeominlak set its Chinese literary version as the lyric. This heavily affected the musical styles, as the mixed Korean–Chinese text has more syllables per line.
These limitations led the research team to broaden the research scope. First, we expanded the training set to the entire jeong‑ak repertoire. Second, we employed an MLM to preserve the original melody while enabling stylistic adaptation. Finally, we developed a Jeongganbo‑native model that eliminates the need to convert between Jeongganbo and Western staff notation. Although such conversion is algorithmically feasible, we questioned its necessity: why not model Jeongganbo notation directly? This resonates with the philosophy of King Sejong, who explained that he invented Korea’s own alphabet because the Korean language differs from Chinese. In the same spirit, an encoding system tailored to Jeongganbo should be better suited to modeling the style of jeong‑ak.
5 Jeongganbo Dataset
5.1 Jeongganbo notation
Much of Korean court music is written in Jeongganbo, a traditional musical notation system. Jeongganbo is recognized as the first system in East Asia capable of simultaneously representing both the pitch and duration of notes (Kim, 2010). This versatility has been instrumental in passing down court music throughout history (Koehler et al., 2015).
Jeongganbo uses grid‑divided boxes (jeonggan) as the basic unit of time. The number of characters (notes) and their position within each jeonggan varies to denote rhythm. Figure 5 provides an example passage, and Figure 6 provides a schematic overview of possible positions.

Figure 5
An example of Jeongganbo in the original notation (below) and a broadly equivalent conversion to Western classical notation (above). Dashed lines are not part of either notation and are added simply to clarify the temporal alignment between the two systems.

Figure 6
Jeonggan‑like encoding position labels.
Here, we provide a broad introduction to this rhythmic notation system in quasi‑Western music‑theoretic language. Each jeonggan is broadly equivalent to a beat. If a jeonggan features only one character, that note event starts at the beginning of the beat and lasts the beat’s full duration. The first box (“0”) in Figure 6 is in this form, as is the second jeonggan of Figure 5, where the compound beat corresponds to the duration ♩. (in this case for the note B♭4). If the following jeonggan is empty, the previously played note is sustained (as in the fourth jeonggan of Figure 5). Within each jeonggan, the next metrical level is denoted by the number of “row”‑like divisions of the box and, after that, by the “column”‑style divisions of each individual “row.” The number of “rows” relates broadly to the top‑level division of the beat. For example, in Figure 6, the box with 10–11 has two vertically stacked elements, referring to a two‑part division of the span, while the box with 1–3 refers to three equal divisions (here, 3× ♪s). The number of “columns” further subdivides this top level. In Figure 6, the numbers 4–9 feature first a three‑part division of the ♩. beat into 3× ♪s (positions 4, 6, and 8) and then a 2× division of those ♪s (e.g., 4→5). Each of these rows may be divided or left undivided independently (as in the eighth jeonggan of Figure 5).
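The positional logic above can be made concrete with a small sketch. The onset fractions below cover only the position labels described in the text (0, 1–3, 4–9, and 10–11); the remaining labels (12–15) are omitted, and the exact fractions are our reading of Figure 6’s layout rather than an official table. A note’s duration runs from its onset to the next note’s onset, or to the end of the jeonggan.

```python
from fractions import Fraction

# Onset of each position label within one jeonggan (as a fraction of the beat).
# Only the labels described in the text are included; 12-15 are not modeled.
ONSET = {
    0: Fraction(0),                                         # lone note, full beat
    1: Fraction(0), 2: Fraction(1, 3), 3: Fraction(2, 3),   # three equal divisions
    4: Fraction(0), 5: Fraction(1, 6),                      # first third, halved
    6: Fraction(1, 3), 7: Fraction(1, 2),                   # second third, halved
    8: Fraction(2, 3), 9: Fraction(5, 6),                   # last third, halved
    10: Fraction(0), 11: Fraction(1, 2),                    # two stacked halves
}

def durations(positions):
    """Each note lasts until the next onset (or the end of the jeonggan)."""
    onsets = [ONSET[p] for p in positions]
    ends = onsets[1:] + [Fraction(1)]
    return [end - start for start, end in zip(onsets, ends)]
```

For instance, a jeonggan containing positions 1, 2, and 3 yields three notes of one‑third of a beat each.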
Playing techniques and ornamentations called sigimsae (시김새) are sometimes notated for each instrument. When sigimsae are placed to the right of notes, they serve as ornamentations or embellishments for the corresponding note; when written on their own, they indicate timed instructions to play a specific note or musical phrase. For convenience and alignment with the Western notation, the example score of Figure 5 is notated horizontally; in Korean practice, the score page is read from top to bottom and right to left. A gak (각, column of connected jeonggan) can consist of anything from 4 to 20 jeonggan, with each gak representing a phrase unit.
5.2 Optical music recognition
We constructed a dataset of 91 pieces by applying OMR to all compositions available within the manuscripts published by the NGC, which cover the entire remaining jeong‑ak repertoire. OMR was necessary because the scores are provided only as PDF images, no semantic data were available, and the corpus size made manual transcription too time‑intensive. We implemented an encoder–decoder transformer with a convolutional neural network encoder, trained on jeonggan images synthesized with a rule‑based approach. The model achieved an exact symbol match rate of approximately 89% per jeonggan, with accuracy rates of around 95% each for pitch, position, and ornament symbols on the test set. Given that most errors occurred in rarely appearing, complex jeonggan patterns, we estimate the overall accuracy across complete scores to be higher and sufficient for practical use. The details of the OMR model can be found in the work of Kim et al. (2025).
The 91 pieces in the constructed dataset cover around 80% of the NGC’s total manuscripts in terms of pages. We excluded pieces that are not notated in Jeongganbo and those written for solo or duet.
Among the 91 pieces, the daegeum, piri, and haegeum appear consistently in every piece. The geomungo occasionally posed challenges during OMR processing due to its ossia‑like notation, leading us to exclude it from the affected pieces; consequently, the geomungo dataset contains 72 pieces—four fewer than the gayageum set, which totals 76. The ajaeng, featured least frequently, appears in only 33 pieces.
5.3 Corpus analysis
5.3.1 Instrumental configurations
Our dataset naturally falls into three distinct instrumental configurations, as detailed in Table 1, each corresponding to particular musical contexts. The first category consists of 18 pieces featuring the complete six‑instrument ensemble, typically performed by large groups of musicians at court banquets or ceremonial occasions. The second category, which forms the largest group (54 pieces), involves five instruments and primarily includes pungnyu‑bang (salon‑style) chamber music, among which instrumental accompaniment for gagok (classical vocal repertoire) is particularly prevalent. These pieces usually involve smaller ensembles, with one musician per instrument; notably, they omit the ajaeng and use sepiri, a softer variant for the piri part. The final category comprises 19 four‑instrument pieces arranged exclusively for sustaining melodic instruments—wind and bowed strings—with no plucked strings involved. These pieces historically served either ceremonial purposes or as accompaniment for court dances, depending on the size of the ensemble.
Table 1
Ensemble configurations of the Jeongganbo dataset.
| Ens. Size | Pieces | Instrumentation |
|---|---|---|
| 6 | 18 | Daegeum, Piri, Haegeum, Ajaeng, Gayageum, Geomungo |
| 5 | 54 | Daegeum, Piri, Haegeum, Gayageum, Geomungo |
| 4 | 19 | Daegeum, Piri, Haegeum, Ajaeng |
Our target pieces for generation—Chihwapyeong and Chwipunghyeong—closely align with the musical characteristics of the full six‑instrument configuration used in royal court contexts. Nevertheless, due to significant overlap in melodic function and instrumental roles across the categories, we decided to include all available data when training the model to improve its overall musical fluency and generalization capability.
5.3.2 Rhythmic structure (jangdan)
Korean court music is generally structured around recurring rhythmic patterns known as jangdan. While analogous to the Western concept of meter, jangdan cycles are characterized by a specific tempo, fixed length, and an inherent stress pattern. Our dataset contains a variety of jangdan lengths. As shown in Table 2a, the most frequent jangdan spans 12 beats (569 occurrences), followed by 16‑beat (543) and 6‑beat (429) cycles. Although a single jangdan typically governs an entire piece, some scores feature 16‑beat jangdan cycles notated as subdivided into 14 and 2 beats; such subdivided instances total 101 cases. Meanwhile, the 20‑beat and 10‑beat cycles—our targets for automatic generation—appear 219 and 336 times, respectively. We considered this quantity sufficient for the model to learn their structural properties.
Table 2
Cycle length distribution and token variety across instruments.
(a) Total number of jangdan (daegeum)
| Cycle Length | 4‑Ens. Pieces | 5‑Ens. Pieces | 6‑Ens. Pieces | Total No. of Jangdan |
|---|---|---|---|---|
| 20 | 3 | 2 | 2 | 219 |
| 18 | 2 | — | — | 30 |
| 16 | 4 | 31 | 5 | 543 |
| 12 | 5 | 8 | 3 | 569 |
| 10 | 2 | 7 | 5 | 336 |
| 8 | 1 | 1 | — | 71 |
| 6 | 2 | 4 | 3 | 429 |
| 4 | — | 1 | — | 67 |
| Other | — | — | — | 101 |
(b) Unique and total token types per instrument
| Instrument | Unique Pitch | Unique Sigimsae | Total Pitch | Total Sigimsae |
|---|---|---|---|---|
| Daegeum | 21 | 44 | 28516 | 19864 |
| Piri | 21 | 40 | 25869 | 15452 |
| Haegeum | 18 | 31 | 24811 | 9106 |
| Gayageum | 19 | 2 | 20722 | 39 |
| Geomungo | 19 | 19 | 18837 | 4082 |
| Ajaeng | 15 | 7 | 8352 | 623 |
5.3.3 Pitch and sigimsae
Table 2b summarizes the number of unique pitch tokens and sigimsae (ornamental figures) types observed for each instrument. The daegeum and piri exhibit the richest expressive vocabularies, each with 21 unique pitches and over 40 sigimsae types. The haegeum also demonstrates considerable ornamental variety with 31 sigimsae types. In contrast, the ajaeng and gayageum are more restricted in their sigimsae expression, with only seven and two types, respectively. However, the particularly low count of sigimsae for the gayageum stems from a limitation in our OMR process. Sigimsae for the gayageum are notated outside the jeonggan grid in the version of NGC’s score book; consequently, they were missed by our OMR system, which was configured to recognize only symbols within the jeonggan boxes. These inter‑instrumental disparities can be interpreted as reflecting both the idiomatic playing techniques of each instrument and long‑standing Jeongganbo notational conventions.
5.4 Conversion to Western staff notation
To convert Jeongganbo to Western staff notation, and thereby broaden the applicability of our dataset and research, we implemented a rule‑based system that calculates the duration of each symbol in Jeongganbo. For the reference pitch, Hwang (黃) was set to E♭, consistent with its usage in the project. A single jangdan cycle was mapped to one measure in Western staff notation, and the duration of one jeonggan was set to 3× ♪, which in turn determined the time signature.
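The jangdan‑to‑measure mapping can be sketched as follows, assuming one jeonggan is rendered as three eighth notes as stated above; spelling the meter as a simple string over 8 is an illustrative choice, not necessarily how the released converter formats it.

```python
EIGHTHS_PER_JEONGGAN = 3  # each jeonggan maps to a dotted quarter (3 eighth notes)

def time_signature(jeonggan_per_jangdan: int) -> str:
    """One jangdan cycle becomes one measure; its total length in eighth
    notes determines the time signature (e.g., a 10-beat jangdan -> 30/8)."""
    return f"{jeonggan_per_jangdan * EIGHTHS_PER_JEONGGAN}/8"
```

Under this rule, the 20‑beat and 10‑beat jangdan cycles targeted in this project become 60/8 and 30/8 measures, respectively.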
Some sigimsae denote a specific, relative pitch (e.g., the symbol “ ” instructs the player to sound one note higher than the given pitch). In such cases, we calculated the corresponding pitch by estimating the scale of the piece from its five most frequent pitches. In terms of duration, sigimsae fall into two categories: those with a specified duration and those without. Those with duration were converted to regular notes with full‑size noteheads; those without were converted into grace notes (for example, the symbol “ ” denotes a short grace note played one step higher than the main note and immediately preceding it).
Building upon this automated conversion system, we release both the original publicly available Jeongganbo scores and their corresponding conversions into Western staff notation. The released code also provides conversion from monophonic melodies in staff notation to our jeonggan‑like encoding.
6 Encoding Schemes for Jeongganbo
In the field of symbolic music generation for Western monophonic and polyphonic music, encoding schemes such as ABC notation, which denotes pitch and duration separately, are effective and prevalent (Sturm et al., 2015). However, when it comes to Korean court music, whose heterophonic structure is a defining characteristic, it is crucial that the intricate alignment of different melodies be well‑represented in encoding. The genre also exhibits prolonged notes and considerable variations in note lengths, which proves to be a challenge for learning algorithms, especially when data are limited.
These distinct musical qualities call for a specialized encoding scheme; for this, we propose jeonggan‑like encoding (hereafter “JG‑like encoding”), which closely follows the positional notation of Jeongganbo. This symbolic music encoding is designed to inherently reflect the compositional and notational style of traditional Korean court music.
The detailed rules of encoding are as follows. The boundary of a jeonggan is designated as a bar (“|”) token. A change of measure (gak) is indicated by a line break (“\n”). As illustrated in Figure 6, the position of each note is denoted by a number between 0 and 15, after which the pitch symbol follows.
As discussed above, sigimsae may or may not carry a duration. Sigimsae with duration, such as the “ ” symbol in Figure 7, are handled in the same way as pitch symbols. Sigimsae without duration, such as “ ”, which appear beside the pitch character, are placed after the corresponding pitch symbol.

Figure 7
Comparison between Jeonggan (JG)‑like, REMI‑like, and ABC‑like encoding schemes.
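As an illustration of the rules described above, the sketch below serializes a nested note structure into a JG‑like token string. The input data layout and the romanized pitch names (e.g., `hwang`) are our own conventions for this example, not part of the notation or the released codebase.

```python
def encode_jg(gaks):
    """gaks: list of gak; each gak is a list of jeonggan; each jeonggan is a
    list of (position, pitch, *ornaments) tuples (an empty list means the
    previous note is sustained). Returns the JG-like token string."""
    lines = []
    for gak in gaks:
        tokens = []
        for jeonggan in gak:
            for note in jeonggan:
                position, pitch, *ornaments = note
                tokens.append(str(position))   # position label (0-15)
                tokens.append(pitch)           # pitch (or durated sigimsae)
                tokens.extend(ornaments)       # undurated sigimsae follow the pitch
            tokens.append("|")                 # jeonggan boundary
        lines.append(" ".join(tokens))
    return "\n".join(lines)                    # line break marks a new gak
```

A one‑gak melody with a full‑beat note, a sustained jeonggan, then a second gak with two notes would encode as `"0 hwang | |\n1 tae 2 jung |"`.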
There are several advantages that we can expect from using JG‑like encoding. First, with position‑based encoding, the duration‑related vocabulary is limited to just 16 entries. In contrast, duration‑based encoding schemes require learning each duration token as a separate entry, resulting in a significantly larger vocabulary. JG‑like encoding can therefore be much more advantageous for learning rhythmic patterns when training on a limited dataset. The position‑based approach also inherently prevents rhythmic errors in which the total duration of notes exceeds or falls short of the measure length. Additionally, rather than determining the length of a note in a single step, JG‑like encoding allows note lengths to be adjusted flexibly during inference via the combination of jeonggan‑boundary and position tokens. This enables the generation of music that adapts to the time step and the sequence of the input source, which can be expected to result in more dynamic, context‑aware generation. Table 3 shows the average number of tokens, pitches, and sigimsae per piece for each instrument when parsed with the JG‑like encoding.
Table 3
Average token counts per instrument per piece with jeonggan‑like encoding (18 pieces with six‑instrument ensembles).
| Instrument | Total | Pitch | Sigimsae |
|---|---|---|---|
| Daegeum | 1876.9 | 470.6 | 343.9 |
| Piri | 1658.1 | 428.7 | 258.3 |
| Haegeum | 1388.9 | 395.0 | 118.1 |
| Gayageum | 1086.1 | 346.3 | 0.9 |
| Geomungo | 1119.2 | 326.8 | 66.7 |
| Ajaeng | 1042.3 | 306.5 | 22.5 |
6.1 Other possible encodings
REMI (revamped Musical Instrument Digital Interface [MIDI]‑derived events) (Huang and Yang, 2020) first proposed using a beat‑position feature rather than time shifting to encode temporal position. We also experiment with REMI‑like encoding, which adopts three token types: beat position, new beat (instead of new measure), and pitch. We intentionally designed the REMI‑like and JG‑like encodings to share the same structure and produce the same number of tokens for a given melody. There are, however, small, inevitable (and interesting) differences: JG‑like encoding provides the intra‑jeonggan position, while REMI‑like encoding provides the beat position of the note. According to the position labels shown in Figure 6, any of 0, 1, 4, 10, or 12 can correspond to beat position 0. However, in JG‑like encoding, each occurrence of a position token limits the possibilities for subsequent ones. For instance, a position token of 0 implies that no more notes will occur in the same jeonggan, and, if the first note is at position 1, one or more additional notes must follow at positions 2–3 or 6–9. In other words, the position tokens of JG‑like encoding naturally embed the subdivisions of a given beat. In contrast, in REMI‑like encoding (as in almost all encodings based on Western notation),3 any offset value can follow a beat position of 0. To examine the impact of this position‑based logic on the generation process, we use REMI‑like encoding as our first baseline for comparison.
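The successor constraints just described can be expressed as a small validity check. Only the two cases stated in the text are modeled here; follower sets for the other position labels are left unconstrained, so this is an illustrative subset of the full positional grammar, not a complete one.

```python
# Allowed followers within one jeonggan, for the cases discussed in the text.
FOLLOWERS = {
    0: set(),                  # a lone full-beat note: nothing may follow
    1: {2, 3, 6, 7, 8, 9},     # a first-of-three note: these may follow
}

def valid_next(prev_position, next_position):
    """True if next_position may follow prev_position in the same jeonggan,
    under the subset of constraints modeled in FOLLOWERS."""
    allowed = FOLLOWERS.get(prev_position)
    if allowed is None:
        return True            # constraint not modeled for this label
    return next_position in allowed
```

A decoder could use such a check to mask out invalid position tokens at each generation step.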
As a second baseline, we implement an ABC‑like encoding scheme that has no separate bar token and encodes each note as a combination of pitch and duration values. Note that we do not omit duration tokens equal to the unit length, as ABC notation typically does, nor do we represent a duration by combining several length tokens.
6.2 Piano roll‑like encoding
As explained in Section 4.3, we aimed to develop an MLM to transform 15th‑century melodies into the current jeong‑ak style while preserving the original melodic contour. A key limitation of MLMs for generative tasks is that they require a fixed‑length sequence with predetermined positions for masked tokens. When using JG‑like encoding, where each note is represented by multiple tokens (2–3 tokens per note), this prevents flexible insertion of notes—the model can only generate a fixed number of new notes at specified positions. To overcome this, we use piano roll‑like encoding for the MLM, a technique widely employed in work on music generation with limited rhythmic patterns, such as Bach’s chorales (Huang et al., 2017; Liang et al., 2017). Here, each jeonggan is represented as six frames, with each frame consisting of two channels: one for pitch or sigimsae with duration, and the other for ornamentation.
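A minimal sketch of this frame‑based layout: each jeonggan contributes six frames across two aligned token channels, so masked positions can be chosen independently of how many notes they will eventually hold. The token spellings (`_` for sustain, `-` for no ornament) and the input format are illustrative assumptions, not the paper’s actual vocabulary.

```python
FRAMES_PER_JEONGGAN = 6  # one frame per 1/6 beat, the finest subdivision used

def roll_from_notes(notes, n_jeonggan, sustain="_", empty="-"):
    """notes: list of (frame_index, pitch, ornament_or_None).
    Returns two aligned, fixed-length token lists (channels), unlike
    JG-like encoding, whose length varies with the number of notes."""
    n_frames = n_jeonggan * FRAMES_PER_JEONGGAN
    pitch_channel = [sustain] * n_frames   # sustain token carries the prior note
    ornament_channel = [empty] * n_frames
    for frame, pitch, ornament in notes:
        pitch_channel[frame] = pitch
        if ornament is not None:
            ornament_channel[frame] = ornament
    return pitch_channel, ornament_channel
```

Because the sequence length depends only on the number of jeonggan, an MLM can mask any frame and insert or remove notes there freely.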
7 Orchestral Part Generation
We implement an encoder–decoder transformer (Vaswani et al., 2017) model to generate melodies for different instruments based on a given instrument’s melody, leveraging the transformer’s ability to learn long‑term dependencies. The model consists of an encoder that processes the input sequence and a decoder for generating the output sequence. Our objective is to generate melodies that synchronize with the input melodies across musically corresponding phrases.
To encode the musical position of each token, we employed a “beat counter” that provides information about temporal position, similar to the approach in PopMAG (Ren et al., 2020). The musical position of each token is encoded as a combination of measure index, beat index, and sub‑beat (“in‑jeonggan”) index, as shown in Figure 8. This information is summed into the note embedding. Without this explicit position information, we found that the model failed to learn to accumulate position‑related tokens into the correct musical position, leading to generated results whose lengths did not match the input.
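The beat‑counter scheme can be sketched as three position lookup tables whose rows are summed with the token embedding. The table sizes and embedding dimension below are illustrative assumptions, not the model's actual hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # embedding dimension (assumption)

# Lookup tables: token vocabulary plus the three position counters.
token_emb   = rng.normal(size=(128, D))  # token vocabulary (assumed size)
measure_emb = rng.normal(size=(8, D))    # measure index within the window
beat_emb    = rng.normal(size=(20, D))   # jeonggan (beat) index
sub_emb     = rng.normal(size=(6, D))    # in-jeonggan (sub-beat) index

def embed(tokens, measures, beats, subs):
    """Sum the four lookups elementwise; all inputs are integer arrays
    of equal shape, giving one summed vector per token."""
    return (token_emb[tokens] + measure_emb[measures]
            + beat_emb[beats] + sub_emb[subs])
```

In a transformer, these summed vectors would replace (or complement) the usual learned positional embeddings, so every token carries its musical position directly.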

Figure 8
Orchestral part generation.
7.1 Experiment results
The sequence‑to‑sequence model takes as encoder input four measures of melody from a randomly selected number of instruments. Then, given a target‑instrument condition, the model generates the corresponding four measures of melody for that instrument.
Table 4 displays the results of the objective evaluation, focusing on the geomungo and daegeum. The model was selected based on the best F1‑scores on the validation set. Inference was done with a temperature of 1.0 and top‑p sampling of 0.9. We aggregated results over 10 repeated generations for each test sample. Details of the training hyperparameters and evaluation metrics are given in supplementary materials 1 and 2.
Table 4
Quantitative evaluation results. “Geom.” denotes geomungo, “Daeg.” denotes daegeum.
| Encoding | Piri to Geom. len‑mat | Piri to Geom. F1 | Every to Daeg. len‑mat | Every to Daeg. F1 |
|---|---|---|---|---|
| JG‑like | 0.978 | 0.638 | 0.992 | 0.644 |
| REMI‑like | 0.997 | 0.624 | 0.992 | 0.580 |
| ABC‑like | 0.904 | 0.578 | 0.930 | 0.571 |
| JG with subgenre | 1.000 | 0.682 | 1.000 | 0.694 |
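The sampling settings reported above (temperature 1.0, top‑p 0.9) follow standard nucleus sampling, which can be sketched as follows; this is a generic routine, not the project's own inference code.

```python
import numpy as np

def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=None):
    """Generic nucleus (top-p) sampling: keep the smallest set of tokens
    whose cumulative probability reaches top_p, renormalize, and sample."""
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))   # softmax, numerically stable
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # most probable first
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    nucleus = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=nucleus))
```

At temperature 1.0 the logits are used unchanged, so top‑p alone trims the long tail of unlikely tokens.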
Overall, JG‑like encoding showed the best results for both instruments, except for some failures in length‑matching for the piri‑to‑geomungo inference. Because REMI‑like and JG‑like encoding are identical when there is only one note per jeonggan, their results for geomungo generation were similar. However, for the daegeum, which carries more notes per jeonggan, JG‑like encoding clearly outperformed REMI‑like encoding. We therefore conclude that JG‑like encoding is effective for modeling music notated in Jeongganbo.
As explained in Section 5.3, our dataset contains three distinctive subgenres of jeong‑ak, which correspond to different instrumental configurations. To evaluate the importance of these subgenres, we conducted an additional experiment that provides subgenre information as note features. Since we often provide only a subset of all instruments from each piece as input to the model during both training and inference, incorporating subgenre information can enhance the model’s understanding of the musical context. The quantitative results show that the model with subgenre information clearly achieves better performance. The performance gain was not solely due to providing more context during inference. For instance, when providing five instruments as conditioning to generate the daegeum part, this naturally indicates that the piece belongs to the six‑instrument subgenre. Even in such cases, F1‑scores increased from 0.608 to 0.637. This demonstrates that providing subgenre information helps the model capture the stylistic characteristics of jeong‑ak more effectively.
8 Melody Infilling with MLM
Our orchestral part generation method requires an initial input melody for a specified instrument. In other words, we first need to transform the old melody into a melody for a specific instrument used in jeong‑ak. To preserve the outline of the original melody while achieving a plausible transformation, we trained an MLM on our Jeongganbo dataset.
Following examples in MusicBERT (Zeng et al., 2021), we trained the bidirectional transformer encoder with MLM objective using various masking methods: (i) masking 5% of frames, (ii) replacing 5% of frames, (iii) masking 20% of note onsets, (iv) replacing 10% of note onsets, (v) erasing 10% of note onsets, (vi) masking the entire 6 frames of 15% of jeonggans, and (vii) masking 50% of ornamentations. The model employs piano roll‑like encoding as explained in Section 6.2.
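As an illustration of how such a masking mix can be applied, the sketch below implements the first two frame‑level schemes; the `MASK` placeholder and helper names are our assumptions, and the onset‑, jeonggan‑, and ornamentation‑level schemes would follow the same pattern on their respective units.

```python
import random

MASK = "<mask>"  # assumed placeholder token

def corrupt_frames(frames, vocab, rng=None, p_mask=0.05, p_replace=0.05):
    """Apply schemes (i) and (ii) from the list above: independently
    mask 5% of frames and replace another 5% with a random vocabulary
    item. Returns the corrupted sequence (same length as the input)."""
    rng = rng or random.Random()
    out = []
    for frame in frames:
        r = rng.random()
        if r < p_mask:
            out.append(MASK)               # (i) mask the frame
        elif r < p_mask + p_replace:
            out.append(rng.choice(vocab))  # (ii) replace the frame
        else:
            out.append(frame)              # leave the frame untouched
    return out
```

The MLM is then trained to reconstruct the original frames at the corrupted positions, exactly as in standard masked‑language‑model training.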
Though the model can be trained to handle an arbitrary number of input instruments, we only trained the model with a single instrument, since the main intended usage of the model is to create variations of a single melody. We trained a 12‑layer model with the same dataset and hyperparameter settings as with the orchestration model.
9 15th‑Century Melody Transformation
9.1 Inference procedure
By combining the previous two models, we transformed Chihwapyeong and Chwipunghyeong into an ensemble of six instruments.
As explained in Section 2.2, we opted for a span of 60 ♪ (equal to 20 jeonggans) or 30 ♪ (equal to 10 jeonggans); these correspond to the rhythmic patterns of Yeominlak movements 1–3 and 4–7, respectively.
Using the MLM, the modified melodies were seamlessly transformed into a piri melody. Piri, a double‑reed instrument known for its loud volume, was chosen as the main instrument for conveying the original melody due to its prominent role in contemporary jeong‑ak.
As the models were all trained on four‑measure chunks, we generate the full sequence of 512 or 132 measures using a moving window, providing two measures of previously generated output as teacher‑forcing input and generating one additional measure for each four‑measure window. This procedure was applied in the same manner to both melody transformation and orchestral part generation. Once the melody is transformed into a piri melody, we feed it to the orchestral transformer to generate the parts for the five other instruments. We generate tokens for each instrument sequentially, with the previously generated parts as input, in the following order: piri→geomungo→gayageum→ajaeng→haegeum→daegeum. After the initial generation, we refine each part by regenerating it with all five other parts as input. One of the generation outputs is shown in Figure 9.

Figure 9
Generated output of the daegeum part of Chwipunghyeong.
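The moving‑window procedure described above can be sketched schematically as follows, with a stand‑in `model` callable in place of the trained encoder–decoder; the window arithmetic and names are illustrative assumptions.

```python
def generate_full(model, input_measures, seed_measures):
    """Sliding-window generation as described in the text: for each step,
    the model sees (up to) four measures of source melody and the last
    two generated measures, and emits one new measure.
    input_measures: full source melody, one item per measure.
    seed_measures: initial output measures priming the decoder."""
    output = list(seed_measures)
    for start in range(len(output), len(input_measures)):
        window = input_measures[max(0, start - 3): start + 1]  # <= 4 measures
        primed = output[-2:]            # teacher-force the last 2 outputs
        output.append(model(window, primed))
    return output
```

With a one‑measure stride, each newly generated measure immediately becomes context for the next step, which keeps the output synchronized with the source melody across the full piece.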
9.2 Expert reviews
Before finalizing the musical scores for performance by the Court Music Orchestra of the NGC, we conducted a briefing session with the orchestra members to explain the development process and invite feedback. In addition, our inference results were presented for expert evaluation.
The musicians expressed many positive opinions, such as “[the] genre‑specific rhythm and melodic flow were well‑represented” and “the generated pieces presented ornamentation techniques and melodic progressions specialized for each instrument.” At the same time, there were some clear errors at this point, including a few instances of notes that do not belong in the target scale and which fall outside the appropriate range for the given instrument. Clearly, performers had to alter or omit those notes to perform the piece. There were also cases where two types of sigimsae that cannot occur simultaneously appeared within the same passage. In such cases, the performers selected the ornamentation they considered more appropriate for the musical context. Finally, since sigimsae for the gayageum were not included in the OMR output, no corresponding ornamentations were generated. The performers, therefore, referred to the geomungo score to add suitable sigimsae during rehearsal.
Overall, the generated results were acknowledged to closely resemble the target style of Yeominlak. Thus, the Court Music Orchestra decided to play the pieces in a similar ensemble size to Yeominlak without further modification.
9.3 Interactive web application
We developed an interactive web application4 that gives users hands‑on experience with our proposed generative model, as shown in Figure 10, inspired by pioneering examples such as Bach Doodle (Huang et al., 2019) and folk‑rnn (Sturm et al., 2015). This browser‑based platform allows users to input monophonic melodies and generate rich orchestrations in the distinctive style of jeong‑ak. The details of the implementation are described in supplementary material 3.5

Figure 10
Screenshot of the interactive web application.
10 Discussion
So far, we have described the system design and implementation from a technical perspective. In this section, having had the rare opportunity to apply MIR technology for expanding traditional music repertoire with an authoritative institution, we want to share some of the lessons learned and discuss the potential and limitations of music‑generation models in this context.
10.1 Practice of AI–traditional music collaboration
On May 14, 2024, the NGC’s Court Music Orchestra premiered the reconstructed versions of Chihwapyeong and Chwipunghyeong at Gyeongbokgung Palace in a performance commemorating King Sejong’s birth anniversary (Figure 11). The event was conceived in keeping with King Sejong’s creative spirit (which emphasized the fusion of science and art) and with the objectives of this project. A follow‑up showcase was held on June 2nd at the NGC’s pungnyu saranbang, where members of the research team and performers jointly presented the project’s conceptual foundation, technical methodology, and demonstrations of generated output, as shown in Figure 12. The event concluded with an open discussion on the interaction between traditional music and AI, drawing participation from approximately 80 attendees—including specialists in gugak, AI researchers, and members of the general public—who contributed to a dynamic and multifaceted dialogue, as shown in Figure 13.

Figure 11
Performance in commemoration of King Sejong’s birth anniversary. Photo courtesy of the NGC.

Figure 12
Technology and performance showcase. Photo courtesy of the NGC.

Figure 13
Open discussion session. Photo courtesy of the NGC.
This project represents an attempt to collaboratively integrate traditional musical practices with AI technologies, facilitated by several favorable conditions. First, the proactive participation of the NGC—a national institution combining research and performance functions—enabled the efficient integration of practical and administrative procedures such as concert planning, musician recruitment, and securing performance venues. The Gugak Center further supported the project by conducting surveys and promotional activities, enhancing public visibility and outreach.
Second, the research team was composed of researchers who possessed both traditional music performance and analytical capabilities while simultaneously having expertise in machine learning and MIR. These researchers played a crucial role in mediating between the Gugak Center’s artistic judgments and the technical team’s modeling constraints, as well as in designing encoding and learning strategies that reflected the structural specificities of Jeongganbo‑based music. Additionally, the ability to internally assess whether the generated results adequately captured the characteristics of jeong‑ak without requiring expert feedback at every iteration was also a significant advantage.
Third, the project’s success relied on adaptive collaboration between the research team and the NGC. While the NGC initially expected Lee Hye‑gu’s comparative Yeominlak materials to serve as the core dataset and was skeptical about using broader jeong‑ak data, these constraints proved technically limiting. Rather than simply declaring the approach infeasible, the research team first implemented the original requirements (Section 4), empirically demonstrating their limitations before proposing alternatives. This evidence‑based approach—showing concrete results rather than theoretical objections—facilitated the NGC’s acceptance of the revised methodology, ultimately leading to the significant architectural and methodological shifts necessary for the project’s success.
Ultimately, this project proposes a collaborative model that avoids the looming pitfall of subordinating traditional arts to technology. Instead, it strategically appropriates technology, centering on the interpretations and aesthetic judgments of the traditional music community. This framework may serve as a reference point for future research and digital reconstruction projects across diverse musical traditions and cultures.
10.2 Can AI‑generated music be incorporated into the gugak repertoire?
The question of whether music generated by AI can be incorporated into the gugak repertoire is a complex issue, the answer to which may vary depending on how such music is received and interpreted within the community.
Among gugak genres, jeong‑ak is unique in that it has been recorded through Jeongganbo. Compared to other genres primarily transmitted orally, court music possesses a wealth of historical materials that allow for tracing the lineage and transmission process of its pieces. However, the way AI models generate music differs fundamentally from traditional methods of music creation and reconstruction, as its lineage is unclear and the traditional performance context is omitted. Key elements that have constituted traditional music—such as the historical origins of pieces, their sociocultural usage, and the presence of a performing community—can make it difficult for AI‑generated music to gain equal status with existing repertoire in the traditional music world, no matter how meticulously it reproduces musical structures and features. As pointed out by Huang et al. (2023), despite the advancements in AI systems, it may be premature from the perspective of the human performer community to expect AI to make direct, core contributions to the process of profound meaning‑making in music.
Nevertheless, the potential for AI‑generated music to be actively accepted within gugak clearly exists. This potential is rooted in gugak’s own history of evolution and reinterpretation (Clark, 2018). For instance, developments specific to gugak since the mid‑20th century illustrate this: while some of its genres have been transmitted with an emphasis on “preservation of original form” under the cultural heritage protection laws, there have simultaneously been continuous attempts in fields like contemporary gugak, fusion gugak, and educational and stage arts to reinterpret tradition through contemporary sensibilities and media. Within this trend, a tendency has emerged even within the gugak community to define and accept tradition flexibly according to changing times, rather than viewing it as a single, fixed entity.
In this context, this project, led by the NGC—an institution of national authority—can also be understood as an extension of the perspective that views traditional music not as a fixed museum piece but as a living artistic practice constantly reinvented according to the times and interpretations; such an endeavor itself plays a role in enhancing the possibility of institutional acceptance. This receptive attitude is partly confirmed by the on‑site survey results from the project’s presentation event held on June 2, 2024. In the survey of event attendees, which included 47 participants, 76.6% responded that “music generated by [AI] could be appropriate as traditional music,” and 92% replied that “projects utilizing [AI] in gugak and traditional music fields should be further expanded in the future.” Although the nature of the sample—event attendees—means we must consider the possibility that they represent an “early adopter” group with high interest in gugak and AI technology, this suggests that experts and the public hold a certain level of positive perception toward accepting AI‑generated music as part of tradition or using AI as a tool for the cultural dissemination of traditional music (see supplementary material 4 for detailed survey results).
This project serves as a case study experimentally exploring the incorporation of AI‑generated music into the gugak repertoire, highlighting both its potential and complexities. More importantly, it prompts the gugak community to reconsider questions about tradition, creation, and reconstruction in the context of AI.
11 Conclusion
Throughout this work, we explored how music‑generation models can reenergize ancient melodies, adapting them into new compositions that meet the style of current‑day Korean court music.
Venturing into relatively uncharted territory, we approached each step meticulously—from data curation and parsing to model architecture design—while carefully considering the unique nuances of the musical tradition. To enhance the quality of the generated outputs, we proposed a novel encoding framework and validated its effectiveness through objective and subjective measures. The Jeongganbo dataset and its conversion to Western staff notation in MusicXML are available online, along with the project’s code and a video recording of the performance.6
As detailed earlier regarding the project’s evolving objectives (see Section 2.2), the generative process of this project was guided by a series of specific musicological decisions and a defined target style. While this approach yielded the presented musical outcomes, it is crucial to recognize that this represented but one of many possible musicological and generative pathways.
The conditional variability of AI generation—where outcomes shift according to musicological assumptions and input design—opens up new directions for inquiry in traditional music research beyond simply creating music for performance. That is, AI does not offer definitive “answers” but rather contributes to deepening intellectual inquiry by facilitating the exploration of “possibilities” concretized within the researcher’s theoretical framework. This could inspire future studies to explore AI’s potential as an experimental tool for examining musicological hypotheses.
Our work establishes a successful collaborative model between AI and a prestigious music institution, where technology serves artistic direction. By bringing AI‑generated music to a public performance, this project initiated a crucial dialogue on the process of its reception and its potential place within an evolving tradition. We hope that this project contributes to moving closer to leveraging machine learning to make traditional music more accessible and enjoyable for modern audiences.
Acknowledgments
We sincerely appreciate the National Gugak Center and its staff who supported this project, including director‑general Kim Youngwoon (김영운), head of the research bureau Kim Myung‑suk (김명석), and Han Jungwon (한정원). We are deeply grateful to the musicians of the Court Music Orchestra for their invaluable contributions and efforts to vitalize our humble results.
Data Accessibility
The Jeongganbo dataset curated for this study is publicly available at https://www.danbinaerin.com/Jeongganbo_dataset/, including the original text‑based Jeongganbo encoding and corresponding MusicXML conversions. The generative modeling code and performance videos are available at https://github.com/MALerLab/SejongMusic. An interactive web demonstration is accessible at https://six-dragons-fly-again.site/, and the source code for the web application is available at https://github.com/crescent-stdio/six-dragons-fly-again-demo.
Funding Information
This research was also supported by the National R&D Program through the National Research Foundation of Korea (NRF) funded by the Korean Government (MSIT) (RS‑2023‑00252944, Korean Traditional Gagok Generation Using Deep Learning).
Competing Interests
MG and DJ serve as editors of this journal. They had no involvement in the review process or the decision to accept this manuscript for publication.
Authors’ Contributions
DH designed the encoding scheme, implemented the melody generation models, and conducted qualitative evaluation. MG contributed to the conceptual design of the generative framework and provided feedback on experimental directions and interpretation. DK designed and developed the jeong‑ak OMR dataset pipeline. HP designed and implemented the interactive web demonstration. SL conducted a quantitative evaluation and implemented Jeongganbo image‑synthesis procedures. JP coordinated the National Gugak Center restoration project and facilitated institutional planning and feedback throughout the research process. DJ supervised the overall project. All authors contributed to the discussion of the results and approved the final manuscript.
Notes
[2] Although jeong‑ak is often translated as Korean court music, it includes not only court music but also salon music and military music.
[3] This encoding reflects the original jeonggan encoding method, which is highly distinctive in this respect. Encoding music for machine learning currently involves extensive experimentation with encoding methods. Here is one example (among many) of how intercultural comparison can broaden the range of encoding approaches.
[5] The source code of the web demo is available at https://github.com/crescent-stdio/six-dragons-fly-again-demo.
