Making Chant Computing Easy: CantusCorpus v1.0 and the PyCantus Library

Open Access | Apr 2026

Figures & Tables

Figure 1

Liturgy defines positions for chant in each day of the year. Different communities of practitioners might, however, use different chants in some of these positions. Each manuscript therefore documents how a particular ecclesiastical community placed chants in the liturgy. This assignment is communicated via rubrics, instructions in red ink that indicate specifics such as the feast, the service during the day or the genre. Comparing chants in the same liturgical position across multiple manuscripts reveals some variety: the Invitatory antiphon (rubric parts highlighted by pink rectangles) for the feast Vigilia Nativitatis Domini (rubric parts highlighted in blue) is Hodie scietis in (a), (c) and (d); Levate capita vestra in (b); Prope est jam dominus in (e); and Christus adveniet nobis in (f).

Figure 2

A simplified schema of our contribution.

Figure 3

Cataloguing chant in the Cantus ecosystem. An expert identifies chants in a manuscript (left panel), and creates their database records (middle panel, top). The key step in creating a chant record is assigning the Cantus ID: identifying which unit of chant repertoire is on the page (pink process). Besides transcribing the text, liturgical expertise is needed, as one must correctly interpret abbreviated notes in the manuscript – rubrics – to identify the liturgical position and function of the chant (dark green process); together with the text of the chant, this allows one to select the correct Cantus ID among the ‘master records’ in the Cantus Index (right panel, top). A link to the source record (middle panel, bottom) and page (folio) within the source is added (light blue process). Once a record with a Cantus ID is added to a database in the Cantus ecosystem, the Cantus Index federated search mechanism (right panel, bottom) will discover the record (dark purple process). Descriptions of individual fields mentioned in the figure can be found in Tables 1 and 2. (Screenshots from the given URLs have been adjusted for readability.)

Table 1

Chants fields overview. The asterisk (*) indicates required fields.

Field | Description
chantlink* | URL link directly to the chant entry in the external database. Unique ID.
incipit* | The opening words of the chant.
cantus_id* | The Cantus ID associated with the chant (e.g. 007129a).
mode | Mode of the chant.
siglum* | Abbreviation for the source manuscript or collection (e.g. A‑ABC Fragm. 1), ideally RISM.
position | Order of the chant in the office (first, second, etc.).
folio* | Folio information for the chant.
sequence | The order of the chant on the folio.
feast | Feast or liturgical occasion when the chant is used.
feast_code | Additional identifier unifying feasts with multiple spellings. The values are meaningful in Cantus Index.
genre | Genre of the chant, such as antiphon (A), responsory (R), etc.
office | The liturgy in which the chant is used, such as Matins (M) or Lauds (L).
srclink* | URL link to the source in the external database.
melody_id | The Melody ID associated with the chant (e.g. 001216m1). Rarely used.
full_text | Full text of the chant.
melody | Melody encoded in Volpiano.
db* | Abbreviation of the source database.
image | URL link to an image of the manuscript page.

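
The required fields marked in Table 1 lend themselves to a simple completeness check. The following is a minimal sketch, not part of PyCantus: it assumes chant records are read from a CSV whose column names match the Field column above. The sample rows, the example.org URLs, the incipits and the database code XX are hypothetical placeholders.

```python
import csv
import io

# Required chant fields: the asterisked rows of Table 1.
REQUIRED_CHANT_FIELDS = (
    "chantlink", "incipit", "cantus_id", "siglum", "folio", "srclink", "db",
)


def missing_required(record):
    """Return the required fields that are absent or empty in one chant record."""
    return [
        field for field in REQUIRED_CHANT_FIELDS
        if not (record.get(field) or "").strip()
    ]


# Hypothetical two-row sample; the second row lacks a Cantus ID.
SAMPLE = """chantlink,incipit,cantus_id,siglum,folio,srclink,db,mode
https://example.org/chant/1,Example incipit one,007129a,A-ABC Fragm. 1,001r,https://example.org/source/1,XX,
https://example.org/chant/2,Example incipit two,,A-ABC Fragm. 1,001v,https://example.org/source/1,XX,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Map each chant's unique ID (chantlink) to its list of missing required fields.
report = {row["chantlink"]: missing_required(row) for row in rows}
```

Keying the report by chantlink follows Table 1, which describes that field as the unique ID of a chant record.
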
Figure 4

Overview of support for source metadata among Cantus Index database front‑ends. Lightest green indicates support under a differently named field; darkest green indicates fields that were selected to be included in CantusCorpus v1.0.

Table 2

Sources fields overview. The asterisk (*) indicates required fields.

Field | Description
title | Manuscript name (may use siglum).
siglum* | Abbreviation for the source manuscript, possibly RISM.
century | Text identifying the century of the source.
provenance | Place of origin or use of the source.
srclink* | URL link to the source in the external database. Unique ID.
cursus | Secular or Monastic cursus of the source.
num_century | Integer representation of a century.

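
Tables 1 and 2 share the srclink field, which Table 2 describes as the unique ID of a source. A chant record can therefore be paired with its source record by a lookup on that field. The sketch below illustrates this with plain dictionaries; the function name attach_sources and all sample values are hypothetical and do not reflect the PyCantus API.

```python
def attach_sources(chants, sources):
    """Pair each chant record with its source record, or None if none matches."""
    # srclink is the unique ID of a source (Table 2), so it can serve as a key.
    by_link = {source["srclink"]: source for source in sources}
    return [(chant, by_link.get(chant["srclink"])) for chant in chants]


# Hypothetical records using field names from Tables 1 and 2.
sources = [
    {
        "title": "Example antiphoner",
        "siglum": "A-ABC Fragm. 1",
        "srclink": "https://example.org/source/1",
        "num_century": 14,
    },
]
chants = [
    {"chantlink": "https://example.org/chant/1",
     "srclink": "https://example.org/source/1"},
    {"chantlink": "https://example.org/chant/2",
     "srclink": "https://example.org/source/99"},
]

pairs = attach_sources(chants, sources)

# Chants whose srclink has no matching source record.
unmatched = [chant["chantlink"] for chant, source in pairs if source is None]
```
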
Table 3

Basic quantitative values of the chants part of the CantusCorpus v1.0 dataset.

Chant records in chants.csv | Number
All | 888,010
With Volpiano melody | 60,588
With Volpiano melody of 20+ notes | 44,625

Table 4

Basic quantitative values of the sources part of the CantusCorpus v1.0 dataset.

Source records in sources.csv | Number
All | 2,278
All with 100+ chants | 508
Those with provenance value | 1,606
Those with century value | 2,240
Those with cursus value | 345

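
The counts in Table 4 are easier to compare as shares of all source records. The short sketch below does only that arithmetic on the numbers printed in the table; the variable names are illustrative.

```python
# Counts copied from Table 4.
TOTAL_SOURCES = 2278
COUNTS = {
    "100+ chants": 508,
    "provenance": 1606,
    "century": 2240,
    "cursus": 345,
}

# Share of all source records, in percent, rounded to one decimal place.
shares = {
    name: round(100 * count / TOTAL_SOURCES, 1)
    for name, count in COUNTS.items()
}

for name, percent in shares.items():
    print(f"{name}: {percent}%")
```

On these figures, nearly all sources carry a century value, whereas a cursus value is recorded for only a small minority.
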
Table 5

Overview of data distribution among source databases. The symbol # is used as an abbreviation for ‘number of’. Abbreviations of database codes can be found in Subsubsection 4.1.1. The column annotated with # sources (100+) contains the number of sources with more than 100 chant records associated with them.

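Source DB code | # chants | # CIDs | # unique CIDs | # sources | # sources (100+)
CD | 429,982 | 30,350 | 14,662 | 231 | 166
MMMO | 212,231 | 17,479 | 7,503 | 426 | 151
CSK | 22,539 | 7,201 | 212 | 542 | 12
FCB | 36,103 | 7,889 | 534 | 30 | 29
CPL | 30,433 | 7,666 | 143 | 27 | 17
PEM | 32,738 | 9,184 | 538 | 305 | 25
SEMM | 104,678 | 23,103 | 11,625 | 487 | 81
HCD | 11,278 | 5,374 | 54 | 10 | 9
A4M | 2,738 | 2,006 | 12 | 142 | 3
HYM | 5,290 | 680 | 323 | 83 | 20
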
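
As a consistency check between Table 5 and Table 3, the per-database chant counts can be summed and compared with the total number of chant records. The sketch below uses only the "# chants" column of Table 5 and the "All" row of Table 3; the variable names are illustrative.

```python
# "# chants" column of Table 5, keyed by source database code.
CHANTS_PER_DB = {
    "CD": 429_982,
    "MMMO": 212_231,
    "CSK": 22_539,
    "FCB": 36_103,
    "CPL": 30_433,
    "PEM": 32_738,
    "SEMM": 104_678,
    "HCD": 11_278,
    "A4M": 2_738,
    "HYM": 5_290,
}

# "All" row of Table 3.
TOTAL_CHANTS = 888_010

total = sum(CHANTS_PER_DB.values())

# Database contributing the most chant records.
largest = max(CHANTS_PER_DB, key=CHANTS_PER_DB.get)

print(total == TOTAL_CHANTS, largest)
```

The per-database counts add up exactly to the 888,010 chant records reported in Table 3, with CD contributing the largest share.
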
Figure 5

Simplified schema of the PyCantus data model (‘content’ attributes only). Full UML model can be found in Supplementary File S2.

DOI: https://doi.org/10.5334/tismir.321 | Journal eISSN: 2514-3298
Language: English
Page range: 164 - 178
Submitted on: Jul 1, 2025
Accepted on: Jan 31, 2026
Published on: Apr 29, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Anna Dvořáková, Tim Eipert, Debra Lacoste, Jan Hajič jr, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.