
Figure 1
Dagstuhl ChoirSet—an overview.
Table 1
Comparison of polyphonic singing datasets described in Section 2. The reported durations refer to the total recording duration (not counting multiple tracks per recording if available).
| Name/Author | Multitrack | Annotations | Publicly Available | # Recordings | Duration (hh:mm:ss) |
|---|---|---|---|---|---|
| Su et al. (2016) | No | MIDI | On Request | 5 excerpts | 00:02:11 |
| Barbershop Quartets10 | Yes | MIDI | No | 22 songs | 00:42:10 |
| Bach Chorales11 | Yes | MIDI | No | 26 songs | 00:58:20 |
| Scherbaum et al. (2019) | Yes | – | On Request | 216 songs | 06:04:40 |
| Erkomaishvili Dataset (Rosenzweig et al., 2020) | No | Structure, F0, Score, Onsets | Yes | 101 songs | 07:05:00 |
| Choral Singing Dataset (CSD) (Cuesta et al., 2018) | Yes | MIDI, F0, Notes | Yes | 3 songs | 00:07:14 |
| Dagstuhl ChoirSet (DCS) | Yes | MIDI, F0, Beats | Yes | 2 songs, exercises | 00:55:30 |

Figure 2
Anton Bruckner, Locus Iste WAB 23 (measures 1 to 11). The score was obtained from CPDL and edited by Brian Marble.13
Table 2
Overview of the audio recordings in DCS. The third column indicates the number of takes available for each piece and the last column refers to the total duration of all takes together.
| Piece | Setting | # Takes | Duration (mm:ss) |
|---|---|---|---|
| Locus Iste | Full Choir | 3 | 07:22 |
| Locus Iste | Quartet A | 7 | 16:26 |
| Locus Iste | Quartet B | 6 | 14:02 |
| Tebe Poem | Full Choir | 5 | 05:27 |
| Tebe Poem | Quartet A | 2 | 02:30 |
| Exercises | Full Choir | 33 | 06:00 |
| Exercises | Quartet A | 25 | 03:43 |
| Total | | 81 | 55:30 |
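As a quick consistency check, the totals in Table 2 can be recomputed from the individual rows. The following is a minimal sketch; the helper names `to_seconds` and `to_mmss` are illustrative and not part of the dataset or its tooling.

```python
def to_seconds(mmss):
    """Convert an 'mm:ss' string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Convert a number of seconds back into an 'mm:ss' string."""
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


# (piece, setting, number of takes, duration) copied from Table 2.
rows = [
    ("Locus Iste", "Full Choir", 3, "07:22"),
    ("Locus Iste", "Quartet A", 7, "16:26"),
    ("Locus Iste", "Quartet B", 6, "14:02"),
    ("Tebe Poem", "Full Choir", 5, "05:27"),
    ("Tebe Poem", "Quartet A", 2, "02:30"),
    ("Exercises", "Full Choir", 33, "06:00"),
    ("Exercises", "Quartet A", 25, "03:43"),
]

total_takes = sum(takes for _, _, takes, _ in rows)
total_duration = to_mmss(sum(to_seconds(d) for _, _, _, d in rows))
print(total_takes, total_duration)  # 81 55:30
```

The recomputed values agree with the "Total" row of Table 2 and with the DCS duration of 00:55:30 listed in Table 1.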

Figure 3
Microphone setup for one singer.

Figure 4
Comparison of LRX and DYN signals from a tenor singer. Excerpts correspond to the marked Locus Iste passage in Figure 2. (a) Magnitude spectrograms. CREPE F0-trajectories are plotted on top in the respective colors. (b) Smoothed CREPE confidence. (c) Binarized trajectory activations obtained by thresholding smoothed confidence (LRX threshold: 0.935, DYN threshold: 0.9).
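The binarization step in panel (c) can be sketched as follows. This is only an illustration under stated assumptions: the caption does not specify the smoothing method, so a moving average with edge padding is used here as a stand-in, and the function name `binarize_confidence`, the parameter `smooth_len`, and the `>=` comparison are hypothetical choices. Only the two threshold values (0.935 for LRX, 0.9 for DYN) come from the caption.

```python
import numpy as np


def binarize_confidence(confidence, threshold, smooth_len=5):
    """Smooth a frame-wise confidence curve, then threshold it to 0/1.

    Assumption: a moving average of odd length `smooth_len` is used for
    smoothing; the smoothing actually used for Figure 4 is not stated.
    """
    if smooth_len % 2 == 0:
        raise ValueError("smooth_len must be odd")
    confidence = np.asarray(confidence, dtype=float)
    pad = smooth_len // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(confidence, pad, mode="edge")
    kernel = np.ones(smooth_len) / smooth_len
    smoothed = np.convolve(padded, kernel, mode="valid")
    # A frame is marked active when its smoothed confidence reaches the threshold.
    return (smoothed >= threshold).astype(int)


# Toy confidence curve (not taken from the dataset).
conf = [0.5, 0.5, 0.95, 0.96, 0.97, 0.96, 0.95, 0.5, 0.5]
lrx_activation = binarize_confidence(conf, threshold=0.935, smooth_len=3)
dyn_activation = binarize_confidence(conf, threshold=0.9, smooth_len=3)
```

Because smoothing is applied before thresholding, frames next to a low-confidence region can be switched off even when their raw confidence exceeds the threshold.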

Figure 5
Screenshot (detail) of digital audio workstation (Logic Pro X) with multiple tracks.
Table 3
DCS dimensions.
| Dimension | Shortcut | Meaning |
|---|---|---|
| Song | LI | Locus Iste |
| Song | TP | Tebe Poem |
| Song | SE | Systematic Exercises |
| Setting | FullChoir | Full Choir Setting |
| Setting | QuartetA | Quartet A Setting |
| Setting | QuartetB | Quartet B Setting |
| Take | Take | Take Number |
| Voice | S | Soprano |
| Voice | A | Alto |
| Voice | T | Tenor |
| Voice | B | Bass |
| Voice | Stereo | Stereo Mic |
| Voice | StereoReverb | Stereo Mic Reverb |
| Microphone | LRX | Larynx Mic |
| Microphone | DYN | Dynamic Mic |
| Microphone | HSM | Headset Mic |
| Microphone | STR | Stereo Mic R |
| Microphone | STL | Stereo Mic L |
| Microphone | STM | Stereo Mic L+R |
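For scripted access, the shortcuts in Table 3 can be kept in a small lookup structure. The sketch below is one possible arrangement; the names `DCS_DIMENSIONS` and `expand` are hypothetical and not part of any official DCS tooling. The "Take" dimension is omitted because it holds a running number rather than a fixed set of shortcuts.

```python
# Shortcut-to-meaning mapping copied from Table 3.
DCS_DIMENSIONS = {
    "Song": {
        "LI": "Locus Iste",
        "TP": "Tebe Poem",
        "SE": "Systematic Exercises",
    },
    "Setting": {
        "FullChoir": "Full Choir Setting",
        "QuartetA": "Quartet A Setting",
        "QuartetB": "Quartet B Setting",
    },
    "Voice": {
        "S": "Soprano",
        "A": "Alto",
        "T": "Tenor",
        "B": "Bass",
        "Stereo": "Stereo Mic",
        "StereoReverb": "Stereo Mic Reverb",
    },
    "Microphone": {
        "LRX": "Larynx Mic",
        "DYN": "Dynamic Mic",
        "HSM": "Headset Mic",
        "STR": "Stereo Mic R",
        "STL": "Stereo Mic L",
        "STM": "Stereo Mic L+R",
    },
}


def expand(dimension, shortcut):
    """Return the meaning of a shortcut, or raise ValueError if it is unknown."""
    try:
        return DCS_DIMENSIONS[dimension][shortcut]
    except KeyError:
        raise ValueError(f"unknown {dimension!r} shortcut: {shortcut!r}") from None


print(expand("Song", "LI"))         # Locus Iste
print(expand("Microphone", "LRX"))  # Larynx Mic
```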
Table 4
Evaluation results for pYIN trajectories averaged over two quartet recordings.
| Mic | VR | VFA | RPA | RCA | OA |
|---|---|---|---|---|---|
| LRX | 0.99 (0.00) | 0.11 (0.06) | 0.95 (0.02) | 0.95 (0.01) | 0.93 (0.03) |
| HSM | 0.98 (0.01) | 0.33 (0.09) | 0.81 (0.10) | 0.91 (0.04) | 0.77 (0.08) |
| DYN | 0.99 (0.00) | 0.16 (0.11) | 0.93 (0.04) | 0.95 (0.01) | 0.90 (0.05) |
Table 5
Evaluation results for CREPE trajectories averaged over two quartet recordings.
| Mic | VR | VFA | RPA | RCA | OA |
|---|---|---|---|---|---|
| LRX | 0.96 (0.01) | 0.12 (0.02) | 0.96 (0.01) | 0.96 (0.01) | 0.93 (0.02) |
| HSM | 0.92 (0.02) | 0.32 (0.08) | 0.91 (0.01) | 0.91 (0.02) | 0.84 (0.02) |
| DYN | 0.93 (0.01) | 0.18 (0.07) | 0.93 (0.01) | 0.93 (0.01) | 0.90 (0.02) |
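The column headers in Tables 4 and 5 are read here as the usual melody-extraction measures: voicing recall (VR), voicing false alarm (VFA), raw pitch accuracy (RPA), raw chroma accuracy (RCA), and overall accuracy (OA). The sketch below shows how such measures are commonly computed; it is not the evaluation code behind the tables. It assumes that reference and estimate share one time grid, that unvoiced frames are encoded as 0 Hz, and that the pitch tolerance is 50 cents, none of which is stated in the captions. The function name `melody_metrics` is hypothetical.

```python
import numpy as np


def melody_metrics(ref_hz, est_hz, tol_cents=50.0):
    """Frame-wise melody metrics for two F0 trajectories in Hz (0 = unvoiced)."""
    ref = np.asarray(ref_hz, dtype=float)
    est = np.asarray(est_hz, dtype=float)
    ref_voiced = ref > 0
    est_voiced = est > 0
    both = ref_voiced & est_voiced

    # Pitch deviation in cents, computed only where both trajectories are voiced.
    diff = np.zeros_like(ref)
    diff[both] = 1200.0 * np.log2(est[both] / ref[both])
    pitch_ok = both & (np.abs(diff) < tol_cents)

    # Fold the deviation into [-600, 600) cents so octave errors are ignored.
    folded = (diff + 600.0) % 1200.0 - 600.0
    chroma_ok = both & (np.abs(folded) < tol_cents)

    def ratio(num, den):
        return float(num) / float(den) if den > 0 else 0.0

    n_ref_voiced = ref_voiced.sum()
    n_ref_unvoiced = (~ref_voiced).sum()
    both_unvoiced = (~ref_voiced & ~est_voiced).sum()
    return {
        "VR": ratio(both.sum(), n_ref_voiced),
        "VFA": ratio((~ref_voiced & est_voiced).sum(), n_ref_unvoiced),
        "RPA": ratio(pitch_ok.sum(), n_ref_voiced),
        "RCA": ratio(chroma_ok.sum(), n_ref_voiced),
        "OA": ratio(pitch_ok.sum() + both_unvoiced, ref.size),
    }


# Toy example (not dataset values): one octave error, one miss, one false alarm.
ref = [0.0, 220.0, 220.0, 220.0, 0.0]
est = [0.0, 220.0, 440.0, 0.0, 220.0]
metrics = melody_metrics(ref, est)
```

In the toy example the octave error counts against RPA but not against RCA, which mirrors the RPA/RCA gap in the HSM row of Table 4.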

Figure 6
Averaged intonation cost (IC) measures for six takes of Locus Iste by Quartet A and five takes by Quartet B. The local standard deviations are indicated in light grey.

Figure 7
Multiple-F0 estimation using DeepSalience (Bittner et al., 2017) with a threshold of 0.1. (a) Estimation results (excerpts) for the mix of DYN signals and the STM signal with reverb. (b) Evaluation metrics for all scenarios.
