A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction

Open Access | Feb 2024

Figures & Tables

Figure 1

Conceptual overview of SVR from LM signals. Time-frequency representations of an exemplary LM signal and the corresponding reconstructed signal are depicted in red and grey, respectively.

Table 1

Overview of the songs and takes in LM-SSD. C1 and C0 represent the number of takes with and without crosstalk, respectively. Songs marked with * have German lyrics.

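| ID | Song Name | Original Artist | Singer ID | LM-A Takes (C1) | LM-A Takes (C0) | LM-B Takes (C1) | LM-B Takes (C0) | Duration (mm:ss) |
|---|---|---|---|---|---|---|---|---|
| AA | All Alone | Michael Fast | 1M | 1 | 2 | 1 | 2 | 27:03 |
| TS | The Scientist | Coldplay | 1M | 1 | 2 | 1 | 2 | 21:37 |
| YF | Your Fires | All The Luck In The World | 1M | 1 | 2 | 1 | 2 | 24:21 |
| DL | Dezemberluft* | Heisskalt | 2M | 1 | 2 | 1 | 2 | 14:47 |
| BB | Books From Boxes | Maxïmo Park | 2M | 1 | 2 | 1 | 2 | 17:39 |
| NB | Narben* | Alligatoah | 2M | 1 | 2 | 1 | 2 | 11:47 |
| SG | Supergirl | Reamonn | 3F, 1M | 1 | 2 | 1 | 2 | 26:34 |
| OC | One Call Away | Charlie Puth | 3F, 1M | 1 | 2 | 1 | 2 | 19:32 |
| PL | Past Life | Trevor Daniel & Selena Gomez | 3F, 1M | 1 | 2 | 1 | 2 | 17:45 |
| CC | Chasing Cars | Snow Patrol | 4F | 1 | 2 | 1 | 2 | 28:10 |
| BT | Breakfast At Tiffany’s | Deep Blue Something | 4F | 1 | 2 | 1 | 2 | 22:16 |
| LL | Little Lion Man | Mumford & Sons | 4F | 1 | 2 | 1 | 2 | 19:06 |
| Total | | | | 12 | 24 | 12 | 24 | 250:37 |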
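
The totals row of Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the printed values, not part of the dataset tooling; the dictionary and helper names are illustrative assumptions.

```python
# Durations (mm:ss) copied from the Duration column of Table 1.
DURATIONS = {
    "AA": "27:03", "TS": "21:37", "YF": "24:21", "DL": "14:47",
    "BB": "17:39", "NB": "11:47", "SG": "26:34", "OC": "19:32",
    "PL": "17:45", "CC": "28:10", "BT": "22:16", "LL": "19:06",
}


def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds: int) -> str:
    """Convert seconds back to 'mm:ss' (minutes may exceed 59)."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


# Sum of all per-song durations, formatted like the table's Total row.
total_duration = to_mmss(sum(to_seconds(d) for d in DURATIONS.values()))

# Each song has, for each of the two LM models, 1 take with crosstalk (C1)
# and 2 takes without (C0).
takes_per_song = 2 * (1 + 2)
total_takes = len(DURATIONS) * takes_per_song
```

Both derived values agree with the table: the summed duration matches the Total row, and the take count matches the sum of the four take columns.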
Figure 2

Photograph of the recording setup (top) and detailed depiction of the LMs used (bottom). LM-A: Albrecht AE-38-S2a larynx microphone; LM-B: self-made larynx microphone with TE Connectivity CM-01B sensor; CM: close-up microphone (Neumann U87); GP: guitar pickup (AMG Electronics C-Ducer); GL/GR: guitar stereo left/right (AKG C414).

Figure 3

Relative transfer function (RTF) estimates w.r.t. CM for LM-A (top) and LM-B (bottom). RTF estimates for individual singers are shown in grey (1M: solid, 2M: dashed, 3F: dotted, 4F: dash-dotted). The black line indicates the mean RTF across singers for each LM model.

Figure 4

Coherence estimates w.r.t. CM for LM-A (top) and LM-B (bottom). Coherence estimates for individual singers are shown in grey (1M: solid, 2M: dashed, 3F: dotted, 4F: dash-dotted). The black line indicates the mean coherence across singers for each LM model.
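
The captions of Figures 3 and 4 do not state how the RTF and coherence were estimated. One common approach, shown here purely as an assumption, averages windowed cross- and auto-spectra over overlapping segments (Welch's method) and forms H = P_rt / P_rr for the RTF and |P_rt|^2 / (P_rr P_tt) for the magnitude-squared coherence, with the CM signal as reference r and the LM signal as target t. The function names, segment length, and the synthetic test signal are all hypothetical; the pure-Python DFT is meant for illustration, not for full-length recordings.

```python
import cmath
import math
import random


def dft(frame):
    """One-sided DFT of a real-valued frame (bins 0 .. N/2)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def segment_spectra(signal, seg_len):
    """Hann-windowed spectra of 50%-overlapping segments."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / seg_len) for t in range(seg_len)]
    hop = seg_len // 2
    spectra = []
    for start in range(0, len(signal) - seg_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(seg_len)]
        spectra.append(dft(frame))
    return spectra


def rtf_and_coherence(reference, target, seg_len=64, eps=1e-12):
    """Estimate the RTF and magnitude-squared coherence per frequency bin."""
    ref_spec = segment_spectra(reference, seg_len)
    tgt_spec = segment_spectra(target, seg_len)
    rtf, coherence = [], []
    for k in range(seg_len // 2 + 1):
        p_rr = sum(abs(r[k]) ** 2 for r in ref_spec)  # reference auto-spectrum
        p_tt = sum(abs(t[k]) ** 2 for t in tgt_spec)  # target auto-spectrum
        p_rt = sum(r[k].conjugate() * t[k] for r, t in zip(ref_spec, tgt_spec))
        rtf.append(p_rt / (p_rr + eps))
        coherence.append(abs(p_rt) ** 2 / (p_rr * p_tt + eps))
    return rtf, coherence


# Synthetic sanity check: the "LM" signal is the "CM" signal scaled by 0.5,
# so the RTF magnitude should be about 0.5 and the coherence about 1.
random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(1024)]
target = [0.5 * x for x in reference]
rtf, coherence = rtf_and_coherence(reference, target)
```

For real recordings, a library routine such as SciPy's `scipy.signal.csd` and `scipy.signal.coherence` would be the more practical choice; the hand-written version above only makes the two formulas explicit.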

Table 2

Dataset dimensions and naming scheme.

| Field | Description | Values |
|---|---|---|
| UID | Unique numerical identifier for a take across songs | 001 – 072 |
| SongID | Two-letter abbreviation of the song | cf. Table 1 |
| Type | Microphone type or mix setting | LM-A, LM-B, CM, GP, GL, GR, MixA, MixB |
| Crosstalk | Whether guitar crosstalk is present on CM (C1) or not (C0) | C1, C0 |
| Singer | Singer identifier (with gender) | 1M, 2M, 3F, 4F |
| Take | Take number for the given song (T1-T3 use LM-A, T4-T6 use LM-B) | T1 – T6 |
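
The dimensions in Table 2 can be captured in a small validated record type. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the exact file naming pattern is not given here and is therefore not modeled, and only the take-to-microphone rule stated in the table (T1-T3 use LM-A, T4-T6 use LM-B) is enforced.

```python
from dataclasses import dataclass

# Allowed values, copied from Tables 1 and 2.
SONG_IDS = ("AA", "TS", "YF", "DL", "BB", "NB", "SG", "OC", "PL", "CC", "BT", "LL")
TYPES = ("LM-A", "LM-B", "CM", "GP", "GL", "GR", "MixA", "MixB")
CROSSTALK = ("C1", "C0")
SINGERS = ("1M", "2M", "3F", "4F")
TAKES = ("T1", "T2", "T3", "T4", "T5", "T6")


def larynx_mic_for_take(take: str) -> str:
    """Return the LM model used in a take (T1-T3: LM-A, T4-T6: LM-B)."""
    if take not in TAKES:
        raise ValueError(f"unknown take: {take!r}")
    return "LM-A" if int(take[1:]) <= 3 else "LM-B"


@dataclass(frozen=True)
class Recording:
    """Hypothetical container for one signal, using the fields of Table 2."""
    uid: int
    song_id: str
    mic_type: str
    crosstalk: str
    singer: str
    take: str

    def __post_init__(self) -> None:
        checks = (
            (1 <= self.uid <= 72, "uid"),
            (self.song_id in SONG_IDS, "song_id"),
            (self.mic_type in TYPES, "mic_type"),
            (self.crosstalk in CROSSTALK, "crosstalk"),
            (self.singer in SINGERS, "singer"),
            (self.take in TAKES, "take"),
        )
        for ok, name in checks:
            if not ok:
                raise ValueError(f"invalid {name}: {getattr(self, name)!r}")
        # An LM signal must come from the LM model that the take actually used.
        if self.mic_type in ("LM-A", "LM-B") and self.mic_type != larynx_mic_for_take(self.take):
            raise ValueError(f"{self.mic_type} is not used in take {self.take}")
```

With 12 songs and 6 takes each, the scheme yields 72 takes, which matches the UID range 001 – 072. A call such as `Recording(1, "AA", "LM-A", "C0", "1M", "T1")` passes validation, while requesting an LM-B signal for take T1 raises a `ValueError`; the UID and crosstalk values in that call are illustrative and not taken from the dataset.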
Figure 5

Architecture of the DDSP-based baseline system. Blue color is used for differentiable DSP building blocks, yellow color for NN building blocks with learnable parameters, and white color for fixed pre-processing steps. Control parameter flow is denoted with dashed line arrows, while solid lines indicate flow of audio signals. The spectrograms show signal content at the indicated position in the signal flow diagram. The shown example uses an excerpt from the LM-B signal of song AA T5 as the input signal x_LM and a corresponding model trained with the OF scenario (see Section 6).

Table 3

Word error rate (WER) of lyrics transcription with the Whisper (Radford et al., 2022) medium model for a selection of songs from LM-SSD. Song DL uses the dedicated German Whisper model.

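WER (%)

| Song | Singer | CM | LM | OF | DT | DS |
|---|---|---|---|---|---|---|
| AA | 1M | 1.83 | 72.56 | 1.83 | 22.56 | 20.73 |
| TS | 1M | 2.82 | 31.69 | 2.82 | 24.65 | 33.10 |
| DL | 2M | 2.16 | 10.81 | 2.70 | 5.95 | 7.03 |
| BB | 2M | 7.40 | 11.25 | 9.65 | 11.90 | 21.22 |
| SG | 3F | 3.70 | 11.11 | 5.76 | 11.11 | 57.61 |
| OC | 3F | 3.31 | 84.30 | 4.96 | 12.40 | 58.68 |
| CC | 4F | 0.49 | 92.65 | 0.49 | 29.41 | 91.67 |
| LL | 4F | 1.98 | 85.71 | 1.98 | 15.87 | 69.44 |
| Average | | 3.27 | 49.05 | 4.25 | 15.89 | 46.13 |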
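
For readers unfamiliar with the metric in Table 3, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the reference lyrics and the transcript, divided by the number of reference words. The sketch below implements that standard definition; the text normalization (lowercasing and whitespace splitting) is an assumption, the function name is hypothetical, and the Whisper transcription step itself is not shown.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent between a reference and a hypothesis text."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one row of the DP table.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current

    return 100.0 * previous[-1] / len(ref_words)
```

Because insertions count as errors, the WER can exceed 100% when the transcript contains many extra words, which is worth keeping in mind when reading the highest values in the table.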
Figure 6

Listening test results according to stimulus and singer ID. LM: Larynx Microphone; NA: Naive Approach (linear filtering); OF, DT, DS: Overfitting, Different Take, and Different Song training scenarios; HR: Hidden Reference (CM signal).

DOI: https://doi.org/10.5334/tismir.166 | Journal eISSN: 2514-3298
Language: English
Submitted on: Mar 10, 2023
Accepted on: Jan 6, 2024
Published on: Feb 23, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Simon Schwär, Michael Krause, Michael Fast, Sebastian Rosenzweig, Frank Scherbaum, Meinard Müller, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.