The Billboard Melodic Music Dataset (BiMMuDa)

Figures & Tables

Table 1

Summary of existing symbolic datasets of popular music.

Dataset Name | Summary | Contents | Available Metadata | Manually Created/Reviewed?
BiMMuDa | Melodic transcriptions of the top five songs on the Billboard year-end singles chart from 1950 to 2022 | 1,133 single-track MIDI files and 371 MuseScore files | Extensive metadata per song and song section | Yes
POP909 | Transcriptions of popular Chinese songs | 909 multitrack MIDI files and annotations | Song title, artist, and key signature information | Yes
Lakh MIDI | 170k files scraped from the Internet | 176,581 multitrack MIDI files | No consistent metadata | No
RWC | 80 original songs in the style of Japanese popular music, plus 20 songs in the style of Western popular music | 100 multitrack MIDI files | Song title and length, artist, tempo, and instrumentation | Yes
CoCoPops | Melodic transcriptions of a random sample of the Billboard Hot 100 from 1960 to 2010 | 200 HumDrum files (project still ongoing) | Song title, artist, and year | Yes
Table 2

BiMMuDa and CoCoPops transcription differences.

Song ID | Title | Artist | Dissimilarity Value | Transcription Differences
1959_01 | The Battle of New Orleans | Johnny Horton | 4.4 | The song’s swing is encoded in the BiMMuDa transcription but not in CoCoPops’. Disagreements on pitches in bar five of the verse and at the end of melodic phrases in the chorus and bridge.
1963_04 | He’s So Fine | The Chiffons | 5.3 | CoCoPops includes more rhythmic and pitch detail in the final bars of the verse. Disagreement on pitches in bar fifteen, beat one of the chorus.
1964_03 | Hello, Dolly! | Louis Armstrong | 3.3 | The vocals have very expressive timing, leading to different interpretations of the rhythms.
1968_04 | (Sittin’ On) The Dock of the Bay | Otis Redding | 4.3 | CoCoPops includes more pitch detail in the verse.
1969_04 | Honky Tonk Women | The Rolling Stones | 1.7 | Occasional pitch disagreements throughout. The CoCoPops transcription tends towards accidentals when the vocals are slightly flat.
1970_05 | War | Edwin Starr | 2.9 | BiMMuDa leaves out the vocalist’s ad-libs between the main phrases, while CoCoPops includes them.
1973_05 | My Love | Paul McCartney and Wings | 4.5 | The two transcriptions have different interpretations of the expressive timing in the bridge.
1978_01 | Shadow Dancing | Andy Gibb | 4.7 | Good agreement throughout, besides small differences in the repetitions of sections.
1980_02 | Another Brick in the Wall, Part II | Pink Floyd | 6.4 | The pitches in the BiMMuDa transcription are one octave higher than those in the CoCoPops transcription.
1983_05 | Beat It | Michael Jackson | 4.0 | Disagreements on main-melody identification at the end of the chorus.
1984_02 | What’s Love Got to Do With It | Tina Turner | 7.3 | The CoCoPops transcription includes much more pitch detail in the verse and chorus.
1985_03 | Wake Me Up Before You Go-Go | Wham! | 4.3 | The BiMMuDa transcription encodes the swing, while the CoCoPops transcription does not.
1988_03 | Got My Mind Set On You | George Harrison | 3.9 | Slightly different interpretations of the rhythms throughout; CoCoPops tends towards triplets, while BiMMuDa uses an eighth note followed by two sixteenth notes.
1990_01 | Hold On | Wilson Phillips | 3.2 | Good agreement throughout, besides small differences in the repetitions of sections.
Figure 1

Dissimilarity matrix between BiMMuDa and CoCoPops transcriptions.
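
The measure behind the dissimilarity values in Table 2 and the matrix in Figure 1 is not defined in this listing. Purely as an illustration (not the paper’s method), the sketch below scores two transcriptions of the same melody with a Levenshtein distance over (pitch, duration) event tuples, normalized by the longer sequence and scaled to roughly match the 0–10 range of the values above; every detail of this scoring is an assumption.

```python
# Hypothetical sketch of a transcription dissimilarity score. This is NOT the
# measure used in the paper; it simply compares (pitch, duration) event
# sequences with a normalized Levenshtein (edit) distance.

def edit_distance(a, b):
    """Dynamic-programming Levenshtein distance over event tuples."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def transcription_dissimilarity(events_a, events_b):
    """Edit distance normalized by the longer sequence, scaled to 0-10."""
    longest = max(len(events_a), len(events_b), 1)
    return 10.0 * edit_distance(events_a, events_b) / longest

# Toy example: events are (MIDI pitch, duration in quarter notes).
bimmuda_events = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 1.0)]
cocopops_events = [(60, 1.0), (62, 0.5), (64, 0.5), (67, 1.0)]
print(transcription_dissimilarity(bimmuda_events, cocopops_events))  # 2.5
```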

Table 3

Number of songs, melodies, minutes, and note events in BiMMuDa, per decade and in total.

Subset | Songs | Melodies | Minutes | Note Events
1950–1959 | 52 | 129 | 53.59 | 5,342
1960–1969 | 50 | 128 | 47.43 | 4,883
1970–1979 | 52 | 158 | 62.14 | 6,956
1980–1989 | 50 | 152 | 54.50 | 6,485
1990–1999 | 52 | 149 | 59.56 | 6,885
2000–2009 | 50 | 185 | 70.08 | 11,115
2010–2022 | 65 | 232 | 80.78 | 13,562
Total | 371 | 1,133 | 428.08 | 55,258
Table 4

Description of attributes per song in BiMMuDa.

Attribute | Description
Title | Title of the song
Artist | Artist(s), including any featured artists
Year | Year in which the song appeared in the top five of the Billboard year-end singles chart
Position | Song’s position on the Billboard year-end singles chart
Tempo one–three | Tempo of the song in beats per minute, as estimated by Tunebat. Tempo one is the starting tempo, while Tempos two and three account for up to two tempo changes.
Link to Audio | Spotify or YouTube link to the song
Tonics one–six | Tonic of the song, as estimated by Tunebat. Tonic one is the tonic at the beginning of the song, while Tonics two–six account for up to five key changes.
Modes one–six | Mode (major/minor) of the song, as estimated by Tunebat. Mode one is the mode at the beginning of the song, while Modes two–six account for up to five mode changes.
Number of Parts | Number of melodies in the song
Number of Words | Number of words in the lyrics file, including repeated words and sections
Number of Unique Words | Number of unique words in the lyrics file
Unique Word Ratio | Number of unique words in the lyrics divided by the total number of words
Number of Syllables | Number of syllables in the lyrics file
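
The lyric attributes in Table 4 (Number of Words, Number of Unique Words, Unique Word Ratio, and Number of Syllables) are straightforward counts over each song’s lyrics file. The sketch below shows one plausible way to compute them; the tokenization rules and the vowel-group syllable heuristic are assumptions rather than the paper’s exact procedure.

```python
import re

def lyric_attributes(lyrics: str) -> dict:
    """Rough per-song lyric statistics in the spirit of Table 4.

    Assumptions (not specified in the paper): words are whitespace-separated
    tokens, lower-cased with punctuation stripped, and syllables are
    approximated by counting vowel groups per word.
    """
    words = [re.sub(r"[^a-z']", "", w.lower()) for w in lyrics.split()]
    words = [w for w in words if w]
    n_words = len(words)
    n_unique = len(set(words))
    n_syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)
    return {
        "Number of Words": n_words,
        "Number of Unique Words": n_unique,
        "Unique Word Ratio": n_unique / n_words if n_words else 0.0,
        "Number of Syllables": n_syllables,
    }

print(lyric_attributes("Oh I love you, yes I love you"))
# {'Number of Words': 8, 'Number of Unique Words': 5,
#  'Unique Word Ratio': 0.625, 'Number of Syllables': 10}
```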
Table 5

Summary statistics for BiMMuDa per-song attributes.

Attribute | Mean | Median | Std. Dev. | Range
Number of Parts | 3.12 | 3.00 | 1.13 | 0–8
Tempo one | 105.72 | 104.00 | 24.76 | 57–174
Number of Words | 335.49 | 303.00 | 169.59 | 12–896
Number of Unique Words | 104.25 | 93.00 | 47.10 | 11–312
Unique Word Ratio | 0.35 | 0.33 | 0.12 | 0.10–1.00
Number of Syllables | 413.17 | 372.00 | 208.58 | 57–174
Figure 2

Correlation matrix for per-song attributes.

Figure 3

Mean number of melodies per song, overall and by decade, with error bars denoting standard deviations.

Figure 4

Frequency of major and minor modes by decade.

Figure 5

Distribution of tonics, overall and by decade.

Figure 6

Distribution of tempos (BPM), overall and by decade.

Figure 7

Means of the Number of Words, Number of Unique Words, Unique Word Ratio, and Number of Syllables attributes, overall and per decade, with error bars denoting standard deviation.

Table 6

Attribute descriptors per melody in BiMMuDa.

Attribute | Description
ID | Unique identifier and filename of the melody (e.g., “1960_01_1”)
Length | Length of the MIDI file in seconds
Number of Note Events | Number of note events in the melody
Section Label | The melody’s function within the global structure of the song (e.g., verse, chorus)
Tonality | Degree of conformity to one of the 24 keys in Western music, as determined by the Krumhansl-Schmuckler key-finding algorithm (Krumhansl, 1990). The algorithm outputs the key most highly correlated with the melody, with the correlation coefficient representing the degree of conformity to that key.
Melodic Information Content (MIC) | Information-theoretic unpredictability of the melody’s pitches according to a probabilistic model of auditory expectation (Pearce, 2005, 2018). Information content values of pitches are computed step-wise and then averaged.
Melodic Interval Size (MIS) | Average distance in semitones between consecutive pitches
Pitch STD | Standard deviation of the melody’s pitches
Onset Density | Average number of notes per second
Normalized Pairwise Variability Index (nPVI) | Durational contrast between consecutive notes (Patel and Daniele, 2003)
Rhythmic Information Content (RIC) | Information-theoretic unpredictability of the melody’s rhythmic structure according to the model of Pearce (2005, 2018). Information content values of onset times are computed step-wise and then averaged.
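
Several of the per-melody attributes in Table 6 can be computed directly from a melody’s note list. The sketch below derives MIS, Pitch STD, Onset Density, and nPVI from (onset, pitch, duration) tuples; Tonality, MIC, and RIC are omitted because they depend on the Krumhansl-Schmuckler key profiles and on a trained model of auditory expectation, respectively. The input format and the use of the population standard deviation are assumptions.

```python
import statistics

def melody_attributes(notes):
    """Simple per-melody descriptors in the spirit of Table 6.

    `notes` is a list of (onset_seconds, midi_pitch, duration_seconds) tuples,
    assumed sorted by onset. Tonality, MIC, and RIC are not computed here.
    """
    onsets = [n[0] for n in notes]
    pitches = [n[1] for n in notes]
    durations = [n[2] for n in notes]

    # Melodic Interval Size: mean absolute semitone step between consecutive pitches.
    mis = statistics.mean(abs(b - a) for a, b in zip(pitches, pitches[1:]))

    # Pitch STD: standard deviation of the MIDI pitches (population form assumed).
    pitch_std = statistics.pstdev(pitches)

    # Onset Density: notes per second over the melody's duration.
    length = (onsets[-1] + durations[-1]) - onsets[0]
    onset_density = len(notes) / length

    # nPVI (Patel and Daniele, 2003): mean normalized durational contrast
    # between consecutive notes, scaled by 100.
    npvi = 100 * statistics.mean(
        abs(d1 - d2) / ((d1 + d2) / 2) for d1, d2 in zip(durations, durations[1:])
    )

    return {"MIS": mis, "Pitch STD": pitch_std,
            "Onset Density": onset_density, "nPVI": npvi}

# Toy melody: four notes spanning two seconds.
print(melody_attributes([(0.0, 60, 0.5), (0.5, 62, 0.5),
                         (1.0, 64, 0.25), (1.25, 65, 0.75)]))
```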
Table 7

Summary statistics for BiMMuDa per-melody attributes.

Attribute | Mean | Median | Std. Dev. | Range
Length | 22.68 | 20.71 | 9.67 | 2.29–65.26
Number of Note Events | 48.74 | 44 | 24.14 | 4–168
Tonality | 0.73 | 0.75 | 0.10 | 0.41–0.98
MIC | 3.54 | 3.49 | 0.99 | 0.28–6.04
Pitch STD | 2.99 | 2.87 | 1.14 | 0.00–9.72
MIS | 2.11 | 2.09 | 0.84 | 0.00–0.85
Onset Density | 2.25 | 2.13 | 0.84 | 0.44–5.69
nPVI | 40.71 | 39.21 | 19.79 | 0.00–128.74
RIC | 2.23 | 2.16 | 0.75 | 0.23–7.06
Figure 8

Annual averages of per-melody attributes. The time series are smoothed with a two-forward, two-backward averaging window to make trends more visible, so there are no values for the years 1950, 1951, 2020, and 2021.
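
The smoothing described above is a centered five-year moving average (two years back, the year itself, two years forward), which is why the endpoint years carry no values. A minimal sketch of that windowing, assuming pandas and toy annual means:

```python
import pandas as pd

# Toy annual means of one per-melody attribute, indexed by year.
annual = pd.Series({1950: 3.1, 1951: 3.3, 1952: 3.0,
                    1953: 3.4, 1954: 3.2, 1955: 3.5})

# Two-backward, two-forward (centered, width-5) averaging window; requiring
# the full window drops the first and last two years, as in Figure 8.
smoothed = annual.rolling(window=5, center=True, min_periods=5).mean()
print(smoothed)  # NaN at the two ends; smoothed values for 1952 and 1953
```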

Figure 9

Correlation matrix for per-melody attributes.

Figure 10

Annual averages of the Length and Number of Note Events attributes. The time series are processed the same way as in Figure 8.

Figure 11

Distribution of section labels, overall and per decade. The labels Pre-Chorus, Post-Chorus, Outro, Intro, Break, and Hook are aggregated into the Other category due to their relatively low frequencies.

Table 8

Results of t-tests comparing per-melody attributes between verses and choruses.

Attribute | Mean (Verses) | Mean (Choruses) | p-value
Length | 24.69 | 22.28 |
No. of Note Events | 56.36 | 46.90 |
Tonality | 0.75 | 0.73 | 0.061
MIC | 3.48 | 3.52 | 0.61
Pitch STD | 2.97 | 2.96 | 0.90
MIS | 2.02 | 2.15 |
Onset Density | 2.38 | 2.15 |
nPVI | 39.68 | 40.86 | 0.41
RIC | 2.21 | 2.24 | 0.69
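
Table 8 reports two-sample t-tests on per-melody attributes grouped by section label. The sketch below illustrates that kind of comparison with SciPy on toy data; the choice of Welch’s (unequal-variance) test is an assumption, since the exact variant is not specified in the table.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy stand-ins for one attribute (e.g., Onset Density) split by section label.
verse_values = rng.normal(loc=2.38, scale=0.8, size=120)
chorus_values = rng.normal(loc=2.15, scale=0.8, size=130)

# Welch's t-test (equal variances not assumed).
t_stat, p_value = stats.ttest_ind(verse_values, chorus_values, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```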
DOI: https://doi.org/10.5334/tismir.168 | Journal eISSN: 2514-3298
Language: English
Submitted on: May 30, 2023
Accepted on: May 15, 2024
Published on: Aug 7, 2024
Published by: Ubiquity Press

© 2024 Madeline Hamilton, Ana Clemente, Edward Hall, Marcus Pearce, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.