Wherever I Lay My Hat is Home?: A Complex Case Study of Crowd-Sourcing, Coordination, and Cross-Platform Integration for Hosting Open Humanities Data

Mark Gotham

doi:10.5334/johd.456


1 Context and Motivation

As Marvin Gaye once sang, ‘Wherever I Lay My Hat, That’s My Home’. This is a recurring trope in poetry and song. Later songs include covers of Gaye’s original (by Paul Young and others), songs using the ‘lay my hat’ trope and near variants of it,² and even a degree of intertextual reference between these songs.³ Earlier instances of a similar trope stretch back into history, not least with the ‘wanderer’ character: a favourite across the artistic ages. Among the examples most relevant to the data under discussion here is Wilhelm Müller’s 1821 ‘Wanderschaft’, as set to music by Schubert, among others.⁴ The wanderer character is typically ‘free’ in the sense of being relatively unattached to people, property, and other trappings of connected existence. However we may feel personally, those committed to ‘free’ data in the humanities typically take that word to mean something at the other extreme, adopting the ‘FAIR’ principles, which emphasise (inter alia) maximal connectedness (Wilkinson et al., 2016).

As all involved in creating, curating, and maintaining open humanities data (hereafter OHDers) know all too well, this is not so easy to achieve. More specifically, it is easy to host data somewhere that is open in principle but that falls short of making the data open in practice for all target users, particularly when those users include groups beyond academia, as is typical of arts, humanities, and cultural heritage data.⁵ This discussion paper presents the complex case of OpenScore’s efforts to make musical cultural heritage data more openly accessible and usable for academics, musicians, and beyond. While aspects of this story are specific to the given case study, most of the considerations are widely relevant and instructive for all OHDers involved in meaningful development of open data.

1.1 OpenScore at MuseScore: Open Software, Closed Platform

‘OpenScore’ is an effort to digitise and freely distribute public domain sheet music under the maximally permissive ‘CC0’ licence. All readers will recognise that text files in machine-readable formats (e.g., ‘.txt’) support wider use cases than images of the same (e.g., ‘.pdf’). The enhanced utility of editable music files over static images is even starker, with additional uses including: playback with synthetic audio, transposition to any key, muting one or more parts (for ‘music minus one’ practice), and more.

While some OHDers may start projects with a relatively ‘free hand’, our story begins already immersed in the world of platforms. OpenScore began in 2017, initiated by the then owners of ‘MuseScore’, which is at once a music notation app and a website/platform for sharing scores.⁶ The notation app is free, open source software released under a licence that guarantees it will remain so; the same is not true of the website-platform. Both have seen major changes since 2017, not least with the changing ownership of MuseScore as a commercial entity and the establishment of ‘MuseGroup’.⁷

In this initial phase of OpenScore, specific items of sheet music were chosen by sponsors of a crowdfunding campaign, transcribed by MuseScore users (volunteers incentivised by a one-month fee waiver for ‘pro’ access to this freemium web platform per item), and reviewed by the OpenScore manager, Peter Jonas (also at MuseScore). Jonas is the only point of continuity from that initial effort, both at OpenScore and probably also at MuseScore as a commercial entity. The last surviving web presence of this initial phase is internal to MuseScore; all other links have now expired.⁸ Both these cases (of personnel and links) are salutary reminders of how quickly things can change, and of the precarity of data maintenance for all, OHDers and otherwise.

1.2 OpenScore takes an academic turn

A year later, that initial phase was slowing down. Transcribers were keen but other aspects were not keeping pace: having Jonas as sole reviewer was not sustainable and the crowdfunding was running out. It was time for a new approach. The present author (Gotham) approached the problem from a perspective more familiar to OHDers: supporting public-benefit work from a starting point in academia, including the frame of ‘academic impact’. Gotham and Jonas had been discussing this topic since before the beginning of OpenScore and set out to experiment with the new approach together.

We aimed to keep the best parts of OpenScore’s workflow to date and adjust those that were less effective. Initial work on OpenScore had established some brand recognition and had also identified a set of keen and competent individuals ready to contribute to the project on the given terms. Gotham sought and gained academic seed funding from the University of Cambridge for the new venture, which was ultimately to become the ‘OpenScore Lieder Corpus’. As the name suggests, this phase introduced a repertoire focus on songs from the long nineteenth century. The choice of this repertoire was based on several key considerations, discussed in previous papers, which do not need re-exposition here.⁹ What is relevant to the story here is the continued centrality of the MuseScore platform to OpenScore’s operations at this stage, and the emerging effort to span different use cases.

The case for funding was made in relation to a wide range of prospective users who would benefit from editable and adaptable scores. This included academics conducting data science research on the musico-cultural heritage encoded in those scores, as well as musicians performing them. From the perspective of academic work, this latter group perhaps falls under the banner of ‘impact’. From any perspective, having multiple use cases for the same data strengthens the case for a project.

Targeting multiple use cases is laudable, but it is also complex. Some academics are quick to point out that different use cases call for attention to different aspects of the same data. This is as true of musical data as of any domain. For example, musicians and musicology-oriented use cases prioritise fidelity to the source. Some academic-computational tasks, by contrast, prioritise aspects such as strictly reliable alignment between sources, which requires pragmatic adjustments to those sources that are unpalatable to the former group. This consideration applies to the OpenScore lieder data: while the version-of-record data is our focus here and reflects the ‘musical’ priority, research use cases in machine learning have required enough adjustment to warrant splitting off an altered (near-duplicate) dataset.¹⁰ There are real issues here, without easy answers. At the same time, the protestation that a dataset must be created with exactly one research question in mind is often overstated. One key, clear motivation for creating datasets is to support multiple use cases.¹¹ The importance of this to the field is recognised by efforts to make datasets interoperable, notably the ‘MIR data’ project (Bittner et al., 2019).

1.3 Incentives and Support

In this first phase of the ‘lieder corpus’, we experimented with an adapted personnel structure and workflow. Transcribers were recruited and incentivised in the same way, this aspect having proven successful to date. Jonas now shared management with Gotham, and we recruited reviewers externally, to be paid directly from the grant. Most of this proved more effective and broadly remained for subsequent phases of OpenScore. The exception is the case of ‘reviewers’; later phases gradually settled on recruitment of reviewers from among transcribers. All good transcribers are invited to become reviewers and are incentivised in the same way (per item). Some individuals decide to accept, others prefer to remain transcribing only.

As for funding, the funded period of the lieder corpus was brief. Later funded periods of OpenScore efforts have followed a similar approach of paying some reviewers and/or corpus managers (never Gotham or Jonas), but these are rare, brief moments. The majority of review work has been rewarded on the same basis as transcription: pro membership. It is far from self-evident that this ‘perk’ is a meaningful incentive overall. While some clearly and explicitly appreciate it, the main incentive overall seems to be contribution to the OpenScore effort and its associated values.

The ‘OpenScore Lieder Corpus’ continued to grow, extending far beyond the initial phase thanks to the volunteer efforts of all involved. When that initiative started to show signs of slowing down, we embarked on a second corpus of string quartets (reflecting the interest of the community at that time), and then orchestral works.¹² At the time of writing, the Lieder Corpus is enjoying a renewed lease of life, in parallel with the quartets. It is important to emphasise that these changes reflect the priorities of those keen to participate: the community contributed not only scores but also direction.

We set out the above history in some detail for two reasons. First, it is highly relevant to the story of platform migration which follows below; second, it is important to be realistic about support for such initiatives, particularly for OHDers reading this who may be planning similar initiatives. Overall, external support for this initiative has centred on the following:

Peter Jonas’ time has been supported by MuseScore more or less formally as part of his work there. This is welcome support, though it is clearly (and understandably) not a commercial priority for the company, so Jonas has to make the case for it and there are no guarantees it will continue, particularly as the ownership, management and priorities change.
Mark Gotham justifies his time on this project through academic means, including via articles that present the data for research use cases, and by conducting such research.¹³ Broadly speaking, this justifies the time to a limited extent; there is nothing wrong with using academic time in this way, but it is far from what academic structures tend to incentivise.
Occasional academic funding, sought by Gotham and supplied to those leading the management of individual corpora. Please see the acknowledgements (section 5) for more details.
In-kind support for transcribers and reviewers from MuseScore in the form of pro membership.

We are of course very grateful for all forms of financial and in-kind support. At the same time, it is clear and worth emphasising that the real powerhouse behind this initiative is the commitment of those involved, predominantly as volunteers. This includes the many transcribers, reviewers, and contributors, and even Gotham and Jonas, none of whom is really incentivised to do this for any substantial reason beyond their internal belief that the project is worthwhile, even important, as a mechanism for modernising cultural heritage. We are mindful that this is a precarious set-up. Then again, the project is still growing, and such precariousness is not unusual for digital projects around cultural heritage.

2 A Journey Across Platforms

All OpenScore corpora have begun with crowd-sourcing musical source data. Crucial to this first phase is the cultivation of a community. As the task requires some specialist interest and knowledge of Western musical notation and of the specific notation application we use, it makes sense to start where such a community already exists, and from which members interested in contributing to an open data effort can be identified.

2.1 MuseScore.com

The choice of MuseScore has been broadly effective; however, we have to distinguish between the ‘two MuseScores’ here. The desktop software (now called ‘MuseScore Studio’) continues to be free and open source. This is the app we use to transcribe (create data). The online platform (MuseScore.com) serves as the commercial arm of MuseScore. We traverse both, in that we create scores using the desktop software and store/present them on the online platform. As other crowd-sourced initiatives have anecdotally reported, we have found the platform effective as a starting point, whereby the large community yields not so much a correspondingly large team of contributors as a large number of users and a much smaller group of highly enthusiastic participants.

As a commercial operation, the online platform is naturally driven by complex decision-making with commercial concerns front-and-centre, and the details of those criteria are not made known to the public (including us). Suffice it to say that, although we have had a connection to the company from the start, we have no influence on that kind of decision-making, and we know from users that these decisions affect their readiness to participate in an effort like OpenScore. One clear trend is the gradual change to the platform’s freemium model, which has made accessing scores hosted there increasingly difficult, especially for those without a pro account (and therefore without paying).

We do not know the relative importance of the two MuseScores (platform and app) to this venture. The distinction between the two MuseScores is central to the story of platform migration here, notably with continued use of the app and a gradual independence from the platform through mirroring data elsewhere. We set out the context and principles for these choices, but cannot easily measure the effect of such a change, nor can we know whether an entirely different starting platform might have been more effective.

2.2 GitHub.com

Given the changes to the MuseScore platform mentioned above, it seemed clear to us that, to honour the core value of ‘openness’ at the heart of OpenScore, we would have to arrange secondary hosting of the data that was not subject to those controls. We arranged (openly, and in collaboration with MuseScore) to create a GitHub mirror of the data, and enhanced this (manually) with a degree of linked open data, including linking to (and in many cases creating) Wikidata entries for the composers involved. As such, GitHub is the second platform in our story.¹⁴
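Parts of the Wikidata linking described above could in principle be assisted by code. The sketch below is illustrative only and is not the (manual) workflow OpenScore used: it builds a Wikidata item URL from an identifier, and a query URL for the public Wikidata SPARQL endpoint that looks for humans (P31 = Q5) whose occupation (P106) is composer (Q36834) under a given English label. The function names are hypothetical, and the property and item identifiers, while standard as far as we know, should be checked on Wikidata before use.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def entity_url(qid: str) -> str:
    """Return the Wikidata page URL for an item identifier such as 'Q5'."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


def composer_lookup_url(name: str, lang: str = "en") -> str:
    """Build a query URL listing items labelled `name` that are humans
    (P31 = Q5) with occupation composer (P106 = Q36834)."""
    # Escape backslashes and double quotes so `name` is a valid SPARQL literal.
    literal = name.replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "SELECT ?item WHERE { "
        f'?item rdfs:label "{literal}"@{lang} ; '
        "wdt:P31 wd:Q5 ; "
        "wdt:P106 wd:Q36834 . }"
    )
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(entity_url("Q5"))
    print(composer_lookup_url("Clara Schumann"))
```

Fetching the query URL returns candidate items; a human would still need to confirm each match, since labels are not unique identifiers.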

GitHub is highly popular with computational users, including where that group intersects with OHDers.¹⁵ Computational benefits include the relative ease of managing complex, distributed curation with customisable automation (including via continuous integration). For a discussion and example use case in the field of musical scores, see Hentschel et al. (2025). This functionality is of high importance to those computational OHDers charged with managing datasets. Any benefits this may bring to non-computational users are secondary, more complex, and typically determined by what the computational maintainers do with the data. This is a topic to which we return throughout this paper.

GitHub is also a platform and commercial entity, owned in this case by Microsoft. It may therefore be subject to the same changes and volatility outlined above. That said, files hosted there have always been more freely, openly, and directly accessible, even without login. While users need to log in to engage with the GitHub platform (add content, comment, propose changes, …), all that is needed to access the files hosted there is the correctly formatted URL. Anyone can download either an individual file or the whole collection with one click and without login. There is, of course, no guarantee that this will continue to be the case, but it has been that way since the beginning of GitHub, and has remained so since the Microsoft buyout.
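As a minimal sketch of what ‘the correctly formatted URL’ means in practice, the following builds per-file and whole-repository download links for a public GitHub repository. It assumes GitHub’s current `raw.githubusercontent.com` and `/archive/refs/heads/` URL patterns, which, like everything else on the platform, may change; the owner, repository, and file names are hypothetical placeholders.

```python
from urllib.parse import quote


def raw_file_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Direct URL for one file in a public GitHub repository.

    `ref` may be a branch name, tag, or commit hash. No login is needed
    for public repositories.
    """
    # Percent-encode spaces, accents, etc. in the path but keep '/' separators.
    safe_path = quote(path.strip("/"), safe="/")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{safe_path}"


def branch_zip_url(owner: str, repo: str, branch: str = "main") -> str:
    """URL of a zip archive of the whole repository at the tip of a branch."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


if __name__ == "__main__":
    # Hypothetical repository and file names, for illustration only.
    print(raw_file_url("example-org", "example-corpus", "main",
                       "scores/Liebst du um Schönheit.mscx"))
    print(branch_zip_url("example-org", "example-corpus"))
```

Because spaces and accented characters are common in score titles, the sketch percent-encodes the file path while leaving the `/` separators intact. Pointing `ref` at a tag rather than a branch gives a stable link to a fixed state of the corpus.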

Some academic teams prefer to avoid GitHub in favour of alternatives like GitLab.¹⁶ According to the criteria discussed here, these alternatives are broadly similar in design, at least currently. Like GitHub, GitLab allows individual file download without login as long as the repository is public. Whether on GitHub, GitLab, or elsewhere, this direct access to files improves the true ‘openness’ of the data for researchers, but for a project focussed on impact for a wide range of users, GitHub’s clear code-centric focus makes it a solution for correspondingly code-centric parts of the workflow and code-centric users. It is not practical or inviting for non-computational musicians.¹⁷

2.3 Zenodo.org

Further serving research use cases, and at the specific request of some researchers, we later added Zenodo as a third platform in this story. One clear benefit of this move is that every additional platform increases visibility to some extent. (Additional platforms come at the cost of integration and maintenance, but GitHub-Zenodo integration is especially low-maintenance, as discussed in §2.4 below.) Zenodo is a well-known, much-visited platform, which further enhances visibility, though again Zenodo’s primary focus is on researchers, making for a strong overlap with GitHub and limited added value.

Zenodo also offers the demarcation of fixed, semantically numbered versions. GitHub also offers this; where Zenodo sets itself apart is in minting a DOI for the project, both overall and for each of those versions. This was the reason for the researchers’ request in our case: the wish to cite a specific, fixed version of a corpus liable to change over time. Finally, Zenodo benefits from its host organisation: CERN, an intergovernmental research organisation subject to governance that is fundamentally different from that of commercial companies. Relatedly, Zenodo commits to storing data for as long as CERN is operational,¹⁸ a commitment uncommon in the commercial sector. CERN took physical form in 1954. If past longevity is a measure of future prospects, it may be more durable than any firm in the highly volatile world of ‘big tech’.¹⁹

Some research teams treat Zenodo more focally, largely or entirely bypassing GitHub and equivalents. This means having a DOI and whole-corpus download, but neither history (beyond those numbered versions) nor individual file download, unless handled in some other way. Perhaps most relevant to the present domain are the datasets from the ‘AudioLabs’ (Erlangen, DE), including Weiß et al. (2023) and Zeitler et al. (2024), and, most relevantly, Weiß et al. (2021). AudioLabs’ data is often spread across platforms or ‘access points’.²⁰

In short, the traversal of multiple platforms is a common experience for teams dedicated to getting their data ‘out there’, AudioLabs very much included. In cases like ours, where the data and full history are present on GitHub, little is gained from this additional platform beyond a slightly increased likelihood of discovery, the removal of a single point of failure (for the extreme scenario of GitHub failing), and ease of direct citation of a specific state for reproducibility.

2.4 Platform Comparison

As will be clear from the above, this story of journeying across platforms is relatively confusing, and traverses not only many platforms, but many criteria for meaningful openness. A summary is in order. At this stage in the story, users can download OpenScore files from one of three platforms, specifically from:

MuseScore.com, one score at a time, only if they have an account, and subject to the terms of those accounts. Many variables (including the price of account and the ease of discoverability) are subject to change.
GitHub, individually or entirely, but only if they know to look there and can navigate the public-facing site or command-line interface methods, both of which are clearly designed for computational users. This is a clear case of open in principle, but not in practice, at least not to non-computational users.
Zenodo, for the whole corpus (not individual score files), again primarily serving computational research use cases.

Two tables summarise information about these platforms, along with some other select platforms included because they are either popular or closely related to this task. Table 1 summarises the general usability features discussed above, both for the platforms central to this story and for others that are unused or only peripherally relevant: the International Music Score Library Project (IMSLP) and Wikidata, which are discussed elsewhere in this piece,²¹ and Figshare and OSF, which are not.²² The columns focus on the considerations most relevant to the present story, as discussed in the main text. We ignore other standard points of comparison, such as storage, given that all offer relatively generous storage limits, beyond the needs of most OHDers’ projects like the one discussed here. The limits vary by number of files, largest file size, total project size, and whether the project is public or private; they are broadly comparable, and the specific numbers change too frequently to warrant inclusion in an academic paper.

Table 1

Comparison of platforms used (upper part) and related platforms (lower part, including Wikidata). Y = yes; . = no; N/A = not applicable.

PLATFORM	DOWNLOAD (ALL)	DOWNLOAD (PER FILE)	PREVIEWS	DOI	OTHER
MuseScore.com	.	With login	Y	.	Commercial; subject to change.
GitHub	Y	Y	Raw Code	.	Free public repos; Code-centric versioning.
Zenodo	Y	.	.	Y	Free, open-source; Project-level versioning.
fourscoreandmore (bespoke)	Y (Git)	Y (Git)	Y	.	Maintenance burdens inc. search optimisation.
IMSLP	N/A	Y (subscribe/wait period)	Thumbnail images	.	Predominantly images.
OSF	Y	Y	Some	Optional	Free, open-source; Project-level versioning.
Figshare	Y	Y	Some	Per item	Free for academics; Item/collection versioning.
Wikidata	N/A	N/A	N/A	.	Knowledge base; No file hosting.

Table 2, in turn, concerns the specific question of automated integration. We re-emphasise the importance of this: almost all OHDers’ projects have limited resources, so the maintenance burden is a key consideration. Automation is essential for any at-scale work across platforms. In our case, we have organised an automated script for MuseScore-to-GitHub synchronisation, which is easy for us but not readily available or repeatable for other projects. We cannot make that code public, and so we omit MuseScore from the integration table. GitHub-to-Zenodo integration, by contrast, is extremely easy and well supported by both platforms (no specialist insight or techniques are required) and highly relevant to other projects. With a small degree of set-up (permissions, credentials), GitHub-to-Zenodo syncing can be automated to take place with every GitHub release. As the table shows, this ease of integration is uncommon among other platform pairs.

Table 2

Ease of integration between platforms.

FROM / TO	OSF	GITHUB	ZENODO	FIGSHARE	WIKIDATA
OSF	–	None	Auto	Manual	API
GitHub	Auto	–	Auto	Manual	API
Zenodo	Manual	Manual	–	API	API
Figshare	Manual	API	API	–	API
Wikidata	API	API	API	API	–
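The GitHub-to-Zenodo synchronisation described above is configured through the two platforms’ own settings rather than through code. One optional refinement, sketched here under stated assumptions, is a `.zenodo.json` file in the repository root, which Zenodo can read to pre-fill the metadata of each release-based deposit. The field names below follow our understanding of Zenodo’s deposit metadata and should be checked against Zenodo’s current documentation; the function name and all values are hypothetical placeholders.

```python
import json


def zenodo_metadata(title, creators, description, keywords):
    """Assemble a minimal .zenodo.json payload.

    Field names are assumed from Zenodo's deposit metadata; check them
    against Zenodo's current documentation before relying on them.
    """
    return {
        "upload_type": "dataset",
        "title": title,
        # Creator names given as "Family, Given" (our understanding of Zenodo's convention).
        "creators": [{"name": name} for name in creators],
        "description": description,
        # De-duplicate and sort keywords for a stable, diff-friendly file.
        "keywords": sorted(set(keywords)),
    }


if __name__ == "__main__":
    meta = zenodo_metadata(
        title="Example score corpus",  # hypothetical placeholder values
        creators=["Surname, Given"],
        description="Crowd-sourced encodings of public domain scores.",
        keywords=["music", "scores", "open data"],
    )
    # Write to the repository root, where the integration looks for it.
    with open(".zenodo.json", "w", encoding="utf-8") as f:
        json.dump(meta, f, ensure_ascii=False, indent=2)
```

Keeping this file under version control means the metadata travels with every release, so the release itself remains the only manual step.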

3 D.I.Y. Platform: Four Score and More

What is missing from the MuseScore-GitHub-Zenodo trifecta is a platform (or equivalent) that is designed in the spirit of open data for all our intended use cases, including musicians. Such an offering would meet the following criteria:

Essential:
1. The scores are available for free, direct download in several formats;
2. The web design is public-facing (clear and free from off-putting, code-related jargon);
3. The website(s) can be searched and navigated easily, without specialist knowledge, allowing users to find works by composer, work name, and more.
Desirable:
1. Online preview of scores, preferably playable, further supporting casual browsing;
2. Coverage of many datasets, beyond merely ‘ours’, and with clear instructions on how to use different formats;
3. Organisation around recognised linked open data standards such as Wikidata.
Longer-term goals:
1. Multiple editions represented and identified;
2. Manual or automatic edition comparison;
3. Easy interface for users to suggest changes.

What is clear from the above list is that no such platform currently exists in the commercial world, nor is there a strong likelihood of one emerging. It is far from obvious how any company or other organisation could operate such a platform in a manner that is both commercially ambitious and true to the maximally open licence. The nearest parallels for the data under discussion here are IMSLP (for score PDFs), CPDL, and Verovio-based renderers, all of which are discussed in the following section. Seemingly the only solution for our use case was to create such a site (or series of sites) ourselves, from scratch, and somehow justify this effort given the precarious support outlined above.

Again, maintenance of such a resource is subject to the vagaries and moving priorities of both academia and its rotating cast of members. On the positive side, the ‘fourscoreandmore’ website was created in 2018 to support the parallel inception of the OpenScore lieder corpus along with some other, related projects including one that uses the lieder data for pedagogical apps.²³ The site has been running continuously since, meaning that it has shown a greater longevity than many such efforts. Moreover, the website is as entirely under our control as is practical,²⁴ and the structured metadata around these corpora allows for the programmatic creation of sites to any design.
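To illustrate what programmatic site creation from structured metadata can look like, the following sketch renders a simple HTML index grouped by composer. The record fields (`composer`, `title`, `id`) and the `scores/<id>.html` link scheme are hypothetical and do not reflect the actual fourscoreandmore code; the point is only that, once metadata is structured, page generation becomes a small, repeatable script.

```python
from collections import defaultdict
from html import escape


def render_index(records):
    """Render an HTML fragment with one heading and list per composer.

    Composers are sorted alphabetically, and works by title within each.
    All metadata is HTML-escaped before insertion.
    """
    by_composer = defaultdict(list)
    for rec in records:
        by_composer[rec["composer"]].append(rec)

    parts = ["<main>"]
    for composer in sorted(by_composer):
        parts.append(f"<h2>{escape(composer)}</h2>")
        parts.append("<ul>")
        for rec in sorted(by_composer[composer], key=lambda r: r["title"]):
            href = f"scores/{escape(rec['id'])}.html"  # hypothetical link scheme
            parts.append(f'<li><a href="{href}">{escape(rec["title"])}</a></li>')
        parts.append("</ul>")
    parts.append("</main>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical records; a real site would load these from metadata files.
    demo = [
        {"composer": "Schumann, Clara", "title": "Liebst du um Schönheit", "id": "example-1"},
        {"composer": "Composer, Example", "title": "Example song", "id": "example-2"},
    ]
    print(render_index(demo))
```

Sorting here is by code point rather than locale-aware collation, which is a simplification a production site might want to revisit for accented names.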

Apart from the frank discussion of our experiences with OpenScore and the relative merits of different platforms, the main contribution of this paper is a re-organisation of fourscoreandmore.org around the provision described above. All scores are presented in a simple, user-friendly fashion to all users, with options to search, browse, preview, and directly download any score for free and without login or other barrier. Figure 1 shows an example screenshot of the landing page for one such score. This provision achieves all (3/3) of the ‘essential’ functionality described above, and 1/3 of the desirable criteria. The ‘missing’ criteria (2/3 desirable, 3/3 long-term) concern engagement with other corpora and wider standards; these are the subjects of what follows.

Figure 1: A screenshot of an example fourscoreandmore page for an OpenScore encoding, here showing Clara Schumann’s *Liebst du um Schönheit*.
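Search by composer and work name is one of the ‘essential’ criteria, and for this repertoire it has to cope with diacritics (e.g., *Schönheit*, Müller). A minimal sketch, not the fourscoreandmore implementation, is to fold both query and metadata to case-insensitive, accent-free text before matching; the record fields and function names are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Case-fold and strip combining accents, e.g. 'Schönheit' -> 'schonheit'."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(records, query, fields=("composer", "title")):
    """Return the records in which every query word occurs in some field."""
    words = fold(query).split()
    hits = []
    for rec in records:
        haystack = " ".join(fold(rec.get(field, "")) for field in fields)
        if all(word in haystack for word in words):
            hits.append(rec)
    return hits


if __name__ == "__main__":
    records = [{"composer": "Schumann, Clara", "title": "Liebst du um Schönheit"}]
    print(search(records, "SCHUMANN schonheit"))
```

One design limitation worth noting: this folding maps ‘ö’ to ‘o’, so a user typing the German transliteration ‘oe’ (as in ‘Schoenheit’) would not find a match without an additional rule. An empty query returns every record, which doubles as a ‘browse all’ view.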

4 Future Prospects, Including Prospective Integration

The new fourscoreandmore provision focuses initially on functionality for elevating the openness of OpenScore and related corpora for which we are responsible, bringing them broadly in line with complementary provisions elsewhere. At the same time, we have built it flexibly, with a view to supporting possible integration with a wider range of corpora. Other related online offerings show a relatively clear division between academic and non-academic (or not primarily academic) initiatives.

4.1 IMSLP and CPDL

Non-academic initiatives include IMSLP, which is the standout success story of music score data sharing, on a scale far beyond the others discussed here. IMSLP has weathered legal troubles (including a brief takedown in 2008) and seems to have been in rude health ever since. It is, however, very strongly focussed on score images (PDFs), with encoded scores in a very subsidiary role. OpenScore engages directly with IMSLP by taking a specific IMSLP-ID as the source of each transcription. We benefit from their vetting of public domain status (we only consider editions they have approved in this way) and provide a complementary service in the form of transcribed data. We see a need for the complementary provision described here (centred on digital scores), and IMSLP does not seem to have any plans in that direction.

IMSLP’s focus on PDFs is also an interesting statement with respect to digital longevity. ChoralWiki is the home of the ‘Choral Public Domain Library’ (CPDL).²⁵ Like IMSLP, this is a user-upload, wiki-style site. Unlike IMSLP, there is an explicit focus on choral music and the format question is more balanced: there are transcriptions in a range of formats, and PDFs are typically made from those transcriptions. ChoralWiki introduces the question of vetting and quality control. IMSLP primarily hosts PDFs that are now in the public domain but were originally produced by professional, named publishers for sale (though there are some user editions too). The quality still varies, of course,²⁶ though the emphasis on professionally produced editions clearly tends to ensure a decent level of quality overall.

CPDL hosts user-created transcriptions that have not been vetted. They may all be excellent, but there is simply no mechanism for verifying this, and so no way to know without manual inspection. CPDL’s de-emphasis of PDF in favour of encoded scores (relative to IMSLP at least) does not mean prioritising any specific non-PDF format: CPDL files may be in any number of formats, including proprietary ones. Clearly, proprietary formats reduce usability: users wishing to use those scores must purchase the software in question, these days usually on a subscription model. Proprietary formats are especially liable to become deprecated, as seen in the recent (2024) case of ‘Finale’.²⁷ Many files on CPDL are in the Finale format, and while recovery of those files is not impossible, the barrier is considerably higher. The Finale scenario could be seen as a validation of IMSLP’s focus on PDFs, or as a motivation for more community-driven work around open source standards.

4.2 ‘Verovio’ and ‘mei-friend’

Open source standards lead us towards the academic projects in this space: producing corpora of scores, supporting standards for that effort, and developing platform software for engaging with those resources. Between corpora, there is variability in the format(s) used and in the quality of encoding. Within a corpus, there tends to be relative consistency of format and quality. Open standards are almost always used, and all choices (format, workflow, and more) are usually made clear in an accompanying academic paper.

Academic projects also yield yet further platforms for us to consider, notably for the online display of digital scores in open source formats. Apart from specific projects discussed above (from Algomus, Erlangen, and Tours), notable cross-project initiatives include Verovio.²⁸ Verovio supports the online display and editing of the MEI format,²⁹ and further formats via conversion. For example, the ‘Verovio Humdrum Viewer’ is a notable extension, building on Verovio to enable viewing and editing of the lightweight ‘humdrum’ (aka ‘krn’) format.³⁰ Online MEI editing is further served by the ‘last mile editor’ mei-friend.³¹

These resources are geared towards the creation, adaptation, and versioning of scores in supported formats. For that purpose they are far superior to our OpenScore workflows: while we have some provision for versioning (discussed above), there is no collaborative platform (commercial or otherwise) for editing MuseScore files in this way. There are two issues, however, with their emphasis on:

code-centric editing. The raw format is displayed alongside the rendering, allowing the parallel editing of either. This is excellent for users who are comfortable with the code as well as the notation. For that user, some edits are easiest to make on the rendered score, some lend themselves to working on the code, and it is certainly useful to have both options available, linked, and side-by-side. By the same token, this is off-putting for the majority of musicians: even those who are comfortable encoding scores into digital formats typically use GUI music notation software (MuseScore, Finale, …) and simply never see the raw encoding. The code can theoretically be ignored, but this is a barrier in the same way as discussed above concerning the status of GitHub as a platform for file sharing.
online-only. Both tools are clearly geared towards online use cases. It is certainly possible to create and adapt humdrum and MEI files off-line, but there is no off-line GUI desktop app for this purpose.³² The creation of a desktop app for MEI might significantly shift the usership.

One conclusion to draw from the above might be that these provisions are complementary to the ‘MuseScore Studio’ desktop app, particularly if the very promising MEI ←→ MuseScore conversion tools continue to develop (integrated into MuseScore since v4.2; Hankinson et al., 2024). There is always friction when moving between encoding standards (even between successive versions of MuseScore); the practical utility depends on how much friction there is, and of what kind.

That complementarity could motivate the integration of humdrum and MEI files into fourscoreandmore, or a similar site. Third-party sites can embed MEI scores using the Verovio toolkit; in that ‘read-only’ scenario, the raw-code side can be hidden (which addresses the usability concern discussed above). These viewers also support files in the relevant formats that are publicly hosted on external sites. For example, any corpus with score files hosted on GitHub, GitLab, or similar can engage the viewer by extending the URL in a fixed and regular way to render that file. This is another significant motivation for the direct file download option discussed above.
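The ‘fixed and regular’ URL extension can be sketched in a few lines. This Python example assumes GitHub’s raw-file URL pattern (`raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}`) and assumes that the viewer accepts the location of the file to render as a `file` query parameter; the owner, repository, and path names are hypothetical placeholders, and the exact parameter names should be checked against the viewer’s own documentation.

```python
from urllib.parse import quote

# Assumed pattern for direct (raw) downloads from a public GitHub repository.
RAW_GITHUB = "https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"

# Assumed viewer base URL; the 'file' query parameter is also an assumption.
VIEWER_BASE = "https://verovio.humdrum.org/"


def raw_url(owner: str, repo: str, path: str, branch: str = "main") -> str:
    """Build a direct-download URL for one file, keeping '/' in the path."""
    return RAW_GITHUB.format(owner=owner, repo=repo, branch=branch, path=quote(path))


def viewer_url(file_url: str, base: str = VIEWER_BASE) -> str:
    """Build a viewer URL that passes the fully escaped file URL as a parameter."""
    return f"{base}?file={quote(file_url, safe='')}"


# Hypothetical corpus layout: one score file per song in a 'scores' folder.
source = raw_url("example-org", "example-corpus", "scores/Lied 1.krn")
print(source)
print(viewer_url(source))
```

Because the whole file URL is percent-encoded, it survives intact as a single query value even when the path contains spaces or other special characters. GitLab and similar hosts have their own raw-file patterns, which would slot into the same function shape.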

As such, a site designed like fourscoreandmore could present these additional corpora, using the relevant viewer for each, with HTML embedding similar to that currently used for the MuseScore.com files. Alternatively, it could see itself as a complement to the humdrum-specific ‘Verovio Humdrum Viewer’, which includes a dropdown ‘Scores’ menu for many corpora in that format.³³ That is, perhaps humdrum is already well served, and it falls to a site like fourscoreandmore to serve corpora in other formats? The cost-benefit calculation for extending the provision at fourscoreandmore depends on at least:

How visible the respective ‘platforms’ already are, and how much additional visibility integration (via re-directs) at fourscoreandmore would provide.
The difficulty of automatic website-building in these cases. Considerations here include the stability of the collection and the existing provision of structured metadata.
The overall benefits of consolidation. Users seeking score PDFs know to start at IMSLP to give themselves the best chance of finding what they seek; users seeking digitally encoded scores arguably need something comparable.

4.3 Knowledge Graphs and Authority Files at Wikidata and Elsewhere

Practically, when it comes to the data sharing and coordination outlined above, much will depend on the sharing of structured metadata. Clearly, best practice would be to have recognised standards for every aspect of the musical data, engaged everywhere. At minimum, this should cover the creators (composers and lyricists) and works (and the relations between them, e.g., songs grouped in collections). Better still would be coverage of additional elements such as editions (discussed above), ambiguous attribution and misattribution (of creators and editors), secondary data (e.g., analyses), and more. Unfortunately, this data does not currently exist at the level of detail required to cover each work in our corpora. Moreover, coverage is especially patchy for historically marginalised composers. For instance, IMSLP is probably the most complete and widely used source of metadata for classical music in symbolic form, and there is an IMSLP source for each work in each OpenScore corpus (as described above). Yet even IMSLP provides structured IDs only down to the level of editions (i.e., a specific version of a whole score), not individual movements. There are also other interoperability challenges arising from IMSLP’s limited API.
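The identifier gap can be made explicit with a small data model. The following is a minimal Python sketch, not OpenScore’s actual schema: the class and field names are hypothetical, and the code simply records identifiers at the composer, edition, and movement levels, then reports which levels lack one. In the situation described above, the movement level is where gaps would typically surface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Movement:
    """One song or movement within a larger work (hypothetical model)."""
    title: str
    wikidata_id: Optional[str] = None  # frequently absent at this level


@dataclass
class Work:
    """A work, such as a song collection, with identifiers at several levels."""
    title: str
    composer_wikidata_id: Optional[str] = None
    imslp_edition_id: Optional[str] = None  # IMSLP identifiers stop at the edition
    movements: list[Movement] = field(default_factory=list)


def missing_identifiers(work: Work) -> list[str]:
    """List every level of the hierarchy that has no structured identifier."""
    gaps = []
    if work.composer_wikidata_id is None:
        gaps.append("composer")
    if work.imslp_edition_id is None:
        gaps.append("edition")
    for number, movement in enumerate(work.movements, start=1):
        if movement.wikidata_id is None:
            gaps.append(f"movement {number}: {movement.title}")
    return gaps


# Placeholder values only; these are not real Wikidata or IMSLP identifiers.
cycle = Work(
    title="Hypothetical song cycle",
    composer_wikidata_id="COMPOSER-PLACEHOLDER",
    imslp_edition_id="EDITION-PLACEHOLDER",
    movements=[Movement("No. 1"), Movement("No. 2")],
)
print(missing_identifiers(cycle))
```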

In so far as this metadata exists anywhere, Wikidata is not currently the leading provider. And while sites like MusicBrainz have done a very impressive job of mapping the terrain of this data, their primary focus on recorded music makes for an imperfect match here.³⁴ DDEX, likewise, serves to encode individuals’ roles in musical items for the music-business purposes of claiming and distributing rights.³⁵ Wikidata is promising and growing, but currently in the foothills of this mountainous effort relative to those other providers. As mentioned above, when we began the OpenScore lieder corpus we found it practical to engage Wikidata up to the level of the composer: some composers were present already, and we added a few that were not. Doing this at the scale of individual songs was far beyond what was practical for the project. Wikidata coverage has since improved, but overall, that practical position remains. There is considerable work to be done in enhancing the data provision at Wikidata for its own sake, either from scratch or (more likely) by triangulating between existing sources. A dedicated project could aim to enhance Wikidata’s coverage programmatically from the provision at IMSLP, MusicBrainz, and elsewhere. Clearly that is another project, substantial in itself, and peripheral enough to the primary challenges here to be out of scope. Moreover, while ‘triangulating’ implies three sources, there are many more, and even a preliminary look shows that they are not mutually consistent. Promising and creative projects for navigating this space include accessing the many existing structured catalogues via natural language queries (‘chat’) with LLMs, as demonstrated by the LinkedMusic project (Fujinaga, PI).³⁶ This could be useful for coordinating that data.
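Even a preliminary consistency check across sources can be scripted. The sketch below is a deliberately naive illustration in Python, assuming each source has already been reduced to a flat dictionary of field names and string values for the same work; the source names and values are hypothetical, and real reconciliation would need fuzzier matching (name variants, transliteration, date ranges) than the whitespace-trimmed exact comparison used here.

```python
from collections import defaultdict


def disagreements(records: dict[str, dict[str, str]]) -> dict[str, dict[str, set[str]]]:
    """Return, for each field on which sources disagree, which sources give which value.

    `records` maps a source name to that source's metadata for one work.
    Fields reported by only one source, or on which all sources agree, are omitted.
    """
    values_by_field: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for source, record in records.items():
        for field_name, value in record.items():
            # Naive normalisation: trim surrounding whitespace only.
            values_by_field[field_name][value.strip()].add(source)
    return {
        field_name: dict(values)
        for field_name, values in values_by_field.items()
        if len(values) > 1
    }


# Hypothetical records for one song from three sources.
records = {
    "source_a": {"composer": "Example Composer", "year": "1840"},
    "source_b": {"composer": "Example Composer ", "year": "1841"},
    "source_c": {"year": "1840"},
}
print(disagreements(records))
```

Flagged fields are where a human (or a more sophisticated matcher) would need to adjudicate before any values were contributed back to Wikidata.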

5 Summary, Conclusion, and Outlook

This paper compares various platforms for hosting open humanities data, both in general terms and as relevant to the specific case of OpenScore corpora of musical scores. These are complex questions, balancing many factors. To summarise, we have ended up with an ecosystem spanning the original (special interest, commercial) platform and further platforms supporting open data for research (GitHub, Zenodo), and we have created a bespoke suite of websites to make sense of it all for musicians (fourscoreandmore).

This is a significant undertaking for which no contributor is significantly incentivised by financial or other support. That makes it precarious but, by the same token, the ongoing survival of the project is a testament to the conviction of those who keep it going despite those challenges. Moreover, while serving multiple use cases certainly adds complexity (not least in the sheer number of platforms), that complexity and exposure may help with longevity. Responding to more demanding requirements in the immediate term may provide some degree of future-proofing, not least by avoiding the single point of failure that has bedevilled too many OHDers’ projects.

It is worth reinforcing the point that ‘bespoke’ end points do not imply a ‘possessive’ attitude to the data. OpenScore data is publicly exposed on several platforms, and our use of the CC0 licence has been axiomatic from the start: we welcome citation and other acknowledgement, but do not require even that. The fourscoreandmore provision focuses on the OpenScore datasets that we have created, bringing the provision in line with others like the Humdrum-data created by many academic projects and coordinated by Craig Sapp. Future work will probably involve integrating and/or cross-referencing the open platforms to encompass further providers, formats, and more. It could be that fourscoreandmore is the site to provide this, but equally it could be elsewhere. What matters is that the data exists somewhere, and that the provision is maintained and extensible, with clear and transparent criteria for inclusion. In any case, we aim (as ever) to serve data in a way that is FAIR not only in principle but in practice, and for as wide a range of users as possible. We seem to be settling on a bespoke solution, while trying to keep maintenance manageable for a project that has always ‘run on fumes’.

Notes

[1] http://fourscoreandmore.org/, accessed January 2026.

[2] Near variants include ‘laid his hat’ (e.g., the Temptations’ Papa Was a Rollin’ Stone) and ‘Lay My Head’ (e.g., Metallica’s Wherever I May Roam and Tom Waits’ Anywhere I Lay My Head), though with more false positives in this latter category.

[3] E.g., 50 Cent appears to be referencing several precedents in What Up Gangsta.

[4] ‘Das Wandern ist des Müllers Lust’ (‘To wander is the miller’s delight’).

[5] See further discussion in M. R. H. Gotham (2021).

[6] App: https://musescore.org/; Platform: https://musescore.com. Accessed Jan. 2026.

[7] https://www.mu.se, accessed Jan. 2026.

[8] https://musescore.com/openscore, accessed Jan. 2026. The original external website (openscore.cc), organised and controlled centrally by MuseScore, has recently gone dark: it was accessed successfully in September 2025 but not in October 2025.

[9] For previous reports on this data, see M. R. H. Gotham and Jonas (2022); M. R. H. Gotham et al. (2018).

[10] See discussion in M. Gotham, Micchi, et al. (2023); Nápoles López et al. (2021). Specific data challenges for alignment include multiple measurements of musical time in symbolic data, including repeats (M. Gotham, Hentschel, et al., 2023).

[11] While there are some strikingly early cases of corpus study, such as Budge (1943), the data itself was not provided separately from the analysis, preventing further use. Although there are ongoing challenges with multiple formats and more, most data-driven studies these days release their data publicly in a form that is at least FAIR-in-principle.

[12] The quartets are as reported in M. R. H. Gotham et al. (2023). A paper on the orchestral works is forthcoming; for the data see Blessing et al. (2025).

[13] As cited above (Blessing et al., 2025; M. Gotham, Micchi, et al., 2023; M. R. H. Gotham, 2021; M. R. H. Gotham & Jonas, 2022; M. R. H. Gotham et al., 2018, 2023) and elsewhere (M. R. H. Gotham, 2023; M. R. H. Gotham, Gullings, et al., 2021; M. R. H. Gotham, Kleinertz, et al., 2021).

[14] https://github.com/OpenScore/, accessed Jan. 2026.

[15] Internal Microsoft research explores the extent of open data on the platform. There is a report on their website, and on GitHub under the official ‘GitHub’ user (https://www.microsoft.com/en-us/research/publication/open-data-on-github-unlocking-the-potential-of-ai/, https://github.com/github/open-data-on-github, accessed Jan. 2026).

[16] Groups based on GitLab with projects related to OpenScore include ‘Algomus’ (Université de Lille, FR, https://www.algomus.fr, accessed Jan. 2026). Looking ahead to the section below on bespoke platforms (§3), Algomus also run a platform called ‘Dezrann’ (Ballester et al. (2025), http://dezrann.net/, http://gitlab.com/algomus.fr/dezrann/, accessed Jan. 2026), which serves to combine, align, and visualise multiple sources (scores, analyses, …). Among the corpora represented are parts of the OpenScore effort. While source files are stored in shared formats, Dezrann stores the combination in bespoke, JSON-based ‘.dez’ files. Direct download of these ‘.dez’ files is not an intended use case for musicians.

[17] The ‘RicercarDataLab’ (Université de Tours, FR) is a notable case here (https://ricercardatalab.cesr.univ-tours.fr/en/, accessed Jan. 2026). The team provides several excellent websites meeting many of the criteria described here, including direct file download. The team use GitLab for some of their work, but the repositories are private. Direct downloads are provided as part of the assets of the public website given above, not via the private GitLab.

[18] https://about.zenodo.org/policies/, accessed Jan. 2026.

[19] Some individual universities make a similar offer to host data with DOI as long as the institution lasts. Unfortunately, given the number of closures in recent years, this is not nearly as compelling.

[20] A notable example is their ‘Dagstuhl ChoirSet’ (DCS), which traverses an overview website (https://www.audiolabs-erlangen.de/resources/MIR/2020-DagstuhlChoirSet), a more interactive website with ‘Track Switcher’ (direct listening, https://www.audiolabs-erlangen.de/resources/MIR/2020-DagstuhlChoirSet/data/DCS_LI_FullChoir_Take01), Rosenzweig et al. (2021)’s Zenodo record (with 6 versions), a GitHub repo for ‘tooling’ (i.e., not the data but the code, https://github.com/helenacuesta/DCStoolbox), and integration in ‘MIR data’ (discussed above). For examples of their other datasets, see https://www.audiolabs-erlangen.de/fau/professor/mueller/resources. All these URLs accessed Jan. 2026.

[21] http://imslp.org/, https://www.wikidata.org/, accessed Jan. 2026.

[22] https://figshare.com, https://osf.io, accessed Jan. 2026.

[23] https://fourscoreandmore.org/, https://fourscoreandmore.org/cut-outs/, accessed Jan. 2026.

[24] There is always risk of issues with changing terms for the infrastructure, such as web domain registration and/or the GitHub-supported workflow (yes, the GitHub platform features here too).

[25] https://www.cpdl.org/, accessed Jan. 2026.

[26] This is particularly true of weakly-edited sources as we discuss in M. R. H. Gotham et al. (2023).

[27] Dorico, another proprietary notation application, has promised that it will ‘always’ load Finale files. See https://www.finalemusic.com/blog/end-of-finale-new-journey-dorico-letter-from-president/, accessed Jan. 2026.

[28] Pugin et al. (2014), https://www.verovio.org/, accessed Jan. 2026.

[29] Verovio’s home page provides a list of the numerous projects and applications making use of it. Note that this includes the Tours and Erlangen provisions discussed above. The online Verovio editor is at https://editor.verovio.org/, accessed Jan. 2026.

[30] http://verovio.humdrum.org/, accessed Jan. 2026.

[31] Goebl & Weigl (2024), http://mei-friend.mdw.ac.at/. MEI is promoted and advanced by the Music Encoding community, who list Verovio and mei-friend alongside additional resources at https://music-encoding.org/resources/tools.html. Both URLs accessed Jan. 2026.

[32] This is somewhat ironic, as mei-friend was first designed as a plug-in for the Atom IDE before that software was discontinued. We may be coming full circle, however, as there is now a minimal MEI viewer (note: a viewer, not an editor) extension for rendering MEI inside Microsoft’s Visual Studio Code (https://github.com/sonovice/mei-viewer-vscode, accessed Jan. 2026).

[33] https://verovio.humdrum.org, accessed Jan. 2026.

[34] https://musicbrainz.org, accessed Jan. 2026.

[35] https://ddex.net, accessed Jan. 2026.

[36] L. Pond, S. Ngassam, L. Kirby, S. Meng, S. Chow, D. Hillerbrand, and I. Fujinaga, “SESEMMI for LinkedMusic: Democratizing Access to Musical Archives via Large Language Models”, in 1st Workshop on Large Language Models for Music & Audio (LLM4MA), Daejeon, South Korea, 2025. Forthcoming. https://ismir2025program.ismir.net/lbd_476.html, accessed Jan. 2026.

Acknowledgements

Please see the funding statement. Thanks also to JOHD editors and reviewers, and to many anonymous colleagues for their comments on this article during development. Thanks most of all to the many individuals who have contributed to the open corpora discussed here.

Competing Interests

The author has no competing interests to declare.

Author Contributions

This paper has been conceptualised and written exclusively by the named author. Data creation tasks were distributed in complex ways, as discussed in the main text and code commit histories.