1. Introduction
The humanities and the social sciences have witnessed phases of theoretical and methodological re-orientations that Bachmann-Medick (2016) collectively labels “cultural turns”, including but not limited to interpretive, performative, reflexive, postcolonial, translational, spatial and iconic re-orientations. In the field of human-computer interaction (HCI), such turns have been described as “waves”. A widely shared perspective on such re-orientations in HCI is that the field went through three consecutive waves (Duarte and Baranauskas, 2016; Bødker, 2015), with the third wave bringing forth discussions of culture and value. In the field of Music Information Research (MIR), scholarly critiques are emerging that emphasize the discipline’s need to further diversify, to embrace “new interdisciplinary futures” (Born, 2020, p.193), and to actively engage in new “turns” (Morreale, 2021, p.106). In this article, we take up the thread started by Born (2020), who sees the need for epistemological and ontological re-orientations in MIR, and build on two of our recent works. The first is our ISMIR 2021 paper (Huang et al., 2021), which draws a connection between ethics for music AI and non-Western philosophies; it was a Best Paper Candidate and winner of the Best Special Call Paper Award, and this article is an extension of it. The second is a book chapter (Huang et al., 2022) that furthers our exploration of a “culturally informed ethics of music AI” by surveying actual practice by various stakeholders across Asia.
Based on our earlier call for practicing “ethical pluralism” in music informatics (Huang et al., 2021), we argue in this article that MIR as a field has yet to complete its “cultural turn”, and that for the MIR community to diversify itself in ways that transcend mere tokenistic gestures, it needs to go beyond diversifying datasets and scrutinize its epistemological, ontological, methodological, and axiological assumptions. In other words, besides including more musical traditions as its subject of study, MIR needs to simultaneously expand its modes of conceptualizing (thinking about) and engaging (working with) music. This would require an auto-critique of the field’s raison d’être, that is, the philosophical underpinnings of what MIR is and should become. Besides the much-needed ethical and cultural turns, some fundamental questions to work through are: What is music (ontology)? What are the nature and limits of knowledge concerning music (epistemology)? How do we obtain such knowledge (methodology)? And what about music and our own research endeavor do we consider “good” and “valuable” (axiology)?
This is aligned with what Born (2020) advocates as a new, “agonistic mode of interdisciplinarity”, which puts MIR in a non-hierarchical dialogue with contemporary (ethno)musicology, music anthropology and sociology, enabling all contributing disciplines to grow through “mutual transformations” and generate “entirely unforeseen, novel methodologies and theories” (Born, 2020, p.200). We share Born’s observation that to truly diversify, MIR needs to move beyond its “subordination” mode of interdisciplinarity, where “a touch” of a certain humanistic or social scientific discipline is added to an extent that would “serve” MIR but does not threaten its ontological and epistemological premises (p.200).
While Born (2020) calls for attention to be directed toward the epistemological, ontological, and methodological, we add to this mix axiology, or the philosophical study of value. We propose that integrating an axiological dimension into discussions of ethics will allow MIR to examine with greater nuance the values upon which it, as a research community, grounds its ethical principles. In contrast to efforts that seek the “universal” in music and in technology (Figal, 2015; Savage, 2022), we emphasize the need to access each musical space (be it a body of repertoire, a genre, a tradition, a collection of recordings, or an artistic practice) as part of a distinct world or “ecosystem” that comes with its own webs of relationships, rituals, and significance (Geertz, 1973, p.5). Our use of the term “music ecosystem” is inspired by a body of earlier work (Titon, 1984; Tan, 2012; Clancy, 2021), but we follow Schippers (2015, p.137) in defining it as:
“the whole system, including not only a specific music genre, but also the complex of factors defining the genesis, development and sustainability of the surrounding music culture in the widest sense, including (but not limited to) the role of individuals, communities, values and attitudes, learning processes, contexts for making music, infrastructure and organisations, rights and regulations, diaspora and travel, media and the music industry.” (Schippers after Tansley, 1935, p.298)
In this sense, music generated using AI constitutes a new kind of music ecosystem. In light of the scholarship on postmodernity and the observation of Seligman et al. (2008) that humans live in a fundamentally fragmented reality, we suggest that a heightened awareness of the disjunctions that exist between different music worlds will contribute toward building an ethically responsible MIR community. We demonstrate these concepts via two case studies, each situated in its own musical ecosystem.
In this paper we extend our ISMIR 2021 paper (Huang et al., 2021) in several ways. Section 2 contributes an overview of MIR’s engagement with ethics, after which we address the importance of considering value (axiology) when analyzing and practicing technology ethics. This is followed by a survey on the degree of ethnomusicological presence in the recent history of the ISMIR conference and a discussion on the state of MIR’s “cultural turn”. The heart of this article features two case studies. The case study in Section 3 revisits Huang et al. (2021): we explore possible musical “cosmotechnics” within Asia (Hui, 2021) — across time and space — and reflect on indigenous philosophical traditions from this region as they apply to the ethics of music informatics and AI. This destabilizes common epistemological, ontological, methodological, and axiological orientations when interpreting ethical principles for autonomous and intelligent systems (A/IS). Section 4 presents a new case study to further illustrate our call: we re-situate ourselves in the world of Irish traditional music (ITM) and demonstrate “agonistic interdisciplinarity” as a way of challenging MIR’s existing assumptions. This examines more deeply what “responsible engineering” in the space of traditional music could signify.
2. Disciplinary Reflections
2.1 An Ethical Turn?
Given recent advancements and controversies involving AI technology, the ethical implications of integrating such technology into public, private and commercial spheres have become issues of compelling interest to individuals, companies, and governments (Jobin et al., 2019). This has led to the creation of research forums like the ACM Conference on Fairness, Accountability, and Transparency,1 crowd-sourced initiatives like the AI Incident Database,2 the formation of corporate ethics committees, inquiries and reports by government bodies,3 and focus groups of professional global organizations.
The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems4 consists of engineers from six continents, and has produced three editions of “Ethically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems” (IEEE, 2019). The latest edition of this document (EAD1e) argues for the development of autonomous systems guided by eight ethical principles: human rights, well-being, data agency, effectiveness, transparency, accountability, awareness of misuse, and competence. EAD1e supports designing with values that “put human advancement at the core of development of technical systems”, in concert with the recognition that “machines should serve humans and not the other way around … to create autonomous and intelligent technical systems that enhance and extend human well-being and freedom” (IEEE, 2019, p.6). These principles align with several AI guidelines produced around the world (Jobin et al., 2019). The majority of these guidelines, however, originate from economically developed countries, which incurs the risk of neglecting meaningful local knowledge and value systems from other regions, thus jeopardizing global fairness (Jecker and Nakazawa, 2022). This risk is acknowledged in EAD1e, which emphasizes the need for cross-cultural dialogue so as to encompass a wider range of value systems (IEEE, 2019, pp.49–58).
Ethical frameworks gain further importance in contexts where the application of AI is not considered high-risk, because in these cases it will not be covered by regulatory legal frameworks. As Clancy (2021) argues, since copyright as it stands is not capable of matching AI, an equitable approach to the financial implications of AI can only occur through an ethical response from stakeholders of the music ecosystem. But who would be in the best position to drive such a response? Following the argument by Gold et al. (2022), scholarly conferences can play a central role in defining the ethical values of their community’s collective identity and aspiration. As such, the ISMIR community would be in an ideal position to extend its research activities to the pursuit of equitable models for music ecosystems (of different kinds and sizes) in the era of artificial intelligence. However, as analyzed by Holzapfel et al. (2018) and Morreale (2021), the ISMIR community has throughout its history largely ignored the ethical implications of MIR technology. In order to perform the needed “ethical turn”, as Morreale further argues, the ISMIR community would need to first conduct an epistemological turn. Morreale’s proposal resonates with the earlier call by Born (2020) to diversify MIR: from expanding the field’s theories of “what music is” (Born, 2010, p.232, as cited in Born, 2020, p.199) to broadening its “ways of knowing” (Bowker, 2018, p.207, as cited in Born, 2020, p.200).
2.2 Ethics, Axiology, and Technology
In this article, we propose that axiology, the philosophical study of value, be taken into account when practicing MIR: what are the core values of our research; what ought to be its impact on its stakeholders? The word “axiology” was introduced into philosophy by Urban (1909). While topics that are now the focus of axiology — including “meaning, characteristics, and classification of value, the nature of evaluation, and the character of value judgments” — have traditionally been attached to the study of ethics, they are increasingly accepted as a special branch of contemporary philosophy (Bunnin and Yu, 2004, p.65). The entwined relation between axiology and ethics is examined by Chang (2001), who defines axiology as a “philosophy of human life” whose purpose is to provide “fundamental codes of action” by inquiring into the “general nature, features and types of value” (p.71). Axiology thus extends beyond the limit of moral values that is the focus of traditional ethics. Or, more colloquially phrased in the context of music, besides ethical considerations our research is guided by, for instance, our musical taste and preferences.
Values, including but not limited to moral and aesthetic values, play a crucial role in the configuration of technology. This is explored by Gonzalez (2015, pp.3–28), who, on the basis that technology is value-laden, advocates the philosophical study of values about technology (or “axiology of technology”). Kroes and Meijers (2016) take a step further in calling for a two-stage “axiological turn” in the philosophy of technology as a continuation of the field’s earlier empirical turn (p.12). Drawing from Verbeek (2010), Kroes and Meijers (2016) argue that philosophers of technology need to participate directly in the development of technology via doing technology ethics, while at the same time moving beyond the scope of ethical values to include other kinds of values (pp.25–26).
Not unlike ethics, axiology has a cultural dimension, too. Examining Confucianism5 in nineteenth-century East Asia, Chen (2018) asks what “axiologies” predominated in this historically and culturally specific context. Via the Confucian idea of “benevolence and love” (renai 仁愛), Chen illustrates how this concept is not only a political one, but also carries with it “ethical and axiological import” (p.103). Scholars have also characterized the “New Confucian Movement”, a modern transformation of Confucian humanism, as a philosophical discourse that heavily “depends on axiological themes and traits” (Berthrong, 2008, p.423). The two case studies presented in this article demonstrate this dynamic interplay between value, ethics, and culture: in the former case, a number of philosophical traditions practiced in Asia and their associated values are considered for a re-interpretation of existing ethical principles for A/IS; in the latter, the value systems that help define ITM are taken seriously in order to conduct an auto-critique over AI’s intervention in the tradition.
2.3 A Cultural Turn?
An increase of attention in MIR toward aspects of the cultural contexts in which music takes place would be a strong indicator of a cultural turn. One way to collect evidence for such a cultural turn is to investigate how research presented at annual ISMIR conferences has engaged with the field of ethnomusicology, a discipline that studies music in (or as) culture (definition based on Rice, 2014, pp.1–10) and that is undergoing its own process of identity solidification (Amico, 2020). It should be noted, however, that while our following analysis focuses on the ISMIR conference, there have been initiatives beyond ISMIR that are relevant to this discussion. An example is the ERC-funded CompMusic project (2011–2017),6 which provides MIR datasets of and tools for Hindustani, Carnatic, Turkish-makam, Arab-Andalusian, and Beijing Opera traditions. In 2017, a workshop titled “Computational Ethnomusicology: Methodologies for a New Field” was held at the Lorentz Center, marking another step forward in reducing the semantic gap between MIR and ethnomusicology.7 On the other hand, ethnomusicology is only one of the many disciplines that MIR would increasingly engage with in a cultural turn, with other disciplines being — to name only a few — science and technology studies, anthropology, critical theory and cultural studies, philosophy of technology, music theory, or sociology. Hence, the following investigation presents evidence from a limited perspective focused on the intersections between MIR and ethnomusicology, without intending to assign an exclusive role to ethnomusicology in a cultural turn of MIR.
In the context of ISMIR, the relation between MIR and ethnomusicology seems to have been subject to discussion within the conference’s organizing committees. In 2011, the topic of “applications to non-Western music” was listed, to be replaced in the years 2012–2016 by the topic “computational ethnomusicology” (CE). In 2017, this topic disappeared from the list, and the calls since 2017 have listed computational musicology and computational music theory as topics related to the musicologies. In 2021, CE re-entered the list of ISMIR topics, and in 2022 seven ISMIR papers belonged to this category, an increased presence that may have been encouraged by the special calls for papers on cultural and social diversity in MIR in the years 2021 and 2022.
To obtain a better understanding of how discourses in MIR engage with ethnomusicology, we analyzed the body of papers published over 16 years of the ISMIR conference. After a short period of intense growth, the number of accepted papers at ISMIR between 2004 and 2018 remained largely constant, with a mean of about 110 papers per year (standard deviation: 11). Using the ISMIR paper explorer (Low et al., 2019), we searched all ISMIR papers in this “stable” period (2004–2018) for occurrences of the term “ethnomusicology”, and manually searched all papers from the 2019 conference (which is not contained in the explorer). We obtained a total of 78 papers that mention the term. Figure 1 depicts, for each year, the percentage of papers published in the proceedings that mention the term. Despite year-to-year fluctuations, which it would be speculative to interpret, there is a trend of increasing presence of the term “ethnomusicology” in ISMIR papers until 2009. After that, about 5% of published ISMIR papers each year mention the term, which illustrates a continuing awareness of the field in MIR that is consistent with numbers reported about a decade ago (Cornelis et al., 2010, p.1011).

Figure 1
Percentage of papers that mention the term “ethnomusicology” in the ISMIR proceedings from 2004 to 2019.
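The yearly percentages in Figure 1 amount to a simple ratio: the number of papers in a given year that mention the term, divided by the number of papers in that year's proceedings. The following is a minimal sketch of that computation, not the procedure actually used for the figure. The function name `mention_percentage_by_year`, the `(year, text)` record layout, and the toy records are all assumptions made for illustration; the toy records are not ISMIR data.

```python
from collections import Counter


def mention_percentage_by_year(papers, term):
    """Return {year: percentage of that year's papers whose text contains `term`}.

    `papers` is an iterable of (year, full_text) pairs (a hypothetical layout).
    Matching is a case-insensitive substring test.
    """
    totals = Counter()  # papers per year
    hits = Counter()    # papers per year that mention the term
    needle = term.lower()
    for year, text in papers:
        totals[year] += 1
        if needle in text.lower():
            hits[year] += 1
    return {year: 100.0 * hits[year] / totals[year] for year in sorted(totals)}


# Hypothetical toy records for illustration only (NOT actual ISMIR papers).
toy_papers = [
    (2008, "A study informed by ethnomusicology and field recordings."),
    (2008, "Beat tracking with neural networks."),
    (2009, "Chord recognition benchmark."),
    (2009, "Computational Ethnomusicology tools for makam music."),
    (2009, "Audio fingerprinting at scale."),
    (2009, "Tag prediction."),
]

result = mention_percentage_by_year(toy_papers, "ethnomusicology")
print(result)
```

Note that a plain substring test of this kind would not match related word forms such as “ethnomusicological”. Whether the original search treated such variants as matches is not stated in the text, so this behavior is an assumption of the sketch.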
Whereas a detailed analysis of how the 78 papers relate to the discipline of ethnomusicology is beyond the scope of this article, some qualitative insights may shed light on the nature of references to the term as it occurs in the papers. To this end, we selected a subset of papers from two periods: the first period contains the years 2008 and 2009, which marks the beginning of a continuous phase of heightened interest in ethnomusicology as depicted in Figure 1; the second period contains the years 2018 and 2019, in order to identify potential shifts in discourse over the range of 10 years. In the first period, 14 papers mentioned the term, whereas the second period contains 11 papers. A thematic analysis (Braun and Clarke, 2006) was conducted with these 25 papers by Holzapfel to investigate in which context the term “ethnomusicology” is used. In this analysis, all sentences including the term were extracted and assigned to non-predefined codes, which were in turn grouped into themes related to the type of interaction between the fields.
The four main themes that emerged were (1) computational ethnomusicology, (2) MIR informing ethnomusicology, (3) ethnomusicology informing MIR, and (4) synergies and collaborations. The first theme, computational ethnomusicology, is represented by occurrences in 4 papers (exclusively in the first period), and in these the term CE is used to situate the papers within this new area of research, which refers to the “design, development and usage of computer tools that have the potential to assist in ethnomusicological research” (Tzanetakis et al., 2007). The second thematic group is related to the first in this motivation, but does not employ the term CE. A total of 9 papers out of the total of 25 involve statements that emphasize the relevance of presented results for research in ethnomusicology. The third and largest thematic group is ethnomusicology informing MIR, with 17 papers using the term in such a context. Within this theme, three sub-themes are identified. The first (12 papers) uses references to ethnomusicological research to contextualize the research presented in the paper. The second sub-theme contains references to collections of recordings and field recording practice (6 papers). It should be pointed out that the emphasis here lies on field recording, the material outcome of field work, rather than the overall practice of ethnography. The third sub-theme finds expression in only 1 of the examined 25 papers, which employs ethnography to obtain insights into the listening behavior of a specific group of users (Cunningham and Nichols, 2009). The final theme, synergies and collaborations, finds expression in only 3 out of the 25 papers. In these, either the intention to collaborate with ethnomusicologists is mentioned (Wright et al., 2008), or ethnomusicologists are included as a user group in the study of the paper (Holzapfel and Benetos, 2019).
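The per-theme counts reported above (4, 9, 17, and 3) add up to more than 25, which suggests that a single paper can contribute sentences to more than one theme. The tally is therefore a count of distinct papers per theme rather than a partition of the papers. A minimal sketch of such a tally follows, assuming a hypothetical list of `(paper_id, theme)` pairs with one pair per coded sentence. The identifiers, the function name `papers_per_theme`, and the toy assignments are illustrative and do not reproduce the actual coding.

```python
from collections import defaultdict


def papers_per_theme(coded_sentences):
    """Count distinct papers per theme from (paper_id, theme) pairs.

    A paper is counted once per theme, however many of its sentences
    were assigned to that theme.
    """
    by_theme = defaultdict(set)
    for paper_id, theme in coded_sentences:
        by_theme[theme].add(paper_id)
    return {theme: len(ids) for theme, ids in by_theme.items()}


# Hypothetical coded sentences for illustration only (NOT the actual coding).
toy_codes = [
    ("paper_a", "computational ethnomusicology"),
    ("paper_a", "ethnomusicology informing MIR"),
    ("paper_a", "ethnomusicology informing MIR"),  # second sentence, same theme
    ("paper_b", "MIR informing ethnomusicology"),
    ("paper_c", "ethnomusicology informing MIR"),
    ("paper_c", "synergies and collaborations"),
]

counts = papers_per_theme(toy_codes)
print(counts)
```

In this toy input, three papers yield theme counts that sum to five, mirroring how the reported theme counts can exceed the number of papers analyzed.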
This analysis reveals a relationship between MIR and ethnomusicology that is far from the “agonistic interdisciplinarity” Born (2020) proposes. While MIR has incorporated outcomes of ethnomusicological studies when they could support or “serve” its discourse,8 synergies and collaborations are rare. An exception is the paper by Finkensiep et al. (2019), which includes an ethnomusicologist as co-author, and combines data-analytic approaches with ethnomusicological expertise to gain insights into the workings of a specific musical idiom. More often, MIR scholars have focused on the “power” of computational methods to “contribute to” (ethno)musicology and to “upgrade” the humanistic discipline from a “data-poor field” into a “data-rich field” capable of providing “greater power for hypothesis testing” (Volk et al. (2011), citing Huron (1999)). Implied here is a hierarchical relationship between the physical and natural sciences and the qualitative humanities and social sciences, a hard-to-avoid problem that Cook (2005) addressed in his invited talk at ISMIR. It is also telling that in their introduction to a special issue of the Journal of New Music Research devoted to CE, Gómez et al. (2013) acknowledge that despite the issue’s coverage of “a broad area” of topics, “ethnomusicology core-problems” are “barely represented” (p.111).
We thus argue that while MIR has made several significant attempts to diversify itself, to arrive at a more profound “cultural turn”, the field needs to open itself to non-hierarchical partnerships with humanistic and social scientific music scholarship, including but not limited to contemporary (ethno)musicology and music theory. Such dialogues could lead all involved disciplines toward critically rethinking the dialectics between scientific objectivity and humanistic subjectivity, between universality and particularity, between coherence and fragmentality, and between the modern and “the postmodern condition” in the study of (musical) culture (Lyotard, 1984). In light of the increased popularization of “digital humanities” (DH), it seems timely to ask: is the growth of CE and the reinvigoration of a “new comparative musicology” signaling a “groundbreaking, forward-looking, wonderfully experimental” moment as suggested by Tilley (2018, p.971), or is it a worrisome trend — considering ethnomusicology’s disciplinary origin as a turn away from comparative musicology and reductive quantitative approaches — that may risk taking the cultural study of music “backwards into the future” (Born, 2020, p.198)?
3. Case Study 1: Responsible (Music) AI through Philosophical Re-orientations
In Section 2.2 we emphasize the importance of considering the axiological dimension — the philosophical study of value or the “philosophy of human life” as defined by Chang (2001) — when practicing MIR. We also illustrate that axiology, like ethics, has a cultural dimension and show how notions in non-Western philosophical discourses such as that of “benevolence and love” (renai 仁愛) may simultaneously carry axiological, ethical, and political import. In this first case study, we explore how existing ethical principles for A/IS may be re-interpreted when non-Western philosophical traditions (including their ethical and axiological dimensions) are taken into account. This is thus our effort to practice what we call for in the Introduction (Section 1): to challenge existing assumptions underlying each of the four ologies (ontology, epistemology, methodology, and axiology) when approaching the ethics and cultural politics of music AI. This case study is the direct outcome of our own attempt at the “agonistic interdisciplinarity” proposed by Born (2020), made possible by the non-hierarchical partnership across disciplines that is key to our ongoing collaboration.
The remainder of this section is adapted from our ISMIR 2021 paper (Huang et al., 2021), which draws from Asian and mainly East Asian philosophical traditions with a heavier focus on Confucian thought.9 We should start by acknowledging that the current discussion is far from exhaustive and that due to the scope of this article, we are not able to dive deeper into many of the philosophical schools mentioned (such as Buddhism, Daoism, and Shintoism) or to address more intellectual traditions across Asia (including Indian and other indigenous philosophies from the region). Most of all, we fully recognize that each philosophical “school” — be it Daoism or Confucianism — is highly heterogeneous within and is constitutive of a unique “ecosystem” (Schippers, 2015, pp.136–137) and “universe” (Seligman et al., 2008, p.7) with its own moral and cosmic order. We are only scratching the surface of a much more complex topic.
3.1 Non-Western Approaches to Technology
Discussions of technology have historically been driven by Western thought rooted in Plato and Aristotle. In an attempt to challenge the assumption of a universality of Western science and technology, Dusek (2006) in his overview of the philosophy of technology devotes two chapters to those who have historically been excluded from mainstream accounts of the nature of technology. This includes a discussion on Feminist philosophy of technology (Dusek, 2006, pp.136–155) and on contributions of non-Western knowledge systems to the ongoing scientific and technological development (Dusek, 2006, pp.156–175). While we are aware of the significance of these gendered, feminist, and intersectional interventions also exemplified in the works of Noble (2018) and D’Ignazio and Klein (2020), our focus for this article lands on the latter, that is, the value of considering non-Western philosophical perspectives when assessing responsible (music) AI.
In the context of AI ethics, a minority of authors have highlighted the importance of diversifying perspectives (Jobin et al., 2019; Hagerty and Rubinov, 2019; Fjeld et al., 2020; Clancy, 2021; Jecker and Nakazawa, 2022). Goffi (2021) argues against global AI governance that imposes an arbitrary hegemonic set of Western-based ethical norms upon the majority of Humanity, in the worst case as a guise to promote narrow commercial interests. The Montreal AI Ethics Institute, an organization working to democratize AI ethics literacy, has recently curated a directory of experts in non-Western AI ethics.10 Notably, Hui (2016), in exploring the possibilities of constructing a philosophy of technology that is “properly Chinese” (p.7), argues for the necessity of considering not one universal technology, but multiple “cosmotechnics” both philosophically and historically. The concept of “cosmotechnics”, which the author further develops in Hui (2021) and which informs this article, is defined as “the unification between the cosmic order and the moral order through technical activities” (Hui, 2016, pp.19–20).
There have been a number of studies that draw a direct connection between technology and specific Asian philosophical traditions. Scholars have, for one, turned to Confucianism. Writing on “ethical pluralism”, Ess (2006) juxtaposes contemporary Western ethics with Confucian thought to emphasize the importance of embracing both “pluralistic structures of connection” and “irreducible differences” between intellectual traditions (p.215). Kirk et al. (2020) explore how the Chinese government builds on select Confucian notions to inform the nation’s approach to technological governance, noting the “stickiness” of central Confucian values (such as hierarchy, family, and social order11) in several East Asian societies despite divergence of political ideology. Advocating a “multicultural turn” in technology ethics, Wong and Wang (2021) develop what they call “Confucian ethics of technology”.
Buddhism also provides new opportunities for ethical (what is right?) and axiological (what is good and valuable?) reflections regarding technology. Rambelli (2018) explores the presence of machines in the Japanese Buddhist tradition, e.g., robotic monks and priests. In today’s Japan, one can see the Buddhist robot priest “Mindar” delivering sermons inside the 400-year-old Kodaiji temple (Nair, 2019). Hongladarom (2020) makes a significant attempt to bridge the ancient Buddhist tradition with technology (AI) ethics, as the author argues for a Buddhism-inspired standard of ethical perfection, namely “machine enlightenment” (p.7).
When studying the technology-friendly nature of Japanese society, scholars often turn to Shintoism as a source of understanding the nation’s anthropomorphic view of technology. In Shinto beliefs, there is no categorical distinction between humans, animals, and inanimate objects, as the religion attributes spirits, or kami 神 to all forms of existence. Juxtaposing this “Shinto-infused techno-animism” with actor-network theory,12 Jensen and Blok (2013) posit that Shinto cosmic views offer a vantage point for interpreting the contributions of non-humans to “collective life”, and for studying the entanglements of politics, ecology, science, and cosmos in contemporary Japanese society (p.84).
While some scholars have held that technologies are in essence antithetical to the concept of “self-so” (ziran 自然)13 promoted in Daoist thought14 (Allen, 2010), others (Needham, 1981) have uncovered the long history of Daoism engaging with technology. A study by Nelson (2014) reveals how Daoist ideas influenced early twentieth-century German thinkers (Buber and Heidegger) and their views on technological rationality and modernity. This further motivates critical engagement with the ethics and axiology of technology that traverses cultural and philosophical boundaries.
3.2 Toward an Ethically Aligned Design for Music AI in Asia
We will now examine three ethical principles listed in EAD1e (IEEE, 2019), namely “human rights”, “well-being”, and “awareness of misuse”, and investigate how their meanings may shift (or not) when taken out of familiar, Western philosophical and cultural contexts, within and beyond music.
“Human Rights”, the first principle in EAD1e (IEEE, 2019), underlies most major guidelines for AI ethics. What the term signifies, however, can shift with contexts (An-Na’im, 2010). A fundamental question one ought to ask here is “what it means to be human”. The notion of personhood in Confucian thought, for instance, is characterized as inherently relational, developmental, and virtue-based (Alford, 2010; Wong, 2021), leaving space for the concept to be extended from human to non-human actors. According to AI ethicist Pak-Hang Wong, based on the “role-based” Confucian ethics, one can attribute personhood to non-human beings as long as they “play ethically relevant roles and duties as humans” (Cassauwers, 2019). Wong’s comment resonates with what we find from examining the Japanese Society for Artificial Intelligence (JSAI) Ethical Guidelines, which include a clause stating that once AI abides by all policies described therein, it can then become a “member” or “quasi-member” of society (JSAI Ethics Committee, 2017, p.42). While we certainly cannot reduce this aspect of the JSAI guidelines to a result of Confucian influences (despite the major role Confucianism plays in shaping Japan’s history), it is nevertheless an example of how fundamental concepts often assumed to be universal may take on new meanings when they travel.
With this greater flexibility in defining personhood come opportunities to think beyond the narrowly defined “human”, to move from humanism to posthumanism, and ultimately to expand this first ethical principle of “Human Rights” in EAD1e so that it cares for the rights of both human and non-human actors (including the environment). Writing on the political ecology of music, Devine (2019) observes that the carbon footprint of the music industry did not decrease in the age of streaming. Eco-musicological considerations, as we continue to operate under the framework of music “ecosystems”, thus become important with the advance of large, energy-consuming neural networks (Strubell et al., 2019) and, specifically, AI-generated music increasingly marked by “overproduction” and “uninspired excess” (Mersch, 2022, p.65). It is here that we may find value in certain perspectives of Laozi, the ancient Daoist thinker who promotes subtle “action” through “inaction” (wuwei 無為), who considers the “great note” (dayin 大音) as the one that “sounds faint” (xisheng 希聲), and who teaches that “only by relying on what is not there, do we have use of the room” (Van Norden and Ivanhoe, 2005). These notions can become sources of inspiration when considering “music of the environment” and the recovery of “positive silence” (Schafer, 2006), the reduction rather than (over-)production of sound, and the complementary forces between sounding (the Daoist yang 陽) and non-sounding (the Daoist yin 陰)15 in this era of algorithmic explosion.
Meanwhile, Mozi, after whom the philosophical school of Mohism is named, strongly condemns wasteful performances of music, which, according to his utilitarian calculus, will interfere with the fair distribution of resources in society and with the maintenance of “good order” (zhi 治) (Van Norden and Ivanhoe, 2005). This Mohist stance is highly relevant when it comes to evaluating AI systems that can generate billions of tunes (see Section 4) and flood our already overloaded info- and sound-scape with algorithmically generated musical “spam”. According to boomy.com, the users of Boomy (an AI-powered “instant” music generator) have generated around 13.32% of the world’s recorded music as of April 17, 2023.
While it is beyond the scope of this work to explicate the above Daoist and Mohist thoughts, we hope the point has been made that a quick mental exercise of philosophical re-orientation may open up new perspectives for thinking critically about responsible music AI. As attempts are being made by researchers using neural-network soundscapes to protect natural environments (Yirka, 2020; Learn, 2019), it is the responsibility of every actor within the AI music ecosystem to consider the impact of such tools on the health of our collective soundscape, and what human and posthuman rights may consist of in this changing context.
“Well-being”, the second principle in EAD1e (IEEE, 2019, p.4), asks that A/IS creators “adopt increased human well-being as a primary success criterion for development”. Different cultures and communities, once again, may have varying views on how technologies can best serve humankind and its well-being. For one, the tendency in Confucianism-based societies to blur the boundary between the self and the community, the individual and the collective, and the private and the public has a profound impact on how well-being is understood and how technologies relate to it. According to Hagerty and Rubinov (2019), AI-driven surveillance technologies do not generate as much controversy in Singapore and China because state surveillance is perceived as an “acceptable exchange for security and stability” (p.16). Hung (2021) also notes that due to a prioritization of societal “harmony” (hexie 和諧) over individual well-being, “collectively mediating technologies” implemented without full, individual consent are more readily accepted in Confucianism-based societies. This is reflected in the Beijing AI Principles, which explicitly raise “Harmony and Cooperation” as one of the central pillars of responsible AI governance (Beijing Academy of Artificial Intelligence, 2019).
Bringing in a Confucian perspective also enriches the ways in which one can conceptualize human-technology relations, and how such relations can contribute to human flourishing. Reflecting on a Confucian “ritual technicity”, Wang (2021) brings forth the ritual dimensions of artifacts that transcend their sheer practicality and examines how, in “performing” (rather than merely “using”) technologies, humans are able to moralize themselves with artifacts (p.21). Because of this intimate techno-human relationship implied in Confucian theories of self-cultivation, once embodied and ritualized, technologies become integral to human pursuits of growth, wellness, and harmony with the world.
While EAD1e does not explicitly address music AI, it includes a subsection titled “Affective Computing” that discusses issues related to emotion-like control in both humans and AI systems. Considering the role of music as a culturally dependent regulator of emotions, one could propose that any AI music system be considered as a form of “affective computing” and follow the guidelines detailed in this subsection. To build an ethical application of AI to music that could foster well-being, it would be productive for researchers and developers to understand what makes a certain way of organizing sound culturally appropriate and aesthetically pleasing in a given music ecosystem. In the Confucian ontology of music, for instance, not all “sounds” (sheng 聲) are considered melodious “tones” (yin 音), and not all organized “tones” (yin 音) carry the potential of contributing to human flourishing, a required condition for “music” (yue 樂) (Huang, 2019, pp.51–52).
Meanwhile, the tendency of many East Asian communities to align “cutting-edge technologies” (AI) with agendas of traditional culture preservation offers insight into how one may ethically deploy music AI in these societies to maximize well-being. On the website of Aichi’s World Expo (cited in Šabanović, 2014, p.360), one can find claims indicating that “conservation should replace mass production and consumption”. Writing on Japan’s assembly of science, technology, and culture, Šabanović (2014) explores the ways in which Japan legitimizes its adoption of new technologies through strategic association with traditional practices and cultural continuity: it is recorded that robots are used to preserve aizu bandaisan, a Japanese folk dance, as there are no human inheritors to carry it on (p.350). Meanwhile, in Beijing, the director of the conceptual theater show 2047 Apologue (andy-Robot, 2020) creatively fuses AI and robotics with Chinese folk music to shed light on larger themes such as culture preservation and environmental crises.
Technologies, in these cases, are perceived as culturally situated artifacts. Traditions, on the other hand, are continuously renegotiated and redefined to include emerging technological devices and practices. Moving forward, similar programs may be initiated with the design of AI systems that could work to revitalize traditional repertoire. When Huang spoke with Huo (2022), deputy director of the Divine Music Administration (DMA) of the Temple of Heaven in Beijing, where work is ongoing to recover court music of the Qing dynasty (1636–1912), Huo shared that DMA would much appreciate an AI artist-collaborator, as the institute’s current funding policy makes any new hire of human composers virtually impossible.
“Awareness of misuse”, the third principle in EAD1e that we will examine, emphasizes the need to minimize the risks of potential misuse of AI. Just as the concepts of “human rights” and “well-being” are often more fluid than one may assume, what constitutes misuse can be context-specific. According to Hongladarom (2020), if Buddhist values and perspectives are taken into account, AI and its manufacturers must consider the interests of others before their own, bearing in mind the goal of relieving “all others” of suffering as much as they possibly can (pp.5–6). Thus, AI devices should strive to achieve both ethical and technical excellence, and any design that causes suffering would be considered a case of misuse. This Buddhist vision of AI ethics is in line with Floridi’s (2008) definition of information ethics (IE) as an ethics that seeks to free all being (defined here as “the existence and flourishing of all entities and their global environment”) of entropy, a state “more fundamental than suffering” (p.12). Besides Buddhist values, the Confucian notion of “harmony” (he 和) is also a productive concept when conceptualizing “misuse” and “harm”, as it implies the possibility of harmonizing diverging interests among a wide range of stakeholders without erasing their irreducible differences.
When developers make the effort to understand and respect indigenous value systems (the axiological dimension), potential misuses can be avoided. According to TikTok/ByteDance research scientist Lamtharn (Hanoi) Hantrakul (2021), when designing Tone Transfer, a web app that uses Google’s machine learning AI to realize timbre conversion between different sound sources, he was met with the challenge of having to balance the interests of developers, target users, and local music practitioners who might oppose having the unique timbre of their traditional musical instruments taken out of context. In an interview with Huang, Hantrakul (2021) described the team’s decision not to include guqin, the ancient Chinese zither, in Tone Transfer so as not to misrepresent the instrument in front of an audience with limited knowledge about its sound, history, and aesthetics. Hantrakul then contrasted Tone Transfer with his work on Sounds of India, an AI-powered app that transforms sounds into specific Indian musical instruments. The developers of Sounds of India, Hantrakul shared, were more confident in engaging classical Indian instruments in this case as they knew the app would be directed toward communities already familiar with their sounds and usage, thus reducing the risks of misrepresentation and cultural appropriation.
4. Case Study 2: Responsible Engineering with Traditional Music
While experimenting with machine sequence learning for modeling ITM, Sturm et al. (2016) became aware of the need to identify and reconsider the researchers’ intentions and values even in the early stages of the project. This came about from initial interactions with members of the online community (http://thesession.org) from which training material for the machines was first obtained. When early results from the research were posted to the discussion forum of that online resource, one user replied in the thread: “explain how this is going to contribute to ITM”.[16] Another wrote in the same thread: “Can we not technologically tamper with everything that is good and pure in this world? A computer farting out generated tunes in some academic lab somewhere is the beginning of the end. The sooner this experiment is confined to an anonymous university archive the better.” At issue here are potential conflicts between the research project being carried out and the ontological, ethical as well as axiological (moral and aesthetic) codes essential for constituting the ecosystem of ITM (Romanova, 2018).
While there were many users of thesession.org who did not respond, and several who made non-negative remarks, these two comments identify a core truth about the work of Sturm et al. (2016): its contributions are principally not to ITM or this online community even though the research derives value directly from them, not to mention the happenstance of having a reproducible and efficient approach for machine sequence learning with enough expressive capacity to be surprisingly successful at imitating the syntax from the tens of thousands of hand-entered crowd-sourced transcriptions. The website thesession.org is built, maintained and made free and openly accessible by its contributors with the goal of preserving and furthering a worldwide community of ITM practitioners. Any use of the resource that does not contribute to this goal, regardless of whether proper academic acknowledgement is made, represents a potential misuse and should be examined. A deeper question also arises: what good has this scholarly pursuit actually accomplished other than the advancement of the involved MIR researchers’ careers?
There are several ways to respond to the potential misuse of MIR technology in this scenario. One is to financially contribute to the maintenance of the artistic community, e.g., by providing its creator and maintainer with funding to cover costs associated with web hosting, or by hiring practitioners of ITM to perform at public-facing research events. Does such an “indulgence” nullify the misuse, thus granting the researchers access to the resource? Another response is to point to “the commons”, i.e., material that can be used by anyone without having to first secure the rights (Boyle, 2008). In this case, the involved material is creative content that is not covered by copyright protection because its exclusive rights have expired, or because such rights are not applicable to it for the lack of an identifiable author or for other reasons (Frith and Marshall, 2004). As much of the music notated by users of thesession.org has passed into the public domain, we might say that the material can be legally used however we wish. Several complications, however, arise from this argument.
First, some transcriptions at thesession.org are of compositions still protected by copyright (Sturm et al., 2019). Second, just because something is in the public domain does not mean anything can and should be done with it (McLeod, 2001; Seeger, 2004). Living composers of folk music transcribed on thesession.org are likely to overlook such infringement because of the service it provides in building and preserving a worldwide community of ITM practitioners.[17] Even if one were to remove these transcriptions from the training material of our machines, there is still the problem of misuse. Here, the legal does not equate with the ethical. Unwarranted expropriation can be seen as a colonialist gesture (Zeilinger, 2021), especially when done in the context of cultural heritage communities such as ITM, even if the resulting derivative works are not in turn enclosed in the propertized domain of copyright.
A third response to this potential misuse is to exclaim that the featured researchers are merely scholars and scientists working at a university and therefore: 1) they are seeking to advance knowledge of the world; 2) they are performing fundamental research that could benefit ITM in ways that may not be immediately tangible but are nevertheless worth pursuing; and 3) while they may benefit professionally from the work, they are not profit-driven, commercial entities seeking to exploit artistic traditions for financial interests. There are several problems with these statements, not the least of which is the romanticized and inaccurate notion of the university as an “ethically neutral” guardian of knowledge divested from selfish, profit-driven concerns (Tesar et al., 2021; Fleming, 2021). Another issue to point out is that the knowledge produced in the university, and the training such learning institutions provide, travel outside its walls. While some researchers might be intellectually fulfilled at how well the Irish double jig is modeled by some state-of-the-art machine learning system (Sturm and Maruri-Aguilar, 2021), they might not fully appreciate that the methodologies they develop and the students they train could one day contribute to developing technologies that are aesthetically, ethically and politically problematic, e.g., populating playlists of “traditional music” with machine-generated performances of machine-generated tunes.
A fourth response addresses the core of both comments from the two users of thesession.org cited above. Upon first read, one might rush to the conclusion that there will always be hostility directed toward technological innovation; or that traditionalists will always see themselves as “gatekeepers” of certain well-delimited territories and thus resist any perceived incursion that might progressively alter the community’s core values or raison d’être; or that it is in the interest of the attention-grabbing media to portray humans at war with the machines automating what were once ways of making a living, rendering the little jig-composing system “the beginning of the end” that is desperately in need of quarantining. This, however, does not faithfully or thoroughly reflect the sentiments conveyed in the user comments.
Scholarship on the history of ITM and of Ireland itself depicts a culture that has suffered considerably from imperialism, one whose customs were banned and upon which starvation was exacted as a tool of subjugation (Woodham-Smith, 1962). This has led to diasporas of generations of traditional musicians, dispersing Irish culture around the globe (Ó hAllmhuráin, 1998). In aging Irish communities such as those in New York and Chicago, traditional music practices were gradually replaced by more popular music, leading to a series of efforts to preserve and promote this disappearing tradition through tune collecting, e.g., Francis O’Neill and his “1001” (O’Neill, 1907). The revivals of folk music beginning in the 1950s in Europe and the USA, along with the founding of Irish culture-promoting organizations such as Comhaltas Ceoltóirí Éireann in 1951, transformed ITM into a significant cultural resource, culminating in globally popular Irish music bands such as The Chieftains and The Dubliners and dramatic shows such as Riverdance and Lord of the Dance. This commercial success, however, is seen by some as coming at the expense of “authenticity” (Vallely et al., 1999).
An appreciation for the significance of authenticity in ITM can be gained from actually learning to play this music in a traditional way, i.e., by learning from master musicians through oral transmission. From August 2019 to April 2021, Sturm took a total of 46 lessons with Paudie O’Connor, a traditional button accordion player who learned from several well-known musicians in his region of Sliabh Luachra, including Johnny O’Leary and Jackie Daly. Among them, O’Leary was taught by Padraig O’Keeffe (fiddle), who had taught a number of notable musicians in the area, including Julia Clifford and Denis Murphy. O’Keeffe had learned from Tom Billy Murphy, a blind itinerant fiddler in the area, who in the early 1900s would be led by his donkey on a circuit from house to house throughout the year, playing dances at the houses that would board him. Many tunes O’Connor taught Sturm were accompanied by stories about the history of the tune: where O’Connor had heard the tune and from whom, and possibly who they learned it from, and so on. An essential part of several lessons was discussion about how the repertoire is practiced today and was practiced in the past, and the different value systems at play in ITM. It is from such active exchanges that students of ITM begin to understand the continuing debate within this “music ecosystem” over what is traditional and what is not, how traditional music is preserved and promoted, and what roles technology can play.
It should now be clear how the above-mentioned (mis)use of thesession.org, as training material for the machine generation of tunes that imitate ITM, may collide with values of ITM practitioners emphasizing history, authenticity, and preservation. One commenter at thesession.org mentioned, “I think it’s reckless to send 3,000 machine-created [tunes] into the world.”[18] The resulting machine, no matter how plausible its creations may be, is by its nature totally ignorant of the music. Another commenter at thesession.org remarked that the researchers should “teach [the system] to dance first”. After all, the machine is merely shifting symbols around according to probabilities derived from an impoverished representation of ITM (Searle, 1980). Seeking to imitate ITM by computer implies the music is trivial and reducible, disconnecting ITM from its spirit, its history and stories, and its “aura” (see Huang and Sturm (2021) for a discussion of “aura” or “authenticity” in AI-generated tunes in the style of ITM). Performing ethical research in this context thus requires MIR researchers to understand and respect the total ecosystem of ITM. For this, we must be in continuous non-hierarchical dialogue with its practitioners and make clear what our intentions are, how we appreciate and even share their values, and how we position ourselves in relation to ITM’s past, present, and future. Approaches such as Value Sensitive Design can provide guidance for integrating reflective and participatory perspectives into such research and development work, as richly illustrated by the work of Batya Friedman and others (Friedman and Hendry, 2019; Davis and Nathan, 2015; Borning and Muller, 2012; van den Hoven, 2007).
It is important to note here that superficial involvement of the practitioners is not a sufficient end goal in itself, since without genuine reflection on the nature and reciprocity of the interaction, attempted agonism runs the risk of turning into hollow “participation-washing” (Birhane et al., 2022; Sloane et al., 2022).
One example of collaboration between MIR researchers and traditional music practitioners is provided by Sturm’s lessons with O’Connor, who is aware of Sturm’s research but has said he feels neither offended nor threatened. In past lessons, the two have looked at particular outputs of Sturm’s system and discussed how the generated tunes relate to ITM, what makes them successful (or not), and how to make them more emblematic of the style. For Sturm, it was challenging to perform such “machine folk”, that is, to take a tune one has never heard performed before and attempt to play it in the style of ITM. It is a unique way to interact with the tradition. As another example, O’Connor and three other practitioners of ITM were hired by Sturm as judges in The AI Music Generation Challenge 2020,[19] which focused on Irish double jigs judged against a historic collection (Sturm and Maruri-Aguilar, 2021). Each of these yearly challenges involves working with traditional music practitioners, and motivates participants to reflect on each musical style, genre, or tradition being explored and their relation to it. This interdisciplinary collaboration continued with The AI Music Generation Challenge 2022,[20] which focused on Irish traditional reels.
To believe that a computer system generating “deep fake” (Mersch, 2022, p.64) jigs and reels by the millions poses a real threat to ITM is to assume that the latter is weak, vulnerable, and easily threatened. This belief overlooks the substantial national support for traditional music provided by the Irish government, as well as the numerous families enrolling their children in traditional music practices, particularly the yearly summer schools organized around the country. This belief also ascribes far too much power to the “notated” tune for a practice that is principally aural.
Returning to the question, “explain how this is going to contribute to ITM”, two more points may be made. The first is that whatever AI-powered system comes about that generates jigs and reels, it is likely not going to contribute anything profoundly meaningful to ITM, at least from the perspective of insider practitioners. It is when someone chooses a machine-generated tune and breathes story, life, and history into it that the “process of authentication” can begin and a contribution may be made (Sturm and Ben-Tal, 2018; Huang and Sturm, 2021).[21] Furthermore, considering the role computational technology plays in our world today, why not involve it somehow in making music that could be considered traditional in the future? The second point is that our follow-on research, more critically considering the politics of our work, contributes to the dialogue (both intellectual and practical) concerning tradition, innovation and (post)modernity, the meaning and role of authenticity, and ways of working responsibly as engineers with ITM and other traditions. The “toys” developed in the early stages of the work of Sturm et al. (2016), built from a “misuse” of thesession.org, set the stage for our current, critical reflection on all these issues. Finally, with our increasing understanding of the value systems at play in ITM and our expanding network involving traditional music practitioners, we are uniquely positioned to make direct contributions through co-designing technology with them, e.g., enriching access to historical archives, or automatic tune identification.
5. Conclusion
We have argued in this article that MIR should diversify beyond the mere diversification of its datasets, and should embrace a non-hierarchical form of interdisciplinarity. We urge MIR researchers to consider how the ethics of the supposedly “universal” technologies they develop may be biased by Western-centric theoretical frameworks, on the one hand, and insensitive to the local practices, traditions, knowledge, and values that come together to form distinct music ecosystems, on the other. In this “fundamentally fractured and discontinuous” world(s) we inhabit (Seligman et al., 2008, p.11), MIR must scrutinize its ontological, epistemological, methodological, and axiological assumptions in order to complete both an ethical and a cultural turn and thereby realize “unforeseen, novel methodologies and theories” (Born, 2020, p.200). The first case study illustrates how philosophical re-orientations beyond Western frameworks can open the door to what Hui (2021) terms “multiple cosmotechnics” (p.41), thus enriching our understanding of the ethical and cultural consequences of MIR. The second case study reflects on a case of music data misuse, and motivates deeper questions touching on responsible engineering and ways of working that consider the values of music practitioners next to those of the engineer.
How can MIR contribute to cultural practices it investigates? What is valuable to those communities and what is valuable to the community contributing to MIR? How do musical communities wish for their practices to interact with emerging technologies (if at all), and what do they consider as potential misuses of their traditions? Ultimately, what does a given musical community consider as “music”? Additionally, as technologies do not occur in a vacuum, how should MIR position itself in relation to the humanities and the social sciences — the “contemporary fields of the study of culture” (Bachmann-Medick, 2016)? And even beyond these lie questions concerning the mechanisms of research, such as funding, pressure to publish, and league tables of academic programs, not to mention the many practical hurdles to effective, interdisciplinary collaborations. How do these forces square with non-hierarchical interdisciplinarity involving practitioners who are outside the “ivory tower” (Fleming, 2021)? In search of answers to these questions, each of our four ologies can play a role: the nature of the music and sound being investigated (ontology); the sources and scope of knowledge in a specific musical context (epistemology); the strategies to get to such knowledge (methodology); and, finally, the ways to understand “goodness” or value in and of music (axiology).
It is unlikely that we can develop universally valid solutions to such challenges. Rittel and Webber (1974) coined the term “wicked problem” to describe problems that cannot be precisely formulated, and that do not have a definitive “best” solution, since each solution may lead to one or several new problems. There may be two ways out: the first is to continue business as usual, and technical solutionism has indeed worked pretty well for some decades of ISMIR; the second option is to attempt the “agonistic mode of interdisciplinarity” proposed by Born (2020) in MIR: reflect on and fearlessly rethink the four ologies concerning our discipline, strive for an equitable exchange with other disciplines and stakeholders, and let this process shape novel and epistemologically diverse research questions that go beyond building a technical problem around a dataset and criticizing (without balanced dialogue) the “data-poor fields” for not showing interest in “upgrading” and “catching up”. We argue for the second option, especially after having experimented with it ourselves — in part via the writing of this article and all the intellectually stimulating arguments that arose along the way — between (ethno)musicology, sound studies, computer science, philosophy (early Chinese philosophy and philosophy of technology), law, and some other wicked fields represented by the co-authors. It is not the easy way out, but it is inspiring and enjoyable, and in the absence of simple solutions, it is the most responsible thing to do.
Notes
[5] Confucianism is a school of early Chinese philosophy developed from the teachings of Confucius (551-479 B.C.) and his disciples. It has gone through phases of change throughout Chinese history (Yao, 2000).
[8] According to our analysis, this relation increased in strength until 2010, after which it stagnated on a low level in the MIR community.
[9] For an extended analysis, please refer to our ISMIR 2021 paper (Huang et al., 2021).
[10] https://bit.ly/3INsWaV.
[11] Confucianism as a system of thought tends to value family-oriented moral qualities (such as filial piety) and the ideal of a “harmonious” world, which is in part achieved through the maintenance of “hierarchically rigid family and social relationships” (Yao, 2000, p.13, p.182).
[12] Actor-network theory makes no analytical distinction between human beings and technological artifacts, and between subjects and objects (Taylor, 2001, p.32).
[13] A central concept in Daoist thought variously translated as “self-so”, “spontaneous”, or “natural” (Ziporyn, 1993).
[14] A philosophical tradition associated with the texts of ancient thinkers such as Laozi and Zhuangzi.
[15] The pair of yin-yang is a central concept in Daoism that refers to two opposite yet interconnected and complementary forces.
[17] A good example of this is the tune “Ashokan Farewell”, composed by Jay Ungar in 1982. Ungar specifies only that a mechanical license is required for audio recordings of his music, and does not protect derivative transcriptions; see https://jayandmolly.com/licensing/.
[20] https://bit.ly/3V4kZ44.
[21] See also bit.ly/3FCJOOS where Sturm is exploring “machine folk” tunes.
Funding Information
This work is partially supported by: the Presidential Post-doctoral Fellowship at the University of Hong Kong (HKU-PPF); the Wallenberg AI, Autonomous Systems and Software Program – Humanities and Society (WASP-HS) funded by the Marianne and Marcus Wallenberg Foundation (Grant 2020.0102), and by the Swedish Research Council (2019-03694); as well as the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 864189 MUSAiC: Music at the Frontiers of Artificial Creativity and Criticism).
Competing Interests
The authors have no competing interests to declare.
