Editorial: Data and Workflows for Multilingual Digital Humanities

By: Lorella Viola  
Open Access
|Jun 2024

(1) Context and motivation

The production and management of knowledge have been deeply influenced by the power structures embedded in technology, factually shaping not only who has access to create, share, and consume information but also how such access is experienced (Viola, 2023). The foundational power structure in digital knowledge production – digital access – does not solely include access to reliable internet services or modern technological devices, but also the number of languages in which digital content is available. Thus, the strong predominance of English and a few other languages among platforms and tools inherently limits participation in digital knowledge creation and dissemination, exacerbating the already existing inequalities in the wider society (Viola, 2023; Viola & Spence, 2023). Moreover, overemphasis on English and Western norms can lead to cultural homogenization where unique cultural practices, languages, and histories are underrepresented or misrepresented in digital formats, leading to a loss of cultural diversity and richness (Risam, 2015, 2018). Projects that focus primarily on English-speaking regions or on literature and sources in English for example miss the opportunity to engage with a rich array of non-English materials, which can skew the understanding of global histories and cultures (Gold & Klein, 2019). Finally, digital humanities tools and resources primarily available in English alienate scholars, students, and the public who are non-native English speakers, potentially excluding them from participation and access to knowledge.

This strong language bias ultimately leads to a lack of diverse perspectives in digital content and restricted access to information for those on the less privileged side of the divide. As it amplifies certain voices and suppresses others, already marginalized communities are excluded from the digital knowledge landscape, which narrows the overall diversity of accessible information (Viola & Spence, 2023). Digital literacy further complicates access, as individuals need to possess a certain level and type of skill to navigate, evaluate, and create digital content effectively.

It is true that in recent years there have been significant signs of change, with efforts to address the strong linguistic bias towards English in digital practices. Initiatives such as large multilingual projects, special interest groups, multilingual platforms and infrastructures, and multilingual language models and computational resources have certainly contributed to dismantling the Anglocentric paradigm. For example, Unicode has helped standardize text representation across many languages, machine translation has improved significantly, more inclusive metadata in multilingual settings has emerged, and datasets annotated in different languages are now more widely available. However, despite notable achievements and greater awareness of digital inclusivity, Digital Humanities (DH) remains English-dominated, as evidenced by a persistent comparative scarcity of stable tools for handling non-Latin scripts and many unresolved issues with languages structurally different from English.

The Special Collection ‘Data and Workflows for Multilingual Digital Humanities’ brings together seven contributions that explore digital inclusivity, that is, how power structures operate in digital knowledge production, with a focus on multilingualism and the management of multilingual data. It presents an assortment of papers that showcase innovative workflows for multilingual data acquisition, curation, integration, and analysis, reflecting the efforts of Multilingual DH research to ensure broad participation and representation in digital knowledge production. In this way, the Special Collection aims to contribute to current debates about recognising the digital as a culturally situated and organic entity (Cameron, 2021; Viola, 2023) which reacts to and impacts on institutional and methodological frameworks for knowledge creation, and which therefore bears important consequences for how we create knowledge today. Offering an overview of innovative approaches and results in Multilingual DH research, the collection highlights the richness and the challenges of the intersection between multilingualism and the digital realm for humanities research.

(2) Description

The necessity to handle, process, and analyse data in multiple languages brings with it a complex array of technical, cultural, and ethical considerations. Indeed, in a multilingual digital context, effective data acquisition, curation, and integration strategies become ever more critical to ensuring the seamless incorporation of resources into research projects. Each contribution in this collection not only addresses these challenges but also pioneers innovative methodologies and shares workflows that enhance the accessibility and inclusivity of DH research.

One central aspect highlighted by all the contributions is the necessity of creating flexible data models that can capture the nuanced expressions of cultural heritage in diverse languages, promoting greater inclusivity and accuracy in digital humanities research. An example of how researchers’ access to digital tools and methods can be improved through the integration of comprehensive and reliable digital documentation is the digital research infrastructure SSH Open Marketplace (https://marketplace.sshopencloud.eu/). This digital infrastructure utilizes workflows to facilitate research in DH and social sciences, with a strong focus on adaptability, community involvement, and open science practices. The platform allows resource linking, provides a structured approach, and connects users with datasets, manuals, and digital tools, supporting a broad range of research domains and promoting interdisciplinary work (see Barbot et al. in this collection and their presentation of how multilingual workflows in the SSH Open Marketplace enhance the findability of resources and support open research practices).

All the articles in the collection also stress the crucial role of structured workflows for supporting multilingual research in various research domains such as lexicology, historical research, and legal studies. For example, by creating a streamlined process that enhances the feasibility of conducting research in multilingual and multidisciplinary contexts, structured workflows can make resources accessible and manageable, fostering interdisciplinary research across the social sciences and humanities (see Marongiu, McGillivray, and Khan in this collection and their discussion of language-independent workflows for lexical semantic change detection).

Sustainable and reproducible workflows and ethical considerations are also part of the governance challenges of multilingualism in the digital space. For example, scalable, interactive, and user-friendly web applications that integrate bibliographic and NLP methodologies are fundamental for allowing researchers to perform dynamic text and data mining tasks in multiple languages. However, the development of tools that support multilingual metadata content exploration, analysis, and visualization must also tackle the challenge of including diverse linguistic content in DH research. Language biases in the creation of metadata for large digital collections deserve special attention, as they can influence what information is visible and to whom it is shown. Thus, biases in bibliographic metadata can lead to the amplification of certain viewpoints over others, affecting the diversity of knowledge and perspectives in digital spaces (see Péter et al. in this collection and their presentation of the AVOBMAT tool for researchers dealing with multilingual bibliographic datasets).

The multilingual challenge in bibliographical data handling also underscores the need for international standards in bibliographical data curation. The application of international standards for bibliographical data processing and reproducible workflows bears even more significance for sharing best practices and promoting data harmonization and interoperability across different languages such that data reuse and comparative research in the field can be facilitated (see Malínek et al. in this collection and their discussion of workflows for bibliographical data curation in the context of research developed by the DARIAH ERIC consortium’s Bibliographical Data Working Group).

As data models of digital research infrastructures typically do not support multiple translations for the same metadata field, preserving bibliodiversity is equally crucial for minimizing information loss and safeguarding cultural heritage. Technical and methodological challenges include the integration of non-English data sources to handle multilingual datasets efficiently and ensuring that the bibliographic data retains its cultural and linguistic richness to enhance inclusivity and global knowledge dissemination. Hence, another challenge in multilingual DH is the adaptation of existing infrastructures to accommodate language-agnostic models. The innovative workflows for cultural heritage preservation and humanities-oriented studies presented in the Special Collection provide research-based initiatives and solutions to the limitations of existing independent infrastructure organizations such as OpenCitations (see Moretti et al. in this collection and their presentation of the integration of the Japan Link Center’s (JaLC) bibliographic data into OpenCitations).
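To make the data-model limitation concrete, the following is a minimal sketch of a metadata field that keeps several language-tagged values side by side instead of a single string. The class name `MultilingualField`, its methods, and the sample titles are hypothetical and introduced only for illustration; they do not reproduce the data model of OpenCitations, JaLC, or any other infrastructure discussed in this collection.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class MultilingualField:
    """Hypothetical metadata field holding one value per language tag."""

    # Maps a language tag (for example "ja" or "en") to the value in that language.
    values: Dict[str, str] = field(default_factory=dict)

    def add(self, lang: str, text: str) -> None:
        """Store a value for a language without discarding the other languages."""
        self.values[lang] = text

    def get(self, lang: str, fallbacks: Tuple[str, ...] = ()) -> Optional[str]:
        """Return the value for `lang`, else the first available fallback, else None."""
        for tag in (lang, *fallbacks):
            if tag in self.values:
                return self.values[tag]
        return None


# Illustrative record: the original-language title is kept next to a translation.
title = MultilingualField()
title.add("ja", "日本語の題名")
title.add("en", "A Japanese title")
```

In this sketch, a request for a language that is not stored can fall back to another one, for example `title.get("fr", fallbacks=("en",))`, while the original-language value is never overwritten. A single-string field would force a choice between the original and its translation, which is the kind of information loss described above.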

Multilingual Digital Humanities also holds the potential to revitalize historical text collections, making them more accessible and interactive for researchers, for example through the creation of innovative workflows that enhance the accessibility and utility of historical linguistic data in a multilingual digital format. This includes the development of workflows for handling complex language issues like text normalization, machine translation, and handwritten text recognition. Such sophisticated workflows have broader implications for historical research in DH while at the same time highlighting the transformative potential of digital initiatives in revitalizing historical linguistic studies (see Ströbel et al. in this collection and their presentation of the Bullinger Digital project, which transformed a corpus of Heinrich Bullinger’s letters into a fully digital format). On the one hand, encoding languages in historical data entails several challenges, including dealing with historical language evolution, vocabulary borrowing, extinct languages, technical standards, and political sensitivities around language identity. On the other hand, these difficult tasks can be accomplished while emphasizing transparency, flexibility, and iterative refinement, encouraging collaborations and engagement with historical multilingualism (see Pepping in this collection and her discussion of the multilingual aspects of the Dutch East India Company archives). Multilingualism in archival material ultimately reflects complex historical interactions, making ever more evident the need for flexible and transparent encoding practices.

(3) Discussion

With its impressive selection of contributions, the Special Collection ‘Data and Workflows for Multilingual Digital Humanities’ provides insights into the unique workflows and practices that emerge when dealing with multiple languages within the digital humanities landscape. It amply illustrates that the strategies for data acquisition, curation, and integration are paramount to guarantee the smooth integration of multilingual resources into research endeavours. As DH researchers have gained access to vast amounts of data in the form of texts, artifacts, and cultural phenomena across diverse languages, it is essential to establish detailed and trustworthy digital records that provide invaluable assets for subsequent research. This includes the gathering and digitization of materials, the integrity and uniformity of data, and the creation of suitable metadata standards and ontologies. Workflows in multilingual digital scholarship, and in DH more broadly, include the procedures, methodologies, and tools used for research and data analysis. An effective workflow ensures that research activities are conducted smoothly, enhancing reproducibility, collaboration, and scalability. Particularly in DH projects, which often involve cross-disciplinary teams, well-structured workflows aid efficient communication and coordination among team members. Precise documentation of the workflow steps supports the replication of experiments and analyses, promotes transparency, and enables the verification of research outcomes.

While it is of course impossible to feature the whole breadth of work on multilingual DH, this Special Collection draws attention to many of its technical and methodological challenges. The case studies featured here allow researchers and practitioners to explore the innovative methodologies and the challenges faced while working with multilingual digital resources and the technical and methodological frameworks required to navigate such diverse and rich linguistic landscapes. Beyond the solutions proposed by each of the contributions, collectively the results and conclusions may also serve as a platform to reflect on the implications of the current monolingual and monocultural biases in digital scholarship. In this way, the Special Collection effectively counteracts the Anglocentric paradigm in digital knowledge creation and promotes diversity and a more culturally and linguistically inclusive digital future.

(4) Conclusions

The power structures in digital knowledge either include or exclude individuals and groups, inevitably affecting diversity and equity in digital spaces. As they intertwine technological, linguistic, and cultural factors, addressing the challenges of digital monolingualism requires multiple strategies, including technological innovation, policy reform, and community-driven efforts to democratize digital spaces and simplify the complex processes involved in multilingual Digital Humanities research. In addition to providing a comprehensive overview of current practices and future directions in the field, the Special Collection ‘Data and Workflows for Multilingual Digital Humanities’ also hopes to inspire further multilingual research and to foster a broader discussion on the integration of multilingual data workflows in digital humanities projects. The inclusion of diverse multilingual digital humanities case studies in this Special Collection reflects the efforts of Multilingual Digital Humanities research to ensure broad participation and representation in digital knowledge production, from supporting extensive pre-processing and analytical capabilities, to enabling dynamic text mining and data analysis, to emphasizing transparency, flexibility, and iterative refinement in handling historical, linguistic, technical, and political challenges.

At the same time, the contributions collectively highlight several open challenges that the field still faces but which may pave the way for future work. For example, Multilingual Digital Humanities still struggles with interoperability and scalability; there is a constant need for systems that can adapt to various languages and disciplines. This involves developing tools and methods that are not only robust but also flexible enough to adjust to new linguistic data and technological advances.

The Collection also highlights the persistent issue of multilingual data scarcity, accessibility and quality. Multilingual Digital Humanities research needs to direct efforts towards improving access to high-quality data, including rare or complex datasets, and enhancing the accuracy and completeness of digital archives. Finally, the Collection is also an open call for enhanced collaboration and community engagement; work in the field should focus on involving a broader community in the development and refinement of digital tools and workflows, ensuring they meet the varied needs of researchers across disciplines.

While each contribution to this collection offers a unique perspective on the complexities and richness of multilingual data management in DH, together the articles offer fresh and valuable insights into computational techniques for handling texts in several languages. Significantly, the collection shows that addressing issues of digital inclusivity and Anglocentric bias is not only a matter of ethical importance but also enhances the academic rigor and relevance of digital humanities. By embracing a more inclusive approach, the field can contribute to a more equitable and comprehensive understanding of human knowledge and cultural heritage. Despite the open challenges outlined above, the innovative workflows for multilingual data acquisition, curation, integration, and analysis presented in the Special Collection contribute to making digital knowledge production more inclusive, equitable, and representative of global diversity.

Competing Interests

The author has no competing interests to declare.

Author Contributions

Lorella Viola: conceptualization, writing, review, editing.

DOI: https://doi.org/10.5334/johd.220 | Journal eISSN: 2059-481X
Language: English
Submitted on: Apr 17, 2024
Accepted on: May 14, 2024
Published on: Jun 10, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Lorella Viola, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.