
Exploratory Computation in Digital Humanities: A Qualitative Evaluation Framework

Open Access | Mar 2026


(1) Introduction

Benchmarking has become a central research practice with the adoption of computational methods across disciplines. Benchmarking frameworks have progressively been integrated from computer science into the natural sciences and, more recently, into the social sciences and digital humanities (Lazer et al., 2020). As a research practice, benchmarking facilitates the comparison and assessment of models addressing similar research problems (Weber et al., 2019). Researchers may rely on benchmarks to validate research procedures by translating theories into measurable tasks that can be tested against empirical and synthetic data (Zhan, 2021). Benchmarking also influences which problems, procedures, and solutions are important to a discipline, supporting them through reproducible research procedures and the replication of their evaluation conditions (Bartz-Beielstein et al., 2020). These implementations reveal that benchmarking not only evaluates computational models but also shapes the research problems themselves.

Digital humanities research has implemented benchmarking to construct standard datasets for testing effectiveness on tasks such as developing and comparing models for stylistic pattern recognition (Benotto, 2021), streamlining computational approaches for visual content analysis and labeling (Pustu-Iren et al., 2020), and mapping semantic meaning in multilingual data and controlled vocabularies (Kraus et al., 2024). Benchmarking can support the operationalization of humanistic tasks and offer guidelines for measuring research validity. Yet current computational benchmarks lack a framework for evaluating the interpretive, iterative, and contextual processes that digital humanities scholars undertake in the exploratory meaning-making of cultural artifacts and corpora.

This discussion article proposes a qualitative evaluation framework for assessing the effectiveness of computational methodologies for exploratory research in digital humanities, which we term exploratory computation. We argue that interpretive strategies in early exploratory research warrant evaluation because formal benchmarks rarely account for exploratory tasks. This article begins with a brief overview of benchmarking as a research practice and its significance for digital humanities. We then highlight how interpretation guides computational methodologies for humanistic inquiry. Next, we present a qualitative evaluation framework grounded in exploratory computation and apply it through a case study on exploratory data visualization. Lastly, we discuss the implications of this framework for qualitatively evaluating computational research, for centering the scholarly communication of interpretive work, and for more responsible and reproducible research.

(1.1) Benchmarking in Digital Humanities

In computational research, a benchmark is a “standardized validation framework that allows for a direct comparison of the prediction accuracy of various models that address the same research problem” (Pankowska et al., 2023, p. 1). Typically, a benchmark contains a motivating comparison that links a research problem to model evaluation criteria, a task sample that operationalizes the problem through representative test tasks, and performance measures that assess how well algorithms perform on task samples (Sim et al., 2003). Digital humanities researchers have used computational methods for some time, with recent work incorporating machine learning techniques and benchmarking practices. Machine learning “works by analyzing an existing dataset, identifying patterns and probabilities in that dataset, and codifying these patterns and probabilities into a computational construct called a model” (Broussard, 2018, p. 33). Machine learning operationalizes research problems into tasks through techniques such as regression, prediction, classification, and clustering, which rely on benchmarks for parameter fine-tuning and dataset adjustments to optimize results (Weber et al., 2019). Common computational applications in the digital humanities include optical character recognition (OCR) and handwritten text recognition (HTR) to process large quantities of texts, image classification of artworks, and speech recognition on oral histories (Segessenmann et al., 2023). Computational methods support humanistic inquiry by extending literary criticism into algorithmic representations of humanistic data and model-based testing through benchmarking (Dobson, 2021).
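To make these three components concrete, the sketch below pairs two toy models against the same task sample and scores them with a single performance measure. The data and “models” are invented for illustration and do not correspond to any benchmark cited here.

```python
# Minimal sketch of a benchmark's three components as described above:
# a motivating comparison (two models addressing the same problem), a task
# sample (labeled examples operationalizing the problem), and a performance
# measure (accuracy). All data and "models" are invented for illustration.

# Task sample: a tiny labeled set standing in for, e.g., a genre-classification task.
task_sample = [
    ("the moon hung low over the harbor", "fiction"),
    ("the committee approved the budget amendment", "nonfiction"),
    ("she dreamed of cities made of glass", "fiction"),
    ("the census recorded 12,000 households", "nonfiction"),
]

# Two toy "models" addressing the same research problem (the motivating comparison).
def model_keyword(text: str) -> str:
    return "nonfiction" if any(w in text for w in ("committee", "census", "budget")) else "fiction"

def model_length(text: str) -> str:
    return "nonfiction" if len(text.split()) > 6 else "fiction"

# Performance measure: accuracy over the task sample.
def accuracy(model, sample) -> float:
    return sum(model(text) == label for text, label in sample) / len(sample)

for model in (model_keyword, model_length):
    print(model.__name__, accuracy(model, task_sample))
```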

Benchmarking goals in research include visualizing and assessing algorithms and problems, testing performance sensitivity across varied conditions, training models on benchmarks, developing heuristics, and validating new algorithms (Bartz-Beielstein et al., 2020). These goals standardize evaluation conditions for model comparison, enabling reproducibility and replicability across research applications (Peng, 2011). Importantly, benchmarking may also overlook social problems such as representational biases in race and gender (Buolamwini & Gebru, 2018) and incongruencies between model performance and real-world meaning (Bender & Koller, 2020). These concerns are compounded in the digital humanities because computational notions of reproducibility associated with benchmarking focus on re-executing code under fixed evaluation conditions (Bartz-Beielstein et al., 2020), leaving unspecified the contextual judgments, labor relations, and meaning-making practices present in all types of research. In effect, “the interpretation of reproducibility associated with [experimental research] works better for certain methods, stages and goals of inquiry than for others, thus proving to be inadequate as an overarching criterion for what reliable, high-quality research needs to look like” (Leonelli, 2018, p. 7). This suggests that reproducibility in digital humanities should stress transparency of the interpretive processes of humanistic inquiry, aligning with the methods, workflows, and forms of analysis that contribute to pedagogy and the development of the field.

The adoption of computational methods complicates the understanding and practices of reproducibility in digital humanities research. Scholars using these methods either align their research with computational evaluation standards, reducing reproducibility to a technical procedure, or they adhere to humanistic evaluation, risking that their work may be perceived as less rigorous if nuance and context are preserved (Joyeux-Prunel, 2024). To reconcile this tension, Joyeux-Prunel (2024) advocates for post-computational reproducibility that recognizes the limitations of methodological reproducibility in computational applications, recentering human interpretation to foreground how “various scales of analysis interact with each other to strengthen the reliability of research outcomes” (p. 41). This expands reproducibility by recreating the conditions that contribute to the reasoning and interpretation of computational outputs. Advancing post-computational reproducibility makes visible the broader context that informs interpretation in computational workflows.

(2) Reframing Benchmarking for Exploratory Computational Inquiry

(2.1) Interpretation, Computation, and Digital Humanities

Building on existing work on benchmarking in digital humanities, we recognize that benchmarking emphasizes experimental research design centered on hypothesis testing, theory validation, and standardized evaluation of computational models for machine learning tasks. Varela (2021) characterizes these tasks as “data-driven” research tasks that provide answers that are “hard to vary”, justify “reasoning under constraints”, and depend on “productive reductionism” (p. 10) that brackets assumptions. Conversely, “data-assisted” research methodologies underscore the interpretation of tasks, offer multiple views, and resituate a question within its context; both of these orientations constitute computational research practices.

Starting from the interpretive process helps identify how benchmarking may standardize meaning, detaching it from the sociocultural and historical significance of corpora. This carves out a space for rethinking the role of interpretation in making computational research reproducible (Dobson, 2021). We build on Ringler’s (2024) argument for a hermeneutics for computation to foreground interpretive and reflexive engagements with the sites, objects, and processes of computational research. A hermeneutics for computation intervenes in method-centric research by centering computation itself as an interpretive process, acknowledging that “while new tools may construct new reads on artifacts like texts, the data they produce does not speak for itself and tell you what those new pieces of data mean” (Ringler, 2024, para. 2). Benchmarking is a key factor in this interpretive process for determining the significance of computational results against field-specific standards; we build on these commitments to reproducibility and standardization by emphasizing qualitative evaluation of computational interpretation.

Computational tools and their methodologies function as “interpretive screens” (Ringler, 2024, para. 11) that transform corpora into data visualizations, tables, and statistical relationships. These transformations shape meaning construction from algorithmic renderings by simultaneously foregrounding and obfuscating relations, qualities, and representations within and across texts; computation is a mode of asking with which to answer humanistic questions. Asking about computation, in turn, examines how a computational perspective changes the processes of understanding. Rather than treating computation as antithetical to humanistic inquiry, we treat it as formative for guiding interpretation and meaning-making.

Recent research shows that adopting computational hermeneutics is necessary to contextualize computational systems (Kommers et al., 2025). Specifically, this work argues that “evaluation methods in artificial intelligence often overlook an important conception of culture: not as a variable to be measured, but as a dynamic, contested space where social meaning is made” (Kommers et al., 2025, p. 2). This highlights a key limitation of benchmarking: it treats cultural activities and artifacts as static representations, turned into tasks, and evaluated through model testing. Accounting for social and historical context in benchmarking requires more iterative, responsive evaluation of cultural change, attention to human judgment within its “communicative context” (Kommers et al., 2025, p. 7), and contextual relevance for model interpretability. Historians Gibbs and Owens (2013) have advocated for computational methods to support exploratory research to generate and refine research questions through “cyclical processes of contextualization and interpretation” (p. 160). Recognizing these nonlinear, iterative stages in computational research is much less common than evaluating results for argumentative, evidence-based claims (e.g., hypothesis testing, tests of significance). Communicating the intellectual value of exploratory research stages is challenging because interpretation is emergent, tentative, and tacit.

(2.2) Exploratory Computation as Digital Humanistic Inquiry

We propose the concept of exploratory computation, an approach to interpreting humanistic data through quantitative processes such as descriptive statistics, computational and algorithmic modeling, and exploratory data visualization (see Figure 1). Exploratory computation recognizes the interwoven role of interpretation, positionality, and iteration within computational research processes. This approach builds on exploratory data analysis (EDA) by emphasizing exploration of the data as “the foundation stone--the first step” (Tukey, 1977, p. 3) to reveal its structure, contents, and contextual relevance. Exploratory computation emphasizes flexibility in research design by intentionally demarcating its scope from data exploration, guided by three values: “judgment, experience, and even pluralism” (Campolo, 2021, p. 87). Through these values, exploratory computation incorporates the positionalities of researchers and collaborators, their evolving understandings of the research problem, and the range of research designs to address it.

Figure 1

Research Process Flowchart. This flowchart depicts a general representation of the research process. The research stages inside the box (yellow) correspond to the iterative process between interpretative stages for meaning construction and exploratory computation tasks. The gray boxes correspond to analytical stages for experimental and evidence-based approaches.

Within digital humanities, exploratory computation can use a variety of methods to explore data structures, make visible absences and limitations, investigate thematic patterns, and contextualize meaning (see Table 1). For instance, performing different “readings” of datasets supports multiple levels of interpretation and strategies, including denotative readings to examine literal technical definitions, connotative readings to reconstruct the contextual hermeneutics of data production, and deconstructive readings to surface absences in the data and their politics of meaning (Poirier, 2021). In particular, connotative readings of data, exploratory data analysis, and exploratory data visualizations are important strategies for digital humanities research to reconstruct context. These strategies can be integrated with other qualitative methods, such as forensic data analysis to trace contexts in dataset production, documentation, and application (Alvarado & Twyman, 2025). Exploratory computation is foundational to digital humanities research design because computational methods can augment disciplinary approaches to analysis and offer alternative viewpoints of reading and meaning-making.

Table 1

Common Exploratory Tasks, Functions, and Relevant Methodologies (non-exhaustive). This table organizes the relationship between common tasks in exploratory computation, their different functions for data manipulation, and their relevant computational methodologies. Most of these methodologies process data quantitatively; however, they can be triangulated with other qualitative methodologies (e.g., thematic analysis, discourse analysis, visual analysis).

EXPLORATORY TASK | FUNCTION | RELEVANT METHODOLOGIES

Exploratory Task: Investigating Corpus Structure
Function: Understanding the variable types, labels, and missing data
Relevant Methodologies:
  • Codebook and variable exploration
  • Frequency counts

Exploratory Task: Thematic Patterns
Function: Exploring a text or visual corpus for patterns, features, and style
Relevant Methodologies:
  • Univariate analysis and data visualization (e.g., n-grams and word frequencies; most common, uncommon, and outlier terms)
  • Bivariate and multivariate analysis and data visualization for descriptive statistics to visualize relationships between variables (e.g., strength of correlations through scatterplots)
  • Measuring central tendency using the mean, mode, and median
  • Measuring dispersion and spread through the standard deviation or interquartile range
  • Thematic analysis and topic modeling
  • Clustering through vector embeddings
  • Exploring absence by modeling uncertainty and probabilistic estimates

Exploratory Task: Exploratory Data Visualization
Function: Mapping visualizations that invite contextualized exploration of data and relationships
Relevant Methodologies:
  • Network graphs of relationships in the data structure
  • Plotting thematic patterns as clusters and their locations in the corpus
  • Spatial mapping of data by location, time, and other labeled characteristics
  • Data modeling and diagrams for complex relationships
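As a minimal illustration of how some of the tasks in Table 1 translate into code, the sketch below computes frequency counts, simple word frequencies, and descriptive statistics for a hypothetical dataset of captioned entries. The file name and column names are assumptions introduced for demonstration, not drawn from any project described here.

```python
# A minimal sketch of the "Investigating Corpus Structure" and "Thematic Patterns"
# tasks from Table 1, assuming a hypothetical CSV of captioned entries with
# columns such as "caption", "category", and "year". File and columns are
# illustrative placeholders.
from collections import Counter

import pandas as pd

corpus = pd.read_csv("captions.csv")  # hypothetical dataset

# Corpus structure: variable types, labels, and missing data
print(corpus.dtypes)
print(corpus.isna().sum())
print(corpus["category"].value_counts())  # frequency counts per label

# Thematic patterns: unigram frequencies across captions
tokens = Counter(
    token
    for caption in corpus["caption"].dropna()
    for token in caption.lower().split()
)
print(tokens.most_common(20))      # most common terms
print(tokens.most_common()[-20:])  # rare terms and possible outliers

# Descriptive statistics: central tendency and dispersion of a numeric variable
print(corpus["year"].describe())   # count, mean, std, quartiles
print(corpus["year"].median(), corpus["year"].mode().tolist())
```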

(3) Proposed Qualitative Framework to Evaluate Effectiveness of a Computational Method for Exploratory Tasks

Rather than producing benchmark datasets and quantitative measures of effectiveness, we prioritize the interpretive process that undergirds benchmarking for research design evaluation and transparent communication of reproducible research. We propose a qualitative framework to assess how computational methods contribute to meaning-making in digital humanities research. This framework focuses on the more iterative, nonlinear, fuzzy research stages of humanistic inquiry, where exploratory computation encourages corpus and data exploration, question generation, and research question refinement. Specifically, this framework recognizes that researcher positionalities shape interpretive and evaluative work, highlighting how disciplinary approaches and scholarly communication practices are grounded in context.

Our framework draws from existing qualitative methodologies that treat interpretation as a central analytical practice. Qualitative methodologies generally describe reality as an outcome of human interpretation (Charmaz, 2005). This suggests that qualitative researchers are “more concerned with showing that their data is accurate and precise (or based on close observation) and broad (based on a wide range of variables)” (Clement, 2016, p. 156; Becker, 1996). Researchers often use instruments such as interview guides, focus group protocols, and codebooks to ensure conceptual precision. These instruments are evaluated not only on research structure and interpretive logic; they also facilitate reproducibility and serve a pedagogical function (Kamberelis & Dimitriadis, 2005). Our framework operates as a broad instrument for qualitatively evaluating exploratory computation, which may be calibrated according to different problem types or contexts.

This framework consists of three interwoven principles and their respective guiding questions: positionality, task evaluation, and retrospection (see Figure 2). Positionality affirms the influence of a researcher’s social and material context in shaping knowledge production (Haraway, 1988); task evaluation assesses the suitability of a computational method for exploratory tasks; and retrospection emphasizes reflecting on research design and its pedagogical and public value. For example, research teams conducting exploratory research could apply this framework to assess labor and time costs of a method, determine its relevance to research goals, and adjust research design accordingly. Importantly, the framework builds awareness of the research process itself. Positionality drives project management reflections on clarifying project roles and individuals’ goals to align team workflow (Figures 3 and 4). Task evaluation supports flexibility for research question generation and refinement (Figures 5 and 6). Retrospection promotes more transparent communication of research design and limitations, with the pedagogical invitation for future scholarship to build upon the exploratory findings (Figures 7 and 8).

Figure 2

Qualitative Evaluation Framework for Exploratory Computation. This figure depicts the evaluative process for exploratory computation and its three interrelated principles. These principles mutually shape one another, and each principle is organized into actionable practices to inform interpretative work in the use of computational methodologies.

This framework further builds on project management practices (Dombrowski, 2021; Siemens, 2016), such as timeboxing and clarified intention, as exploratory research unfolds: allocating dedicated time to curiosity-driven exploration, validating the method and its alignment with research goals, generating and refining research questions, and critically reflecting on early insights. This framework prioritizes research process over black box, techno-solutionist approaches that promise exceptional innovation and discovery, advocating instead for sustainable collaborative research design and minimal computing (Risam & Gil, 2022). Additionally, the framework facilitates the communication of meaning-making in interpretive work by centering positionality and incorporating a vocabulary of affective observations for qualitative evaluation: surprising, unexpected, frustrating, delightful, confusing, overwhelming, time-consuming, labor-intensive, difficult, easy, and accessible. As such, this framework contributes to digital humanities scholarship by extending reproducibility and transparency as a pedagogical intervention, particularly for students and early practitioners who may benefit from reverse engineering digital humanities projects (Posner, 2014) to assess how computational methodologies align with humanistic inquiry and problem contexts.

(3.1) Positionality

This principle refers to making context legible by considering the positionality of the research team and by clarifying and evaluating the research design.

Figure 3

Assessment Form for Positionality Questions. These positionality questions render legible the research team's intentions and research design.

Figure 4

Assessment Form for Positionality Questions – Continued. These positionality questions focus on who will evaluate the computational methodology.

(3.2) Task Evaluation

This principle focuses on the evaluation of the effectiveness of a computational method for exploratory research tasks.

Figure 5

Assessment Form for Task Evaluation Questions. These task evaluation questions assess the validity and alignment of the methodology with the research question.

Figure 6

Assessment Form for Task Evaluation Questions – Continued. These task evaluation questions consider how the research questions could be iterated or broken down for further evaluation.

(3.3) Retrospection

This principle highlights reflecting on research design, lessons learned, and communicating value to the scholarly community through pedagogy and publishing.

Figure 7

Assessment Form for Retrospection Questions. These retrospection questions communicate use cases and limitations of the methodology within the research project domain.

Figure 8

Assessment Form for Retrospection Questions – Continued. These retrospection questions evaluate the project workflow, including infrastructure costs, technical stack, and project management, and transparently communicate challenges, barriers, and successes relative to the research team's goals.
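The assessment forms themselves appear in Figures 3–8. As one possible convenience, a research team could mirror their prompts in a lightweight, machine-readable template kept alongside a project repository. The sketch below is an assumption about how such a template might look; it reuses only prompts named in the figure captions above and the affective vocabulary introduced earlier, and it is not part of the published framework.

```python
# Hypothetical template mirroring the framework's three principles, using only
# prompts named in the figure captions above. Field names and structure are
# illustrative assumptions, not the published assessment forms.
assessment_template = {
    "positionality": {
        "research_question": "",             # What is the research question?
        "team_intentions_and_design": "",    # Research team intentions and research design
        "who_evaluates_the_method": "",      # Who will evaluate the computational methodology?
    },
    "task_evaluation": {
        "method_alignment_with_question": "",   # Validity and alignment of the methodology
        "question_iteration_or_breakdown": "",  # How could the questions be iterated or broken down?
    },
    "retrospection": {
        "use_cases_and_limitations": "",
        "workflow_costs_and_stack": "",      # Infrastructure costs, technical stack, project management
        "challenges_barriers_successes": "",
        "affective_observations": [],        # e.g., "surprising", "time-consuming", "accessible"
    },
}
```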

(4) An Applied Case Study of the Evaluation Framework for Digital Humanities Exploratory Computation

We introduce a case study to show how our evaluation framework could be used in the research design and publication of findings from exploratory computation. Specifically, we demonstrate how task evaluation processes and questions could be used to break down the research questions into defined research tasks (task abstraction) and to evaluate the effectiveness of different data visualization methodologies for achieving those tasks (Figures 5 and 6).

Vietnamese Visual Texts was an early exploratory digital humanities project led by Dr. Cindy Anh Nguyen (a co-author of this publication), who collaborated closely with Dr. David H. Laidlaw to computationally explore and identify patterns in a historical encyclopedia of Vietnamese material culture and social practices produced in 1909–1910, comprising over 12,000 data points. Earlier stages of the project explored virtual reality methods of presenting and restructuring the dataset, combining Laidlaw's expertise in applications of visualization and computational modeling with Nguyen's historical and book-history questions about representations of gender and labor in visual and textual sources. The research team expanded the collaboration to include two undergraduate students, Kailiang Fu and Tyler Gurth, and in 2026 we published the study “Visual Exploration of a Historical Vietnamese Corpus of Captioned Drawings: A Case Study” (Fu et al., 2026) to share important takeaways from the exploratory work, namely, how to assess the role and effectiveness of exploratory visualizations for a large historical visual-textual dataset.

Drawing from the visualization literature on design and validation through user observation and the identification of clear tasks (Sedlmair et al., 2012; Munzner, 2009), the published study by Fu, Gurth, Laidlaw, and Nguyen evaluated the effectiveness of several visualization techniques (Distance Matrices, Hierarchical Clustering, Force-Directed Graphs, Minimum Spanning Trees, Dimensionality Reduction, and Radial Spanning Trees), created through text and image embeddings, for achieving three defined tasks (task abstraction). Task abstraction identifies the functions of data visualization for historians in order to interpret meanings and patterns in complex multimodal data. The tasks consisted of T1: gaining an overview of the dataset; T2: generating and refining research questions; and T3: contextualizing data. By clarifying the task at hand, the team could then test the visualization techniques against their ability to achieve a specific, defined task through user evaluation. The participants in the evaluation included an expert evaluator (Nguyen, who was part of the project) as well as a focus group of digital humanities researchers, who ranked the different visualizations with a score of 1 to 3. This case study demonstrates how task evaluation processes and questions (Figures 5 and 6) could be executed through defined research tasks T1-T3, contingent on method selection and different levels of evaluation, such as expert and focus group reviews.
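To make the kind of pipeline described above concrete, the sketch below shows how one of the named techniques, a minimum spanning tree over caption embeddings, could be assembled from off-the-shelf libraries (sentence-transformers, scikit-learn, SciPy). It is an illustrative assumption, not the published study's code: the embedding model, file name, and column names are placeholders.

```python
# Illustrative sketch only: a minimum spanning tree over text embeddings, one of
# the visualization techniques named above. The embedding model, column names,
# and file are assumptions for demonstration, not the published study's code.
import numpy as np
import pandas as pd
from scipy.sparse.csgraph import minimum_spanning_tree
from sentence_transformers import SentenceTransformer
from sklearn.metrics import pairwise_distances

corpus = pd.read_csv("captions.csv")  # hypothetical captioned-drawing dataset
captions = corpus["caption"].fillna("").tolist()

# Embed each caption with a general-purpose sentence encoder (assumed model).
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(captions, show_progress_bar=True)

# Build a pairwise cosine-distance matrix, then extract the minimum spanning tree
# that links every caption through its most similar neighbors.
distances = pairwise_distances(embeddings, metric="cosine")
mst = minimum_spanning_tree(distances).toarray()

# List the strongest edges as candidate starting points for close reading.
rows, cols = np.nonzero(mst)
edges = sorted(zip(rows, cols, mst[rows, cols]), key=lambda e: e[2])
for i, j, dist in edges[:10]:
    print(f"{captions[i][:40]!r} <-> {captions[j][:40]!r} (distance {dist:.3f})")
```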

Yet what is not apparent in the structure of a research publication is the extensive discussion and iterative research design of exploratory computation work over the four years from initial design to publication. The research team moved across positionality questions such as “what is the research question?” and “who is evaluating the computational methodology (in this case, visualization techniques)?” (Figures 3 and 4). Many of the task evaluation questions emerged from initial qualitative conversations about which visualization was “effective” and “interesting” to Nguyen, the expert evaluator, which would guide the next stages of different computational methodologies. For example, visualizing the entire corpus through dimensionality reduction and minimum spanning trees proved interesting at first for reconstructing a corpus-level overview of the dataset, yet most of the insight was garnered by zooming into intermediate-level connections through clusters and following connections of similarity in the visualizations. In a UMAP visualization of the dataset, Nguyen focused on a cluster of female itinerant laborers and provided additional primary sources and historical context to interpret the projection as a representation of early colonial labor practices in urbanizing Hanoi. In this case, visualizations offered a structured projection, an “interpretive screen” (Ringler, 2024, para. 11) of a large, unstructured corpus for meaning-making, contributing to findings on humanistic questions.

At the same time, the exploratory work and reflections unearthed specific findings for visualization scholars: visualizations can function at different scales of analysis and be informed by various information-seeking behaviors, as determined by historians' pursuit of different tasks (gaining an overview, generating research questions, contextualizing data). As briefly shown in the case study, the research team undertook a dialogical process to evaluate exploratory computation, where the positionality of the team and focus group shaped the research design, and transparent retrospective reflections were included in the “Lessons Learned” section (Figure 7) of the research publication for future scholars to build upon. As the project formally concludes with the above publication, retrospection questions (Figure 8) can help the principal investigator assess the next steps for the overall research, including reconstituting the research team, experimenting with new methodologies, and applying for external funding for a new stage of research.
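The following sketch illustrates, under the same assumptions as the previous example, how a UMAP projection of caption embeddings could be produced and a single cluster pulled out for expert reading. The parameters, model, and cluster count are illustrative choices, not the project's published settings.

```python
# Illustrative sketch: project caption embeddings into two dimensions with UMAP
# and inspect one cluster. Model, parameters, and cluster count are assumptions.
import pandas as pd
import umap
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

corpus = pd.read_csv("captions.csv")  # hypothetical dataset, as above
captions = corpus["caption"].fillna("").tolist()
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(captions)

# Two-dimensional UMAP projection of the embedding space.
projection = umap.UMAP(n_components=2, random_state=42).fit_transform(embeddings)

# Group the projection into coarse clusters as entry points for close reading.
labels = KMeans(n_clusters=12, n_init=10, random_state=42).fit_predict(projection)

# Print the captions in one cluster so a domain expert can contextualize it
# against primary sources, as in the labor-practices reading described above.
cluster_id = 0  # arbitrary choice for illustration
for caption, label in zip(captions, labels):
    if label == cluster_id:
        print(caption)
```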

Our proposed research framework of positionality, task evaluation, and retrospection reverse engineers the long iterative process of the Vietnamese Visual Texts case study and builds on lessons from teaching digital humanities methodologies to undergraduate and graduate students. Given the absence of such a framework during the project life of the Vietnamese Visual Texts case study, the proposed framework provides a formal structure of qualitative reflection to efficiently and effectively assess exploratory computation tasks at key stages of research design, implementation, and publication. Furthermore, this framework encourages reflection on positionality, task evaluation, and retrospection at earlier stages of research projects to support the legibility of research processes to the team as research unfolds. As a pedagogical tool, the retrospection questions (Figures 7 and 8) are particularly important for students' understanding of the possibilities and challenges of specific computational methodologies. Overall, the framework advances scholarly communication of exploratory computation as an important, iterative phase of research for meaning-making from humanistic data; the evaluation framework could be incorporated into the methodologies and lessons learned sections of research publications, project websites, and code repositories to encourage methodological transparency and reproducible research.

(5) Discussion and Conclusion

In this discussion article, we propose an evaluation framework for assessing and communicating the significance of exploratory computation in digital humanities. While benchmarking primarily standardizes evaluation conditions for model performance comparison, technical benchmarking does not capture the wide range of humanistic research, namely, computationally assisted processes of meaning-making in exploratory research (Kommers et al., 2025; Ringler, 2024). Our proposed framework qualitatively evaluates the contextual and nuanced aspects of interpretive work in exploratory computation. Specifically, it offers guidelines to facilitate self-reflection during the exploratory stages of research: positionality, task evaluation, and retrospection. These principles are interrelated and guide interpretive work, making it more explicit for evaluation.

The case study illustrated how the proposed framework can be applied to evaluate exploratory computational tasks. The assessment of visualization techniques was primarily informed by discussions among researchers whose positionalities and domain expertise constructed meaningful insights from the corpus renderings; the presence or absence of patterns became meaningful only when interpreted within context and in relation to the research questions. By explicitly examining the context of the research process, visualization techniques were evaluated not solely on statistical significance, but on their usefulness for supporting accessible, reproducible, and rigorous humanistic research design. In this manner, the framework aligns with Dobson’s (2021) efforts to treat computational objects such as benchmarks as interpretable outputs that provide novel yet flexible perspectives on corpora in digital humanities.

Working toward an evaluation framework that captures the interpretative activities in exploratory research in digital humanities is important for the field as well as adjacent domains. First, it situates hermeneutics for computation within the social and cultural conditions of scientific practice (Kommers et al., 2025; Ringler, 2024). Researchers in science and technology studies (STS) have shown how these conditions directly shape the positionalities of researchers doing the interpretive work, which both enact and reproduce the shared knowledge systems of so-called expert cultures (Clement & Acker, 2019; Harding, 1992). Here, computational benchmarking serves as an authoritative practice for knowledge and claim-making that shapes the trajectory of a discipline (Denton et al., 2020; Sim et al., 2003). By contrast, an alternative evaluative framework that centers interpretative strategies in exploratory computation advances “a holistic view of [scientific] context: one which does not diminish or remove contextual elements, even those with limited influence” (Sawyer & Jarrahi, 2014, p. 5). Exploratory computation reveals that different forms of expertise are integral to knowledge production, especially for public-facing research (Collins & Evans, 2002). Second, detailing the methodological concerns for exploratory tasks and problem definitions of computational approaches is increasingly relevant for computer science and related fields. Measuring and achieving machine explainability still depends on human interpretability and speculation about a machine’s capabilities (Prescott, 2023). Indeed, as Campolo and Crawford (2020) note, “interpretability is not simply a property of any model or technique but rather only emerges with deep contextual knowledge of the social structures and even histories where they are applied” (p. 14). Third, qualitative evaluations of exploratory computation probe the abstraction of contexts and labor relations that support computational research. As part of the digital humanities repertoire, exploratory computation can inform postcolonial computing, which recognizes that interpretation is already political, shaped by individuals and their sociocultural contexts (Irani et al., 2010). In proposing new workflows of methodological transparency, our framework contributes to “politically, ethically, and social justice-minded approaches to digital knowledge production” (Risam, 2019, p. 4), advocating for more reflexive research design and accountability.

Our proposed research framework has practical implications for reproducibility in digital humanities computational research. It shifts from technical notions of reproducibility to transparency and accountability in the research process. The framework outlines the conditions that shape interpretive work in research design and levels of analysis (Joyeux-Prunel, 2024). Importantly, it complements benchmarking by enabling reverse engineering of technical procedures to identify interpretive interventions often absent from model and dataset documentation. The framework makes tacit interpretive practices more visible and supports internal and peer-review norms in digital humanities (Dobson, 2021), offering journals and researchers an instrument for enhancing the clarity and legibility of research procedures. This framework could be formally incorporated across digital humanities scholarly communication, from pedagogy, project websites, and research publications that evaluate projects and methodology, such as “Reviews in DH” and “Digital Humanities Quarterly”, to emerging formats of data papers such as the “Journal of Open Humanities Data” and “Responsible Datasets in Context”. Currently, the framework is limited to process-based reflections on singular methodologies with the aim of iteration. A scoping review of similar exploratory computational projects that measures their research design against defined tasks could yield meaningful insight through systematic comparison. Overall, this discussion paper presents a qualitative framework for evaluating exploratory computation that foregrounds the often invisible interpretive conditions integral to humanistic research.

Acknowledgements

We would like to thank the guest editors and reviewers for their constructive comments. We thank the research team, Kailiang Fu, Tyler Gurth, and David H. Laidlaw for their extended research project, which served as an applied case study for the design of our framework. We would also like to thank the organizers and participants in the Summer Institute in Computational Social Science UCLA 2025 as well as students enrolled in Digital Humanities 201, Information Studies 298B, and Digital Humanities 112 at UCLA Spring and Fall 2025 for their engagement on project management, research design, and data visualization that inspired this discussion paper.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

Cindy Anh Nguyen: Conceptualization, Visualization, Writing – Original Draft, Review and Editing; Alejandro Alvarado Rojas: Conceptualization, Visualization, Writing – Original Draft, Review and Editing.

DOI: https://doi.org/10.5334/johd.500 | Journal eISSN: 2059-481X
Language: English
Submitted on: Dec 13, 2025 | Accepted on: Feb 24, 2026 | Published on: Mar 24, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Cindy Anh Nguyen, Alejandro Alvarado Rojas, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.