DALA, The Database of African American and Predominantly White American Literature Anthologies

By: Amy E. Earhart  
Open Access | Apr 2025


(1) Overview

Repository location

DALA https://dataverse.tdl.org/dataverse/DALA;

Authors https://doi.org/10.18738/T8/AGOUMZ;

Anthologies https://doi.org/10.18738/T8/8XAV7E;

Authors in Anthologies https://doi.org/10.18738/T8/USRPJ8;

Editors https://doi.org/10.18738/T8/JWFPES;

Editors of Anthologies https://doi.org/10.18738/T8/VMLUCB;

and a data dictionary https://doi.org/10.18738/T8/GTR7TG.

Observable Plot code:

Authors, Visualization in Observable Plot https://doi.org/10.18738/T8/NABIPU; Anthologies, Visualization in Observable Plot https://doi.org/10.18738/T8/ANDBOA.

Context

DALA, The Database of African American and Predominantly White American Literature Anthologies was created to test how we categorize identities in a database and to analyse the literary canons of both traditions. The creation of the dataset informs the monograph “Digital Literary Redlining: African American Anthologies, Digital Humanities, and the Canon” (Earhart, 2025). It is a highly curated small data project that includes 267 individual anthology volumes, 107 editions, 319 editors, 2,844 unique individual authors, and 22,392 individual entries, and it allows the user to track the shifting inclusion and exclusion of authors over a period of more than a hundred years (Figure 1).

Figure 1: Anthologies published by year, page count.

The anthology is an important object of study because it has produced the intellectual framework for the study of literature and delineated a canon for use in the classroom. Digital anthology study is a growing field, with Levy and Perry (2015), Enszer (2016), Kenton and Howard Rambsy (2020), and Fredner and Porter (2024) all engaging in the modeling of such data. DALA is a groundbreaking dataset that differs from existing datasets such as The Black Short Story Dataset, which is focused on African American short stories, or the Norton Anthology of American Literature dataset, which is focused only on the Norton.[1] DALA is larger than these datasets, and its broader encoding of identity adds to the growing body of work focused on anthologies. It also asks that, when we study identity with digital technologies, we think carefully about the ways that digital interventions shape and reshape concepts of race and gender.

(2) Method

DALA collects information about American and African American literature anthology production including editor, author, and author inclusion data.

Steps

In 2014 I began to identify, collect, and examine anthologies, entering data by hand. Existing bibliographies and datasets of anthologies, such as Joseph Csicsila’s bibliography of American literature anthologies (2004) or The Black Short Story Dataset (Rambsy et al., 2018), provided my initial lists of anthologies. I expanded these lists through searches in WorldCat (OCLC), Google Books, and the MLA International Bibliography, as I was interested in a longer time period and a broader set of genres than existing bibliographies and datasets cover. With the aid of technologists and database designers, visualizations were produced from the data. While I led the project and completed the bulk of the data entry over an eight-year period, undergraduate and graduate students also entered data. All individuals who participated are identified within the dataset at the level of individual entries.

DALA represents an alternative approach to modeling identity within datasets. Digital humanities projects focused on United States subjects often use US Census identity categories for ease of use and interoperability, but these categories are limited and do not align with contemporary theoretical models of identity. For example, the most recent Census asked whether individuals were female or male, decidedly limited categories that represent sex, not gender. The Census has also served as a mechanism to constrain and monitor racialized bodies, for instance through the identification of those marked as “Chinese” in the 1870 census as preparation for the Chinese Exclusion Act of 1882 (Nobles, 2000; Mezey, 2003; Hanna et al., 2020). For digital humanists, it is crucial that we think about how we borrow or reproduce such categorizations.

Rejecting Census categories, I adopted gender and identity categories that more closely align with contemporary theoretical understandings. I identified authors and editors as non-binary only if they had publicly articulated a non-binary identity. The data dictionary sets out the categorization strategies, directly tying theory to the data itself (Figure 2).
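The non-binary rule above can be expressed as a validation check applied at data entry. The sketch below is illustrative only: the `Person` fields, the `GENDER_TERMS` vocabulary, and the `validate_gender` function are hypothetical names, not DALA’s actual columns or terms, which are defined in the project’s data dictionary.

```python
from dataclasses import dataclass

# Hypothetical gender vocabulary for illustration; DALA's real terms are set out
# in its data dictionary. These are gender terms, not Census-style sex terms.
GENDER_TERMS = {"woman", "man", "non-binary"}


@dataclass
class Person:
    name: str
    gender: str
    # Hypothetical flag: True only when the person has publicly articulated
    # a non-binary identity (the rule stated in the text).
    publicly_non_binary: bool = False


def validate_gender(person: Person) -> list[str]:
    """Return a list of problems with a person's gender encoding (empty if valid)."""
    problems = []
    if person.gender not in GENDER_TERMS:
        problems.append(f"{person.name}: '{person.gender}' is not in the vocabulary")
    if person.gender == "non-binary" and not person.publicly_non_binary:
        problems.append(f"{person.name}: non-binary requires a public self-identification")
    return problems


# A Census-style value is rejected, and an unsupported non-binary label is flagged.
print(validate_gender(Person("Author A", "female")))
print(validate_gender(Person("Author B", "non-binary")))
```

Encoding the rule as a check, rather than leaving it to memory, keeps entries by different team members consistent with the stated policy.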

Figure 2: Data Dictionary, Instructions for data entry.

As Tara McPherson has argued, the best digital database projects “…pay attention to specific things and experiences, resisting the decontextualizing logic of the database” (2015, 495). It is this resistance to static categories that DALA hopes to model.

Data was collected in Google Sheets in five tables: Authors, Anthologies, Editors, Editors of Anthologies, and Authors in Anthologies. During data entry we regularized identity terms to preferred contemporary usage to avoid repeating the violences of certain colonialist terms of identity. So, if an anthology editor writes a biographical header stating that a text was written by an “Eskimo,” a term that Indigenous peoples reject as derogatory, we instead encode the author as Inuit, the word preferred by the people it describes.
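The term regularization described above can be thought of as a lookup from source terms to preferred terms. This is a minimal sketch, assuming a hypothetical `PREFERRED_TERMS` table and `regularize_identity` helper; only the “Eskimo” to Inuit mapping comes from the text. Because the correct preferred term can depend on the specific author, a lookup like this is best treated as a way of flagging replacements for an editor to confirm, not as a final decision.

```python
# Hypothetical lookup table: keys are lowercased terms found in anthology
# headers; values are the preferred terms encoded in the dataset.
# Only the "eskimo" -> "Inuit" entry is taken from the article.
PREFERRED_TERMS = {
    "eskimo": "Inuit",
}


def regularize_identity(source_term: str) -> tuple[str, bool]:
    """Return (term_to_encode, was_replaced) for a term found in a source header."""
    cleaned = source_term.strip()
    preferred = PREFERRED_TERMS.get(cleaned.lower())
    if preferred is None:
        return cleaned, False   # no replacement on file; keep the source term
    return preferred, True      # replaced; flag so an editor can confirm the choice


for term in ["Eskimo", " Inuit "]:
    print(repr(term), "->", regularize_identity(term))
```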

DALA calculates the space each author is given within an anthology in order to show shifts in canonicity. Digital anthology studies are split between simple counting and percentage-based calculations. I have adopted the percentage methodology, calculating the percentage of pages allotted to each author, which makes comparisons across anthologies far more accurate because it accounts for differences in the length of entries and of volumes. I call my use of this calculation Digital Literary Redlining (DLR), a term that mimics methods for the analysis of redlining, the systemic denial of home loans based on race. Digital redlining as a term is a recent invention, with the concept most often used to describe disparities in technological access (Hall, 2021; McCall et al., 2022; Parks, 2021; Diep, 2022; Bessette, 2023).

As scholars of the canon wars made clear, American literary anthologies long purported to represent the American literary tradition as unbiased, based on traits such as literary greatness or aesthetics, which they claimed to be universal, much as the banking and insurance industries maintained that the redlining of property was based on unbiased understandings of property values. Yet these supposedly neutral criteria hid biases and exclusions. Our transition to digital data rests on a similarly supposedly neutral methodology, and its bias is, in Ruha Benjamin’s term, a “new Jim Code” (2019, 5). The mask of neutrality hides power differentials in knowledge infrastructures like the anthology or the database, which is why redlining is a useful theoretical model for engaging the power structures embodied within such mediums.
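The difference between simple counting and the percentage method can be shown with a small sketch. It assumes each entry is recorded as a hypothetical `(anthology_id, author, pages)` tuple and that each anthology’s total page count is known; the article does not specify whether front and back matter are included in that total, so the denominator here is an assumption. The `page_shares` and `entry_counts` names and the sample data are illustrative, not DALA’s code or data.

```python
from collections import defaultdict


def page_shares(entries, anthology_pages):
    """Percent of each anthology's pages allotted to each author.

    entries: iterable of (anthology_id, author, pages), one tuple per included text.
    anthology_pages: {anthology_id: total page count of the volume} (assumed denominator).
    Returns {anthology_id: {author: percent}}.
    """
    allotted = defaultdict(lambda: defaultdict(float))
    for anthology_id, author, pages in entries:
        allotted[anthology_id][author] += pages  # sum pages over all of an author's texts
    return {
        anth: {author: 100 * pages / anthology_pages[anth] for author, pages in authors.items()}
        for anth, authors in allotted.items()
    }


def entry_counts(entries):
    """Simple counting: number of texts per author in each anthology."""
    counts = defaultdict(lambda: defaultdict(int))
    for anthology_id, author, _pages in entries:
        counts[anthology_id][author] += 1
    return counts


# Hypothetical data: anthology B is twice as long as anthology A.
entries = [("A", "X", 50), ("A", "Y", 30), ("A", "Y", 20), ("B", "X", 50)]
anthology_pages = {"A": 1000, "B": 2000}

print(page_shares(entries, anthology_pages))  # {'A': {'X': 5.0, 'Y': 5.0}, 'B': {'X': 2.5}}
print({a: dict(c) for a, c in entry_counts(entries).items()})  # {'A': {'X': 1, 'Y': 2}, 'B': {'X': 1}}
```

In this toy example, counting ranks Y above X in anthology A (two texts versus one) and treats X identically in A and B, while the page shares show X and Y receiving equal space in A and X receiving half as large a share of the longer volume B.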

Data was visualized with the open-source tool Observable Plot, with one visualization set focused on anthologies <https://observablehq.com/d/56f95000aad79eb6> and the other on authors <https://observablehq.com/d/7774c0b27f80ef63>. I chose this data visualization tool because it is free and open source. The visualizations allow users to select variables to view author and editor identity and anthology composition over time, enabling deep dives into anthology selections and their representation of the literary canon. A researcher might track an author’s reputation over time or examine the impact of an editor on an anthology. Trends in anthology production come into view through the tool and offer researchers new ways to understand literary history.
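DALA’s visualizations are built in Observable Plot (JavaScript); the Python sketch below only illustrates the kind of aggregation that sits behind tracking an author’s reputation over time. It assumes page shares have already been computed per anthology and that each anthology’s publication year is known; `author_trajectory`, the anthology identifiers, and the years are hypothetical.

```python
def author_trajectory(author, shares, anthology_years):
    """Return [(year, percent_of_pages)] for one author, ordered by publication year.

    shares: {anthology_id: {author: percent}} from a page-share calculation.
    anthology_years: {anthology_id: publication year}.
    Anthologies that omit the author contribute 0.0, so exclusion stays visible.
    """
    return sorted(
        (anthology_years[anthology_id], by_author.get(author, 0.0))
        for anthology_id, by_author in shares.items()
    )


# Hypothetical data: three anthologies and their publication years.
shares = {"A": {"X": 5.0, "Y": 5.0}, "B": {"X": 2.5}, "C": {"Y": 3.0}}
anthology_years = {"A": 1925, "B": 1941, "C": 1968}

print(author_trajectory("X", shares, anthology_years))  # [(1925, 5.0), (1941, 2.5), (1968, 0.0)]
```

Recording a zero for anthologies that leave an author out, rather than dropping those anthologies, is the design choice that makes exclusion from the canon show up in a plot.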

Sampling strategy

The dataset is limited to generalist anthologies in each field and is bound by nation, so transnational anthologies are excluded. I also exclude shorter or concise editions, as well as anthologies specialized by period, gender, or genre, focusing on full-length anthologies to ensure that I am comparing analogous volumes.
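The sampling criteria can be read as a single inclusion filter. This is a minimal sketch under stated assumptions: the `Anthology` fields (`national`, `specialization`, `concise`), their values, and the sample titles are hypothetical and are not taken from DALA’s schema.

```python
from dataclasses import dataclass


@dataclass
class Anthology:
    title: str
    national: bool        # hypothetical field: False for transnational anthologies
    specialization: str   # hypothetical values: "general", "period", "gender", "genre"
    concise: bool         # hypothetical field: True for shorter or concise editions


def in_sample(anthology: Anthology) -> bool:
    """Apply the stated sampling criteria: national, generalist, and full-length."""
    return (
        anthology.national
        and anthology.specialization == "general"
        and not anthology.concise
    )


candidates = [
    Anthology("Full generalist volume", True, "general", False),
    Anthology("Concise edition", True, "general", True),
    Anthology("Period anthology", True, "period", False),
    Anthology("Transnational anthology", False, "general", False),
]
print([a.title for a in candidates if in_sample(a)])  # ['Full generalist volume']
```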

Quality control

Data is regularized as described in the data dictionary. Categories were built into drop-down menus to ensure consistency across the dataset. The Virtual International Authority File (VIAF) was used to normalize the names of authors and editors; if no VIAF record was available, we used Wikipedia.
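The drop-down menus and the VIAF-then-Wikipedia rule can be mirrored in a validation pass over exported rows. This is a minimal sketch under stated assumptions: the field names (`genre`, `viaf_id`, `wikipedia_url`) and the genre vocabulary are hypothetical placeholders, not DALA’s schema, and the identifier and URL in the example are dummies rather than real records.

```python
# Hypothetical controlled vocabulary standing in for a drop-down menu.
ALLOWED = {"genre": {"poetry", "fiction", "drama", "nonfiction"}}


def invalid_fields(row: dict, allowed: dict) -> list[str]:
    """Return the fields whose values are not in the controlled vocabulary."""
    return [field for field, values in allowed.items() if row.get(field) not in values]


def authority_source(row: dict) -> tuple:
    """Prefer a VIAF identifier, fall back to Wikipedia, else mark for review."""
    if row.get("viaf_id"):
        return ("VIAF", row["viaf_id"])
    if row.get("wikipedia_url"):
        return ("Wikipedia", row["wikipedia_url"])
    return ("unresolved", None)


rows = [
    {"name": "Author A", "genre": "poetry", "viaf_id": "0000000"},  # dummy identifier
    {"name": "Author B", "genre": "Poems", "wikipedia_url": "https://en.wikipedia.org/wiki/Example"},
    {"name": "Author C", "genre": "drama"},
]
for row in rows:
    print(row["name"], invalid_fields(row, ALLOWED), authority_source(row))
```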

(3) Dataset Description

Repository name

Texas Data Repository

Object name

DALA, The Database of African American and Predominantly White American Literature Anthologies

Format names and versions

.csv and .tgz

Creation dates

(2014-09-01–2024-02-01).

Dataset creators

Bauer, Jean. Formal analysis, Software, Visualization. Eggs and Lemon.

Earhart, Amy E. Conceptualization, Funding Acquisition, Investigation, Methodology, Project Administration, Writing. Texas A&M University.

Karami, Atoosa. Data entry. Texas A&M University.

Madson, Mykala. Data entry. Texas A&M University.

Miller, Katelyn. Data entry. Texas A&M University.

Parkes, Nicholas. Formal analysis, Software, Visualization. Texas A&M University.

Performant Software. Formal analysis, Software, Visualization.

Seacrest, Donnie. Data entry. Texas A&M University.

Shutz, Sally. Data entry. Texas A&M University.

Trent, Clare. Data entry. Texas A&M University.

Language

English

License

CC0 1.0

Publication date

(2024-11-24).

(4) Reuse Potential

The DALA data might be reused by literary scholars, race and gender scholars, and cultural historians. A growing number of scholars investigate anthologies with digital approaches, and their datasets, discussed earlier, may be crosswalked with DALA. Scholars might also expand the categories adopted here to model best practices for denoting identity.
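Crosswalking another anthology dataset with DALA largely comes down to matching records. Where both datasets carry VIAF identifiers, matching on those is likely more reliable; the sketch below shows the weaker fallback of matching on a normalized name. The `name` column, the `source_id` field, and both sample lists are hypothetical, and the normalization is deliberately simple: it ignores case, accents, and punctuation but not word order, so “Hurston, Zora Neale” and “Zora Neale Hurston” will not match.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def crosswalk(dala_rows, other_rows, key="name"):
    """Match rows from two author lists on a normalized name.

    Returns (matches, unmatched_dala_rows); matches is a list of (dala_row, other_row).
    """
    index = {normalize_name(row[key]): row for row in other_rows}
    matches, unmatched = [], []
    for row in dala_rows:
        hit = index.get(normalize_name(row[key]))
        if hit is not None:
            matches.append((row, hit))
        else:
            unmatched.append(row)
    return matches, unmatched


# Hypothetical rows from DALA and from another anthology dataset.
dala = [{"name": "Phillis Wheatley"}, {"name": "Langston Hughes"}]
other = [{"name": "phillis  wheatley", "source_id": "s1"}]
matches, unmatched = crosswalk(dala, other)
print(len(matches), [row["name"] for row in unmatched])
```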

Notes

[1] The Black Short Story Dataset (Rambsy et al., 2018) is available for use at the Texas Data Repository, but the dataset from the Fredner and Porter project (2024) is unavailable.

Competing Interests

The author has no competing interests to declare.

Author Contributions

Earhart: Conceptualization, Funding Acquisition, Investigation, Methodology, Project Administration.

DOI: https://doi.org/10.5334/johd.298 | Journal eISSN: 2059-481X
Language: English
Submitted on: Dec 20, 2024
Accepted on: Feb 24, 2025
Published on: Apr 9, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Amy E. Earhart, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.