1. Overview
Repository location
Context
This dataset was created through a cross-institutional collaboration among four research centers in the visual arts, each maintaining archives of past fellows and award recipients: (1) the Center for Advanced Study in the Visual Arts, National Gallery of Art (the Center) in Washington, DC; (2) the Clark Art Institute in Williamstown, MA (the Clark); (3) the Getty Research Institute in Los Angeles, CA (Getty); and (4) the Smithsonian American Art Museum (SAAM) in Washington, DC. While records of past fellows have historically been maintained by each institution, this project integrates them into a single structured dataset to facilitate comparative studies across research centers and across time.
The project aligns with broader initiatives in the digital humanities to make research infrastructure and data open and interconnected. The compiled dataset expands the accessibility and reuse potential of institutional fellowship data at a time when funding opportunities have become significantly constrained, and many longstanding public sources that provide longitudinal data about education and scholarly funding have been lost.[1]
2. Method
Steps
The Scholars Data Project was started in 2023 when four major research centers (the Center, the Clark, Getty, SAAM) dedicated to the study of art history and visual culture determined that a cross-institutional approach to historic fellowship data would be valuable. Standard rubrics for data collection and reconciliation were key to effective collaboration. The Association of Research Institutes in Art History (ARIAH),[2] a long-standing consortium, encouraged the project: member institutions provided valuable feedback on its development, and the ARIAH website hosts some of its findings.
Data were collected from each of the four participating institutions, each of which maintained its own independent records. These original rosters were inconsistent in format, organized around different data points, and often available only on institutional websites or in printed reports. Each institution followed a different process for data collection: some modified datasets extracted from grants databases, others transcribed information from websites or print publications into spreadsheets, and others combined these methods.
While each institution possesses a range of information about former fellowship awardees and appointees, representatives of the four institutions agreed to use only information that had previously been published in digital formats, with the earliest records starting in 1961 for the Center. Additionally, they agreed to prioritize both FAIR[3] (Linacre, 2022) and LOUD[4] data principles. In practice, this meant a limited set of data points, reliance on persistent identifiers, and two linked tables: Awards and Affiliations.
The participating institutions possess long histories of work in the Linked Data sphere, including Wikidata and the wider Wikimedia ecosystem (Foster, Norling, and Westerby, 2025; SAAM, 2017; Zweig, 2022).[5] However, most of these projects have been oriented around art historical research and collections infrastructure rather than institutional assessment. The Scholars Data Project represents an effort to extend FAIR and LOUD principles beyond access to collections and research databases to lay the groundwork for collective disciplinary reflection on the state of the field. As such, it uses Wikidata as a linking hub for datasets, a practice increasingly adopted in humanities projects (Zhao, 2023, 4.1.2).
Data reconciliation was conducted with OpenRefine,[6] where Wikidata Q identifiers (QIDs)[7] were used to link institutional names to external authority data. When institutions did not have corresponding Wikidata QIDs, new Wikidata items were created for them (with basic information about the institution’s purview, city, country, and geolocation) and the resulting QIDs were added to the Affiliations dataset. Research Organization Registry (ROR) identifiers[8] were also pulled from Wikidata using OpenRefine’s reconciliation service to enhance interoperability and reuse. Where a scholar was unaffiliated, a city of residence was provided. Each record in the larger combined dataset represents an individual scholar and the granting program, year, and stated institutional affiliation (or city of residence for independent scholars). Titles and subjects of scholars’ research projects are not represented in this dataset.[9] Persistent identifiers (such as Wikidata QIDs or ORCIDs) for individuals who received awards or appointments are not included in this dataset; the authors take the position that living scholars should create and maintain such URIs themselves.[10] It should be noted that ORCID uptake among humanities researchers, especially those based in the United States, remains low, though some institutions and granting organizations in the humanities are now requiring applicants to provide an ORCID with their submission materials (Shulman, 2025).
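For readers who prefer a scripted route, the ROR lookup described above could also be expressed as a SPARQL query against Wikidata. This is not the method used in the project, which relied on OpenRefine's reconciliation service; it is a minimal sketch of an alternative. The function name `build_ror_query` is hypothetical, and the property number `P6782` (assumed here to be Wikidata's "ROR ID" property) should be verified on Wikidata before use. The sketch only builds the query text and does not contact any server.

```python
import re

# Assumption: a QID is the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")


def build_ror_query(qids):
    """Build a SPARQL query that asks Wikidata for the ROR ID of each QID.

    Assumption: P6782 is the Wikidata property for ROR identifiers.
    The wd: and wdt: prefixes are predefined by the Wikidata Query Service.
    """
    for qid in qids:
        # Reject anything that is not a plain QID before it reaches the query.
        if not QID_PATTERN.match(qid):
            raise ValueError(f"not a valid QID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?ror WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  OPTIONAL { ?item wdt:P6782 ?ror . }\n"
        "}"
    )


# Q863813 is the QID shown for Binghamton University in the Affiliations table.
query = build_ror_query(["Q863813"])
print(query)
```

The `OPTIONAL` clause keeps institutions that have no ROR identifier in the result, which matters here because ROR coverage is incomplete for smaller or non-academic organizations.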
All fellowship years were standardized to the academic calendar (e.g., “1995” representing 1995–96). Multi-year fellowships were flagged with a Boolean multi_year_award field, which applies only to awards for the Center. Multi-year awards at Getty (which are not currently being offered) were recorded as separate, consecutive one-year terms.
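The year convention above can be illustrated with a short sketch. The function names are hypothetical and are not part of the published dataset or its documentation; the handling of century boundaries (writing "1999–2000" in full) is an assumption, since the source gives only the "1995–96" example.

```python
def academic_year_label(start_year: int) -> str:
    """Return a label such as '1995–96' for an award year stored as 1995."""
    end_year = start_year + 1
    # Assumption: write the end year in full when the century changes.
    if end_year // 100 != start_year // 100:
        return f"{start_year}\u2013{end_year}"
    return f"{start_year}\u2013{end_year % 100:02d}"


def expand_to_one_year_terms(first_year: int, n_years: int) -> list[int]:
    """List the consecutive start years covered by a multi-year term,
    mirroring how multi-year Getty awards were recorded as one-year terms."""
    return [first_year + offset for offset in range(n_years)]


print(academic_year_label(1995))
print(expand_to_one_year_terms(2001, 2))
```

A reuser who needs display labels for charts can apply `academic_year_label` to the `year` column without altering the stored integer values.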
Updates to the Awards and Affiliations tables will be made periodically, using the latest scholar data provided by participating institutions, and reconciled with existing datasets using the methods outlined above. Future edits to the Awards and Affiliations tables will be tracked with GitHub commits and released as new versions to a Zenodo open repository.
Quality control
The Affiliations table was developed using Airtable, and the Dedupe extension was used to check for duplicate records. Institution names recorded in non-Roman scripts are difficult to deduplicate, and errors may remain in the dataset. The Affiliations table is an active database of institution names following consistent standards (discussed further below). Associating each Awards record with a corresponding Wikidata QID allows for interoperability with the Affiliations table and ensures consistency across institution names and locations for both tables.
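Reusers who want to run their own checks outside Airtable could script a few basic consistency tests. The sketch below is not the project's quality-control procedure; it is an illustration that assumes the field names `affiliation_QID`, `affiliation_ROR_ID`, `program`, and `year` from the table descriptions, and assumes simplified identifier patterns (a QID is `Q` plus digits; a ROR ID is `https://ror.org/` plus `0`, six lowercase alphanumeric characters, and two digits). The function name `check_tables` is hypothetical.

```python
import re

# Assumed, simplified identifier patterns (see lead-in).
QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")
ROR_PATTERN = re.compile(r"^https://ror\.org/0[a-z0-9]{6}[0-9]{2}$")


def check_tables(awards, affiliations):
    """Return a list of human-readable problems found in the two tables.

    Both arguments are lists of dicts, e.g. rows read with csv.DictReader.
    """
    problems = []
    seen = set()
    for row in affiliations:
        qid = row["affiliation_QID"]
        if not QID_PATTERN.match(qid):
            problems.append(f"malformed QID: {qid!r}")
        if qid in seen:
            problems.append(f"duplicate QID in Affiliations: {qid}")
        seen.add(qid)
        # ROR IDs are optional, so only non-empty values are checked.
        ror = row.get("affiliation_ROR_ID", "")
        if ror and not ROR_PATTERN.match(ror):
            problems.append(f"malformed ROR ID for {qid}: {ror!r}")
    for row in awards:
        # Every award must point to a QID present in Affiliations.
        if row["affiliation_QID"] not in seen:
            problems.append(
                f"award {row['program']} {row['year']} "
                f"points to unknown QID {row['affiliation_QID']}"
            )
    return problems


affiliations = [
    {"affiliation_QID": "Q863813",
     "affiliation_ROR_ID": "https://ror.org/008rmbt77"},
]
awards = [
    {"program": "Center", "year": "1999", "affiliation_QID": "Q863813"},
    {"program": "Clark", "year": "2003", "affiliation_QID": "Q863813"},
]
print(check_tables(awards, affiliations))
```

Checks of this kind catch duplicated or dangling QIDs but do not detect two different QIDs that refer to the same institution, which is the harder problem noted above for names in non-Roman scripts.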
3. Dataset Description
Repository name
Zenodo, from GitHub.
Object name
Cross-program-awards-affiliations
The dataset includes two tables:
Awards (awards-2025-05-10; 3,877 records): program, year, last_name, first_name, affiliation_QID, subprogram, multi_year_award, fellowship_title.
Affiliations (cross-program-awards-affiliations; 1,100 records): institution, type, affiliation_QID, affiliation_alt_name, affiliation_ROR_ID, city, state_province, country, coordinate-location, lat, long.
Together, these tables enable linking individuals with institutional and geographic entities using Wikidata QIDs as the key join (Figure 1). Two examples of Award records (Tables 1 and 2) and the associated Affiliation record (Table 3) illustrate the data structure. A README file describes the tables and variables in detail.

Figure 1
Relationship between tables viewed with Airtable base schema extension.
Table 1
Example Award record for Barbara Abou-El-Haj (1943–2015).
| PROGRAM | YEAR | LAST_NAME | FIRST_NAME | AFFILIATION_QID | SUBPROGRAM | MULTI_YEAR_AWARD | FELLOWSHIP_TITLE |
|---|---|---|---|---|---|---|---|
| Center | 1999 | Abou-El-Haj | Barbara | Q863813 | Senior Fellow |
Table 2
Subsequent example Award record for Barbara Abou-El-Haj (1943–2015).
| PROGRAM | YEAR | LAST_NAME | FIRST_NAME | AFFILIATION_QID | SUBPROGRAM | MULTI_YEAR_AWARD | FELLOWSHIP_TITLE |
|---|---|---|---|---|---|---|---|
| Clark | 2003 | Abou-El-Haj | Barbara | Q863813 | NA |
Table 3
Example Affiliation record for Binghamton University.
| AFFILIATION | TYPE | AFFILIATION_QID | AFFILIATION_ROR_ID | AFFILIATION_ALT_NAME | CITY | STATE_PROVINCE | COUNTRY | COORDINATE_LOCATION | LAT | LONG |
|---|---|---|---|---|---|---|---|---|---|---|
| Binghamton University, State University of New York | Institution | Q863813 | https://ror.org/008rmbt77 | | Binghamton | NY | USA | 42.08925, -75.96989 | 42.08925 | -75.96989 |
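The join illustrated by Tables 1 to 3 can be sketched in a few lines. This is a minimal example under stated assumptions rather than the project's own tooling: the column names follow the field lists given for the two tables (`institution`, `affiliation_QID`, and so on), only a subset of columns is reproduced, and the inline CSV text stands in for the published files, whose exact names are not assumed here. The function name `join_awards` is hypothetical.

```python
import csv
import io

# Inline stand-ins for the two CSV files, using the records from Tables 1-3.
AWARDS_CSV = """program,year,last_name,first_name,affiliation_QID
Center,1999,Abou-El-Haj,Barbara,Q863813
Clark,2003,Abou-El-Haj,Barbara,Q863813
"""

AFFILIATIONS_CSV = """institution,affiliation_QID,city,state_province,country,lat,long
"Binghamton University, State University of New York",Q863813,Binghamton,NY,USA,42.08925,-75.96989
"""


def join_awards(awards_file, affiliations_file):
    """Attach affiliation details to each award, using the QID as the key."""
    by_qid = {
        row["affiliation_QID"]: row
        for row in csv.DictReader(affiliations_file)
    }
    joined = []
    for award in csv.DictReader(awards_file):
        # Left join: awards with an unknown QID are kept without details.
        affiliation = by_qid.get(award["affiliation_QID"], {})
        joined.append({**affiliation, **award})
    return joined


rows = join_awards(io.StringIO(AWARDS_CSV), io.StringIO(AFFILIATIONS_CSV))
for row in rows:
    print(row["year"], row["last_name"], row["institution"], row["city"])
```

Because both awards carry the same QID, each joined row picks up the same institution name and coordinates, which is the consistency the shared key is meant to provide.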
Format names and versions
CSV (UTF-8); documentation in Markdown.
Creation dates
2024-02-03 to 2025-07-30.
Dataset creators
Eliza Dermott (The Clark), Caroline Fowler (The Clark), Lidia Ferrara (Getty Research Institute/UCLA), Amelia Goerlitz (SAAM), Jen Rokoski (The Center), Reed Silverstein (SAAM), Nancy Um (Getty Research Institute), Matthew Westerby (The Center), Caitlin Woolsey (The Clark).
Language
English and other languages, including transliterated names of institutions from VIAF.
License
CC0 1.0.
Publication date
2025-07-30.
4. Reuse Potential
The scholarly programs in art history and related fields that fund and host residential scholars constitute a diverse group, although most are associated with libraries, archives, museums, or research centers. While there are many opportunities for cross-institutional connection and information exchange, especially through consortium organizations and working groups, such as ARIAH, IRLA (Independent Research Libraries Association),[11] and the Director’s Forum (a group that was founded during COVID and is convened by The Huntington), there has never been a coordinated effort to aggregate historic funding data across granting institutions, making it difficult to assess the collective impact of these resources. In fact, many institutions lack standards for consolidating and analyzing their own historic data, even internally. Some engage in record-keeping practices that are unsustainable, such as using a public-facing institutional website for data preservation.
The Scholars Data Project represents an effort to aggregate statistics for residential research funding across institutions. In order to expand best practices in data management, it provides open tools and templates that institutions, beyond the original four, can adopt to participate. The centerpiece of these offerings is the Affiliations table, with over 1,000 institutions and sites that are relevant to research in the history of art, each furnished with a Wikidata identifier (QID) and related geographic information (latitude and longitude coordinates). In certain cases, an alternate spelling and the Research Organization Registry (ROR) ID are also provided.[12] In this way, the Affiliations dataset allows institutions to organize, compile, and report their own funding data according to a standard set of points. Additionally, the Affiliations dataset, oriented around the QID as the key value, offers a pathway to meaningfully aggregate data across granting institutions, minimizing the cleaning, disambiguation, and reconciliation that would usually be needed to bring together diverse data collected under varying conditions. This approach follows a broader trend in public humanities research to publish and build on Wikidata items for small to medium datasets focused on practical reuse with relatively low technical debt.
As such, the Affiliations dataset serves as a crucial instrument of cooperation between granting institutions, with the ultimate hope that it can assist in shedding light on past funding patterns in the field of art history, even though it is not comprehensive. This goal has become more urgent due to the recent decline in federal funding opportunities, which has caused ripple effects across institutions, especially those that have relied on NEH grants to support scholar programs. It is also clear that available data on federal funding may not be maintained or published in the future. Granting institutions may therefore need to play a larger role in sustaining a public record of funding, at a moment when the value of this data has increased considerably.
There are also many possibilities for reusing the Awards and Affiliations datasets to understand the changing shape of art history and to evaluate the health of the field. For instance, this data could be joined with other bodies of data, such as the American Academy of Arts and Sciences Humanities Indicators,[13] which examines higher education and funding data, parsed by state and year. The data could also be used in combination with “Art History Dissertations and Abstracts from North American Institutions,” a dataset that tracks the PhDs that have been completed in North America across institutions, drawn from the College Art Association’s running roster, published by Catherine D. Adams and Carolyn Lucarelli, of Penn State (Adams & Lucarelli, n.d.). By joining fellowship data with dissertation completion data, it would be possible, for example, to assess the trajectories for recent art history graduates, drawing lines between university degrees and post-graduate appointments at research institutes.
The datasets also hold value for individual art history departments that aim to assess the overall funding success of their faculty and students over time, while allowing for comparison across institutions. As Figure 2 indicates, large art history departments that have long offered doctoral degrees have received the largest number of awards over the past six decades, such as Harvard (131 in total) or Yale (124 in total). However, smaller departments that were established later reflect upward trajectories in their funding patterns, even based on partial funding data for the 2020s. As an example, Emory has garnered 22 awards since the 1970s, with seven of them granted in the 2020s. University of California, Santa Barbara has garnered a total of 35 awards since the 1970s; 21 of them were awarded since 2010.

Figure 2
Comparison Slide of Selected Institutional Funding by Decade. This chart compares the histories of funding for eight institutions, highlighting the most awarded institutions (top row) and those that have lower overall numbers of scholars and shorter histories of funding, but with overall upward trajectories (bottom), based on only partial reporting for the 2020s.
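Decade counts of the kind discussed above can be derived directly from the Awards table. The sketch below is an illustration and not the code behind Figure 2; it assumes the `year` and `affiliation_QID` fields from the Awards table, uses only the two example records from Tables 1 and 2 as sample input, and the function name `awards_by_decade` is hypothetical.

```python
from collections import Counter


def awards_by_decade(awards):
    """Count awards per (affiliation QID, decade) pair.

    `awards` is an iterable of dicts, e.g. rows from csv.DictReader.
    """
    counts = Counter()
    for row in awards:
        # Integer division maps 1999 to 1990 and 2003 to 2000.
        decade = int(row["year"]) // 10 * 10
        counts[(row["affiliation_QID"], decade)] += 1
    return counts


# Sample input: the two example Award records from Tables 1 and 2.
sample_awards = [
    {"program": "Center", "year": "1999", "affiliation_QID": "Q863813"},
    {"program": "Clark", "year": "2003", "affiliation_QID": "Q863813"},
]

counts = awards_by_decade(sample_awards)
for (qid, decade), n in sorted(counts.items()):
    print(qid, f"{decade}s", n)
```

Grouping on the QID rather than on the institution name avoids splitting one institution across spelling variants; the readable name can be attached afterwards from the Affiliations table.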
These are just a few examples of the larger questions that may be posed regarding the future of humanities research and funding, as they relate to issues of scale, institutional standing, and the evaluation of research impact. Further insights are provided in a recent post about the project and an interactive visualization of the data, which can be found on the ARIAH website.[14]
We realize that other institutions may not publish lists of past fellows, and that affiliation data represents point-in-time data, rather than long-term institutional ties, particularly for graduate students. Scholars unaffiliated with an institution at the time of award are represented by their city of residence, which may not correspond to other published information. Finally, ROR identifiers are incomplete for smaller or non-academic organizations. Nonetheless, the dataset provides a strong, extensible foundation for future work and collaboration across humanities data, specifically for art history research institutes.
Notes
[1] For more information about data loss, see, for instance, Data Rescue Project. Retrieved December 1, 2025, from https://www.datarescueproject.org.
[2] ARIAH Association of Research Institutes in Art History. (n.d.). Retrieved October 20, 2025, from https://www.ariah.info/.
[3] WikiFAIR. Retrieved November 25, 2025, from https://meta.wikimedia.org/wiki/WikiFAIR.
[4] Linked Art, LOUD. Retrieved December 1, 2025, from https://linked.art/loud/.
[5] Getty Open Data and APIs. Retrieved December 1, 2025, from https://www.getty.edu/projects/open-data-apis/; Getty Provenance Index. Retrieved December 1, 2025, from https://www.getty.edu/research/provenance/; Getty Vocabularies as LOD. Retrieved December 1, 2025, from https://www.getty.edu/research/tools/vocabularies/lod/index.html.
[6] OpenRefine. Retrieved December 1, 2025, from https://openrefine.org/.
[7] Wikidata Q identifier. Retrieved December 1, 2025, from https://www.wikidata.org/wiki/Q43649390.
[8] Research Organization Registry (ROR). Retrieved December 1, 2025, from https://ror.org/.
[9] By comparison, the Center formerly published a running compendium titled Sponsored Research in the History of Art (Sherman, 1980–1994). This source was not utilized to create the dataset described here as it lacks representative data for recent decades and would require extensive processing. SAAM also published a running compendium of fellowship recipients and project titles (SAAM, 2020).
[10] Names of individuals are separated as two values in this dataset. Without a persistent identifier for each scholar, such as ORCID, this may be a limitation for certain types of analysis, such as creating labels for visualizations. The two fields may need to be combined with further processing.
[11] Independent Research Libraries Association. (n.d.). Retrieved October 20, 2025, from https://irla.lindahall.org/.
[12] Research Organization Registry (ROR). (n.d.). Retrieved October 20, 2025, from https://ror.org/.
[13] Funding and Research | American Academy of Arts and Sciences. (n.d.). Retrieved October 20, 2025, from https://www.amacad.org/humanities-indicators/funding-and-research.
[14] Nancy Um and Matthew Westerby, “Taking The Long View: Gauging The Impact Of Residential Fellowships In Art History Over The Decades,” HistPhil, September 8, 2025. Retrieved December 1, 2025, from https://histphil.org/2025/09/08/taking-the-long-view-gauging-the-impact-of-residential-fellowships-in-art-history-over-the-decades/; Scholars Data Project | ARIAH Association of Research Institutes in Art History (n.d.). Retrieved October 21, 2025, from https://www.ariah.info/news-opportunities/scholars-data.
Acknowledgements
The authors would like to thank ARIAH and its member institutions, which enthusiastically supported this project and provided feedback on its initial findings. ARIAH has also generously featured visualizations for the Scholars Data Project as a persistent offering on its website.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
Matthew J. Westerby: Conceptualization; Data curation; Methodology; Visualization; Project administration; Writing – original draft.
Lidia Ferrara: Data curation; Methodology; Writing – original draft; Writing – review and editing.
Nancy Um: Conceptualization; Data curation; Methodology; Visualization; Project administration; Writing – original draft; Writing – review and editing.
