Towards a Linked Open Index: Reconciling Museum Records to Wikidata for Index of American Design Constituents

Open Access | Jan 2026

1. Overview

1.1 Introduction

The Index of American Design (IAD) is a unique collection of watercolor drawings and associated archives created for the Federal Art Project (FAP) in the late 1930s through the early 1940s. As a work relief program that employed around 500 people at any one time across the United States, the IAD encompassed 37 regional offices in 34 states and the District of Columbia. Over 18,000 watercolors were deposited at the National Gallery of Art in 1943, along with over 34,000 photographs, notes, administrative documents, and other materials. Outputs of the FAP belonged to the US government, and as such IAD watercolors and archives are in the public domain.

IAD artists produced meticulous watercolor renderings of items sourced from private homes, museums, churches, and antique stores (Figures 1 and 2). Information about each object was recorded on a Data Report Sheet, as depicted in the promotional photograph series “Garret to Gallery” (Figure 3). Conceived as both a record of historical context and as a “great database of images that could inspire the creations of artists or designers” (Wells, 2025, p. 156), the IAD offers a look back at information technologies and ongoing opportunities to model and connect open data for museums, archives, and libraries.

Figure 1

Archival photograph of IAD artist Nicholas Amantea (Wikidata Q56033425) rendering a jug (IAD classification number NYC-Cer-St-269cd), captioned “an early American stoneware jug poses for its portrait” in the series Garret to Gallery. Index of American Design Records, Courtesy of National Gallery of Art Archives.

Figure 2

Nicholas Amantea, Water or Cider Jug, c. 1939. Watercolor and graphite on paperboard. National Gallery of Art, Index of American Design, 1943.8.6382 (Wikidata Q64618255).

Figure 3

Archival photograph of an IAD employee, captioned “the plate is classified and filed with other plates” in the series Garret to Gallery. Index of American Design Records, Courtesy of National Gallery of Art Archives.

1.2 Context

Over 1,000 artists who created artwork for the IAD are represented as “constituents” in the National Gallery of Art’s Collections Management System (CMS), along with records of many more who participated as administrators and owners of objects, among other roles. Decades of research, cataloging, and care for these collections have produced a wealth of knowledge, yet in many cases only limited information was available about individuals and institutions involved with the IAD project before it ended in 1942.

Starting in 2018, images of IAD watercolors and associated metadata were shared on Wikimedia Commons and Wikidata as part of a Wikimedia project (Zweig, 2022). The goal of enriching datasets produced by museums and archives, connecting them to others, and opening data creation to researchers outside the walls of institutions parallels the objectives of the American Art Collaborative Linked Open Data initiative (SAAM, 2017). Although the National Gallery of Art has not yet shared data about our “constituent” IAD artists and object owners to Wikidata, we endeavored in this project to reconcile the names we do have with Wikidata items created by others as a necessary preliminary step in this direction.

Reconciling our IAD data to Wikidata as a step toward sharing our records outward in the future is also a reasonable extension of the goals of the FAP to create public domain resources for the use of artists and designers without restriction. Images of IAD artworks on the National Gallery’s website and those donated to Wikimedia Commons fall under the open access policy for works in the public domain. As such, users may download and reproduce any digital image of IAD artworks without seeking authorization. Digital images are released under Creative Commons Zero (CC0).1 Digital images and PDFs created from IAD Data Report Sheets, maintained by Gallery Archives, are also public domain.

1.3 Motivation

Using OpenRefine and Wikidata, this project sought to reconcile known constituents with linked data entities to tell more comprehensive stories about the thousands of people who made the IAD and how they carried out their various roles on the project. This process also allowed us to identify discrepancies between the object and constituent data managed in our CMS and the corresponding data in Wikidata.

The vast majority of IAD artists are not represented in the Union List of Artist Names (ULAN), and early in our work we discussed the potential benefits of contributing new records to ULAN. Recognizing the value of the tens of thousands of Wikidata items for IAD objects already contributed (Zweig, 2022), we decided to focus on Wikidata for our reconciliation efforts. This also allowed us to use Wikidata items created by other museums, like the Museum of Modern Art (Lill, 2024).

Our Wikidata reconciliation and visualization work originated as part of an emerging technology pilot at the National Gallery of Art, focused on AI-assisted extraction of data and graph exploration (AI-EDGE). The pilot set out to connect the web of IAD content to enable deeper discovery and exploration of diverse stories about artists, locations, and design history. One experiment in this project processed IAD data sheets with a custom Azure Cognitive Services model, and the extracted text was then corrected with LLM cleanup. Although AI was not used directly in the reconciliation and visualization efforts described in this paper, we offer some future directions in the closing section, in addition to considerations for contributing CMS data to Wikidata, which was not within the scope of our pilot.

To measure and visualize the gaps or differences between our objects and constituent data and the corresponding Wikidata items, we created a Power BI dashboard. Designed primarily for National Gallery of Art stakeholders and staff, these dynamic visualizations help us to better understand and evaluate how our data is represented on Wikidata and how that information is being viewed, used, and enhanced by others. They also allow us to analyze the impact of cultural heritage work with Wikidata more broadly and to prepare future phases of work towards “roundtripping.”

2. Method

2.1 Source Data Scoping and Preparation

To conduct a comparative analysis of Index of American Design (IAD)-related constituent records in the National Gallery of Art’s CMS Constituents module against their Wikidata counterparts (where they exist or could be identified), the project team first developed a mapping of available CMS fields to Wikidata properties most commonly associated with instances of human (Q5) or various organization-level items, such as organization (Q43229) or corporate body (Q106668099). Because the use and coverage of Wikidata properties varies even among items which are instances of the same type, our goal was to identify at least one corresponding Wikidata property for each CMS constituent field. The resulting field-to-property alignment, along with the constituent types most likely to include each field, is summarized in Table 1.

Table 1

Mapping of fields from the National Gallery of Art’s CMS Constituents module to corresponding Wikidata properties, indicating the constituent types most likely to have each field populated. Note that “All” indicates that the CMS field may be populated for both individual and corporate body constituent records.

| CMS Constituent Field | Constituent Type | Wikidata Property |
| --- | --- | --- |
| Constituent ID | All | National Gallery of Art artist ID (P2252) |
| Display Name | All | English label |
| First Name | Individual | given name (P735) |
| Last Name | Individual | family name (P734) |
| Institution Name | Corporate Body | English label |
| Nationality | Individual | country of citizenship (P27); may also be contained in description string |
| Begin Date (Year) | Individual | date of birth (P569) |
| End Date (Year) | Individual | date of death (P570) |
| Begin Date (Year) | Corporate Body | inception (P571) |
| End Date (Year) | Corporate Body | dissolved, abolished, or demolished date (P576) |
| Birth Place | Individual | place of birth (P19) |
| Death Place | Individual | place of death (P20) |
| Gender | Individual | sex or gender (P21) |
| Ethnicity | Individual | ethnic group (P172) |
| Sexual Orientation | Individual | sexual orientation (P91) |
| ULAN ID | All | Union List of Artist Names ID (P245) |
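
For readers who want to reuse the alignment in Table 1 programmatically, it could be encoded as a small lookup structure. The following is a minimal sketch only: the dictionary name, the `label:en` sentinel, and the `property_for` helper are illustrative assumptions and are not part of the project's published files.

```python
# Hypothetical encoding of Table 1: (CMS field, constituent type) -> Wikidata property ID.
# "label:en" is an assumed sentinel for fields compared against the item's English label.
FIELD_TO_PROPERTY = {
    ("Constituent ID", "All"): "P2252",
    ("Display Name", "All"): "label:en",
    ("First Name", "Individual"): "P735",
    ("Last Name", "Individual"): "P734",
    ("Institution Name", "Corporate Body"): "label:en",
    ("Nationality", "Individual"): "P27",
    ("Begin Date (Year)", "Individual"): "P569",
    ("End Date (Year)", "Individual"): "P570",
    ("Begin Date (Year)", "Corporate Body"): "P571",
    ("End Date (Year)", "Corporate Body"): "P576",
    ("Birth Place", "Individual"): "P19",
    ("Death Place", "Individual"): "P20",
    ("Gender", "Individual"): "P21",
    ("Ethnicity", "Individual"): "P172",
    ("Sexual Orientation", "Individual"): "P91",
    ("ULAN ID", "All"): "P245",
}


def property_for(field, constituent_type):
    """Return the mapped property for a field, falling back to the 'All' row.

    Returns None when Table 1 has no mapping for the combination.
    """
    specific = FIELD_TO_PROPERTY.get((field, constituent_type))
    if specific is not None:
        return specific
    return FIELD_TO_PROPERTY.get((field, "All"))
```

Keying on both field and constituent type keeps the two meanings of Begin Date and End Date apart, since the same CMS field maps to different properties for individuals and corporate bodies.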

With the field mapping established, the next step was to build the comparative datasets. Extracting IAD-related constituent data from the National Gallery’s CMS was relatively straightforward. While much of the scoped data was already available through the National Gallery’s Open Data Program repository on GitHub,2 several fields in scope for this analysis—specifically Gender, Ethnicity, and Sexual Orientation—were omitted from the public dataset due to the incompleteness of available data in the CMS and need for additional research and cataloging on these properties. To access this information, the project team queried the source CMS database directly to retrieve all necessary fields from the Constituents module. This approach also allowed us to filter results to include only those constituent records linked to at least one IAD object record through roles such as Creator, Source, IAD Object Maker, or IAD Object Owner.

Building the Wikidata dataset presented a greater challenge for the project team. Because the National Gallery’s earlier Wikimedia project did not include constituent records, the CMS and Wikidata datasets remained independent, ensuring that the comparative analysis was free from circular provenance. At the same time, this independence meant there was no straightforward way to identify all relevant person or corporate body records in Wikidata. Most IAD object entries had not yet been reconciled to their creators through the creator (P170) property, so there was no clear query pattern that could be applied through the Wikidata Query Service to retrieve a consistent subset of potential matches for comparison with CMS constituent data.3 As noted in the Overview, many of these individuals were lesser-known artists or private owners whose relationships to IAD objects were not well documented, further complicating efforts to locate and align their records. The lack of identifiers for IAD artists is also a manifestation of the Matthew effect, where the most richly described and documented topics continue to grow while lesser known or understudied topics remain thinly described. Wikidata presents an opportunity to counterbalance this scenario by lowering the bar to creating LOD identifiers, bringing visibility to the thousands of people involved in the IAD who may have contributed to an important mass cultural project but who have not been assigned identifiers in other services like ULAN.

For these reasons, the project team decided to create the Wikidata dataset via a partially automated but fully human-reviewed reconciliation process. To accomplish this, we loaded the constituents dataset extracted from the CMS into OpenRefine and used its built-in Wikidata reconciliation service to identify the relevant Wikidata items and pull in key properties about those items (see Figure 4 for full workflow diagram).4 These Wikidata fields, combined with the records from the CMS, would form our comparative dataset for analysis and visualizations.

Figure 4

Workflow diagram of data extraction, reconciliation, enrichment, and visualization process.

2.2 OpenRefine Reconciliation of Constituents to Wikidata

The IAD constituents fall into a few different type categories designated in the CMS: individuals, couples, corporate bodies, or anonymous. Additionally, these constituents are associated with IAD object records via one or more designated roles, such as artist, current owner, source, donor, previous owner, and IAD object maker (the craftsperson or manufacturer who produced the object that was then rendered by the IAD artist). In this project, our priorities for reconciliation were constituents of type individual or corporate body that were associated with IAD objects via the artist or IAD object maker roles. This narrowed down our Wikidata reconciliation efforts to 3,141 (52.66%) of the 5,965 constituents associated with IAD objects in the CMS.

Along with the designation of type (individual/corporate body) and role (artist/IAD object maker), the National Gallery’s CMS captures a number of fields containing biographical information about constituents (see Table 1 for a list of these fields). When using the Wikidata reconciliation service in OpenRefine, the reconciliation is run against a single column in the source dataset, and the service looks for a match or suggested matches based on similarity to Wikidata items’ preferred label. For our constituents dataset, we ran the reconciliation service against our constituent Display Name column. To help strengthen the certainty of the reconciliation process, other fields in the source dataset (e.g., life dates, gender, occupation) can be mapped to Wikidata properties when configuring the service to run. The resulting suggested matches are then limited to Wikidata items whose mapped properties hold the same or similar values, which increases both the confidence and the speed of the process.
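
To make the configuration described above more concrete, the sketch below builds the kind of query batch that reconciliation services accept: a name to match, an optional type constraint, and optional property/value pairs taken from other columns. OpenRefine assembles this internally, so the function name, column names, and the second example row are illustrative assumptions; no endpoint is contacted.

```python
import json


def build_recon_queries(rows, name_field="display_name", type_qid=None,
                        property_map=None, limit=5):
    """Build a reconciliation query batch keyed q0, q1, ... from source rows.

    property_map maps a source column name to a Wikidata property ID.
    Blank cells are skipped so that missing data does not constrain the match.
    """
    property_map = property_map or {}
    batch = {}
    for i, row in enumerate(rows):
        query = {"query": row[name_field], "limit": limit}
        if type_qid:
            query["type"] = type_qid
        props = [
            {"pid": pid, "v": str(row[col])}
            for col, pid in property_map.items()
            if row.get(col) not in (None, "")
        ]
        if props:
            query["properties"] = props
        batch[f"q{i}"] = query
    return batch


# Example rows: the second record and its year are invented placeholders.
rows = [
    {"display_name": "Nicholas Amantea", "begin_year": ""},
    {"display_name": "Example Person", "begin_year": 1900},
]
batch = build_recon_queries(rows, type_qid="Q5",
                            property_map={"begin_year": "P569"})
payload = json.dumps(batch)  # serialized form that would be sent to the service
```

The first row carries no life dates, so its query falls back to the name and type alone, which mirrors the situation described for most IAD constituents.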

After weighing the costs and benefits of mapping additional constituent data points to Wikidata properties when configuring the reconciliation run, the project team ultimately did not have confidence that this information would be present in both datasets across all constituents, which could limit the number of likely matches output by the service. For example, when reconciling individual constituents who were involved in the IAD project as artists, most would not be listed as “artists” in Wikidata, as many did not go on to have careers as artists after the project ended. In this scenario, mapping the role column in the source constituents dataset to the Wikidata occupation property (P106) could lead to missed matches. For most constituents, the CMS did not have information about nationality, date of birth/death, or place of birth/death. Because of this limitation, the reconciliation required more manual review than it would have if we were working with a set of constituents for which there were more known data points. Over several weeks of reconciliation work, the project team was able to reconcile 528 (8.85%) out of the 5,965 IAD constituents with Wikidata items (Figure 5).

Figure 5

A screenshot from the interactive Power BI dashboard that summarizes Index of American Design representation (both constituents and images) on Wikidata.

The reconciliation process was performed primarily by one member of the team, but we collectively decided that matches would only be made if they were certain, and we would not match anything that felt tenuous or merely probable. That level of certainty typically required that the biographical information from the CMS and properties from the Wikidata item aligned. If key details between these data sources did not match, or weren’t present to provide context, the two records would not be reconciled. This decision meant that it wasn’t necessary for the reviewer to be an expert in the field, nor was it necessary to set up a review process for quality control. If the matches we were making had been based on less concrete data points, or if the project team had planned to import the Wikidata QIDs back into the CMS, these strategies would have been necessary. Additionally, the OpenRefine reconciliation process provides a “confidence score” for potential matches; we were also able to leverage this score as a way to judge the matches that we made during the reconciliation process, and to reject tenuous matches.
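
The matching policy described above was applied by a human reviewer, but its logic can be written down as a simple rule. The following sketch is one possible encoding, not the team's procedure: the score threshold, the field names, and the function name are all assumptions chosen for illustration.

```python
def accept_match(cms, candidate, min_score=95.0,
                 fields=("birth_year", "death_year")):
    """Accept a candidate only when the match is certain under a strict rule.

    A candidate is rejected when its score is below the (assumed) threshold,
    when no comparable field is populated on both sides, or when any
    populated field disagrees.
    """
    if candidate.get("score", 0) < min_score:
        return False
    shared = [f for f in fields if cms.get(f) and candidate.get(f)]
    if not shared:
        # No corroborating context: leave the record unreconciled.
        return False
    return all(str(cms[f]) == str(candidate[f]) for f in shared)


# Invented example records for illustration only.
cms_record = {"birth_year": 1900, "death_year": None}
agreeing = {"score": 100.0, "birth_year": "1900"}
conflicting = {"score": 100.0, "birth_year": "1901"}
no_context = {"score": 100.0}
```

Under this rule a high score alone is never sufficient, which reflects the decision to leave records unreconciled when corroborating details were absent.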

One benefit of using only one person to perform the matches was that within the project it was not necessary to cite the specific project member who made the match. Additionally, as none of the constituent records in the CMS had Wikidata QIDs prior to this project, there was no need to clarify what project or effort had generated these matches. However, a key piece of our collections discovery infrastructure going forward will be the creation and operationalization of provenance for enriched data, whether AI- or human-generated. Should the results of this project’s reconciliation be incorporated into this infrastructure—whether round-tripped back into the CMS as properties of the constituent records or maintained outside of the CMS as a separate, connected data source—the provenance for these Wikidata QID matches would be captured as originating from the efforts of this project via the OpenRefine reconciliation workflow.

Within the source dataset of constituents from the CMS, there were far more individuals than corporate bodies (4,902 compared to 1,025). We started by reconciling the corporate bodies, so that we could test and refine the process with a smaller dataset whose constituent names were more distinctive. Of corporate body constituents, 26% were reconciled to Wikidata records, whereas only 5% of individual constituents were reconciled. Many of the corporate body constituents were manufacturers who produced the original IAD object (e.g., a factory that produced pewter mugs or a glassware manufacturer). Looking at reconciliation results by constituent role, 12% of artists, 10% of IAD object makers, and 7% of object owners were reconciled to existing Wikidata items. It is unsurprising that constituents with the role of IAD object owner were the hardest to reconcile, as these constituents often had the most incomplete records, sometimes only a first initial and last name or part of a street address.

2.3 Source Dataset Enrichment

One of the most powerful features of OpenRefine’s built-in Wikidata reconciliation service is that it supports data extension, which makes it possible to pull discrete pieces of information about the reconciled entities (i.e., properties captured on each item) from the Wikidata graph into the source dataset. To support our comparative analysis of constituent data in the National Gallery’s CMS with Wikidata, we leveraged the functionality to create paired columns in our dataset for the properties that we had scoped for comparison (Table 1)—ending up with a column containing the value cataloged in the CMS paired with a column containing the QID or string value of the Wikidata property for the reconciled item for each mapped field.5

The next step in preparing our dataset was to create a “check” column for each pair of CMS/Wikidata columns, which provided a yes/no result based on whether the two values matched. Depending on the fields being compared, some massaging of the data was required prior to running the check, as the way the information was presented in Wikidata did not always align with how it was cataloged in the CMS. For example, the National Gallery’s CMS Constituent module has a field for Nationality, which is populated with values like “American.” The mapped Wikidata property is Country of citizenship (P27), which contains values like “USA.” For our analysis, these two data points would be considered a match, but they do not actually match semantically. To create a more direct value for comparison for these fields that had discrepancies in how information was captured, we created columns with standardized wording to match the Wikidata syntax for creating these check columns. Despite the additional work, these checks were extremely valuable in allowing us to quickly see places where our data and the data from Wikidata did not align. They would also be leveraged in data visualizations, which will be discussed in a later section.
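
The check columns were built in OpenRefine, but the same idea can be sketched in a few lines of Python. In the example below, the lookup table, the column names, and the choice to treat blank values as non-matching are assumptions made for illustration; the source does not state how blanks were handled.

```python
# Hypothetical lookup from CMS nationality wording to the Wikidata-side wording.
NATIONALITY_TO_COUNTRY = {"American": "USA"}


def normalize_nationality(value):
    """Map a CMS nationality string onto the wording used on the Wikidata side."""
    if not value:
        return None
    value = value.strip()
    return NATIONALITY_TO_COUNTRY.get(value, value)


def add_check_column(rows, cms_col, wd_col, check_col, normalize=lambda v: v):
    """Add a yes/no column recording whether the paired values agree.

    Blank values on either side yield "no" (an assumption of this sketch).
    """
    for row in rows:
        cms_value = normalize(row.get(cms_col))
        wd_value = row.get(wd_col)
        row[check_col] = "yes" if cms_value and cms_value == wd_value else "no"
    return rows


# Invented rows for illustration only.
rows = [
    {"cms_nationality": "American", "wd_citizenship": "USA"},
    {"cms_nationality": "American", "wd_citizenship": "Canada"},
    {"cms_nationality": "", "wd_citizenship": "USA"},
]
checked = add_check_column(rows, "cms_nationality", "wd_citizenship",
                           "nationality_check", normalize_nationality)
```

Keeping the normalization in its own function leaves the original CMS value untouched, so the dashboard can still display the value as cataloged alongside the check result.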

In addition to pulling in the Wikidata property values for the mapped fields in scope for the comparative analysis, the reconciliation service’s data extension provided an opportunity to further enrich the National Gallery’s constituent data with additional unique identifiers of the entities from multiple controlled vocabularies and thesauri. Prior to completing the reconciliation process, none of the constituents related to the Index of American Design had Wikidata IDs stored in the National Gallery’s CMS. However, a small number did have Union List of Artist Names (ULAN) IDs. Once reconciliation was complete, we used the reconciliation service’s data extension to pull in additional ULAN IDs, where they existed, from Wikidata entities that had been matched with our constituents.

In addition to populating additional ULAN IDs, the data extension feature made it possible to pull in Virtual International Authority File (VIAF) and Library of Congress Name Authority File (LCNAF) identifiers for reconciled constituents that had those identifiers captured in their Wikidata records, neither of which had previously been in the CMS. These additional identifiers can now be bulk imported into the National Gallery’s CMS on those constituent records where they have been populated. Incorporating these identifiers into the source system makes the National Gallery constituents more interconnected with Linked Data sources. Once added to published public data sources like the National Gallery of Art Open Data Program GitHub repository, or incorporated into the National Gallery of Art’s online collections, these identifiers will make it easier for other digital humanities researchers to navigate between National Gallery collections and the content of other cultural heritage organizations. Additionally, the comparison surfaced incorrect ULAN IDs in the CMS, and correcting them made the National Gallery’s information for IAD constituents more accurate and useful.
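
The identifiers described above were retrieved with OpenRefine's data extension. As an alternative illustration, the sketch below reads the same three identifiers from a Wikidata entity in its standard JSON form, where statements are grouped by property under `claims`. The function name, output keys, and sample identifier values are assumptions; the sample values are placeholders, not real identifiers.

```python
# Property IDs: ULAN ID (P245), VIAF (P214), Library of Congress authority ID (P244).
IDENTIFIER_PROPERTIES = {"ulan_id": "P245", "viaf_id": "P214", "lcnaf_id": "P244"}


def extract_identifiers(entity):
    """Collect external identifier values from a Wikidata entity JSON dict.

    Statements marked "novalue" or "somevalue" carry no datavalue and are skipped.
    """
    claims = entity.get("claims", {})
    result = {}
    for name, pid in IDENTIFIER_PROPERTIES.items():
        values = []
        for statement in claims.get(pid, []):
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") == "value":
                values.append(snak["datavalue"]["value"])
        result[name] = values
    return result


# Placeholder entity with made-up identifier values.
sample_entity = {
    "id": "Q0",
    "claims": {
        "P245": [{"mainsnak": {"snaktype": "value",
                               "datavalue": {"value": "500000000", "type": "string"}}}],
        "P214": [{"mainsnak": {"snaktype": "novalue"}}],
    },
}
identifiers = extract_identifiers(sample_entity)
```

Returning a list per identifier allows for items that carry more than one value for the same property, which a reviewer would want to inspect before any bulk import.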

2.4 API for Wikidata Views

After completing the constituent reconciliation with Wikidata and enriching our dataset with comparative values for priority fields and new Linked Data identifiers, the project team turned its attention to the last question we hoped to answer and therefore needed to prepare data for: What are the engagement levels with pages about the Index of American Design on Wikidata? For this component of the project, the project team decided to limit the scope of our exploration to engagement with the artwork items in Wikidata, rather than with IAD constituents. This scope could be expanded in future work, building on the reconciliation completed during this project, particularly if efforts continue to add IAD constituents not currently represented in Wikidata. Items for every IAD artwork in the National Gallery of Art’s collection – over 18,000 – had already been created as part of a previous project (Zweig, 2022), giving a larger set to look at than the 528 reconciled constituents. For the analysis of engagement with these Wikidata items, the project team chose to look at page views, as well as the number of times these items had been edited, and by whom. To pull that information into an additional dataset for analysis, we used the Wikimedia API Portal.6

Utilizing the documentation in the Wikimedia API Portal, the project team developed two separate Python scripts (see supplementary files for these scripts). Both of these scripts read in an input CSV containing the more than 18,000 Wikidata QIDs for IAD artwork items, then used that list of QIDs to query the API service. The first script (wikidataPageViews.txt) queries the Wikimedia endpoint that provides metrics about individual Wiki pages7 and outputs a CSV containing the total number of pageviews within a defined date range for each QID (total_user_views). The second script (wikidataPageEdits.txt) queries the Wikimedia Core REST API8 and outputs a CSV containing information about the edit history for each QID (page_creator, date_create, latest_editor, last_edited_date, total_edits).
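
The project's own scripts are provided in the supplementary files; the sketch below is an independent illustration of the two steps and should not be read as their contents. It builds a per-article pageviews request URL, totals the `views` values in a response, and summarizes a list of revisions into the output fields named above. The project value `www.wikidata.org`, the `user` agent filter, the monthly granularity, and all function names are assumptions, and no network request is made here.

```python
from urllib.parse import quote

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(qid, start, end, project="www.wikidata.org",
                  access="all-access", agent="user", granularity="monthly"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    return "/".join([PAGEVIEWS_BASE, project, access, agent,
                     quote(qid, safe=""), granularity, start, end])


def total_user_views(payload):
    """Sum the 'views' counts across the items of a pageviews response."""
    return sum(item.get("views", 0) for item in payload.get("items", []))


def summarize_history(revisions):
    """Summarize a complete revision list into creator, latest editor, and counts.

    Each revision is expected to carry an ISO 8601 'timestamp' and a
    'user' dict with a 'name', as returned by page history endpoints.
    """
    if not revisions:
        return None
    ordered = sorted(revisions, key=lambda r: r["timestamp"])
    first, last = ordered[0], ordered[-1]
    return {
        "page_creator": first["user"]["name"],
        "date_create": first["timestamp"],
        "latest_editor": last["user"]["name"],
        "last_edited_date": last["timestamp"],
        "total_edits": len(ordered),
    }


# Invented sample data for illustration only.
url = pageviews_url("Q64618255", "20250101", "20251231")
views = total_user_views({"items": [{"views": 3}, {"views": 4}]})
summary = summarize_history([
    {"timestamp": "2021-05-01T00:00:00Z", "user": {"name": "EditorB"}},
    {"timestamp": "2019-03-10T12:00:00Z", "user": {"name": "EditorA"}},
])
```

Page history is typically returned in segments, so a full script would need to follow pagination until the oldest revision is reached before calling `summarize_history`; with more than 18,000 QIDs, it would also be prudent to pause between requests.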

With these two new data sources created, the project team now had all of the datasets needed to conduct the constituent comparative analysis and build visualizations within a dashboard to aid National Gallery of Art staff in better understanding the IAD collection’s reach in Wikidata and identify gaps and discrepancies in cataloging between the organizational CMS and Wikidata.

3. Comparative Analysis and Visualization

For building the Index of American Design Explore by the Numbers dashboard, the project team utilized Power BI, a visualization tool that the National Gallery already uses for a variety of internal reporting purposes. Although Power BI is a proprietary platform, the processes described here (data modeling, field-level comparison, and interactive visualization) are fully transferable to open-source environments. Comparable dashboards could be developed using open-source frameworks such as R (leveraging, for example, Shiny9) or Python (using Plotly Dash10 or Streamlit11). These environments support the same essential functions required for this project: connecting to local and remote data sources through APIs; ingesting structured data in common formats such as CSV, JSON, and XLSX; transforming and normalizing values for comparison; and visualizing results through configurable charts, tables, and filters. While Power BI offered institutional convenience and integration with existing data infrastructure, the workflow described in this paper is intentionally platform-agnostic and could be adapted readily for other systems that meet these technical and interoperability criteria.

A key step before pulling the data into Power BI and starting the dashboard build was to identify the stories we wanted to tell and the common questions that users have about IAD representation in and engagement through Wikidata, and how the Wikidata records may diverge from cataloged data in the CMS. Tailoring the contents of the dashboard to these requirements would ensure that the tool was relevant and useful to staff, and give the project team a clearer picture of how to guide users through the data. The project team and stakeholders were driven by the basic questions asked at the beginning of the project – How many of these constituents are represented in Wikidata? In cases where constituents have been reconciled to Wikidata, how does the information in the CMS compare to the information on Wikidata? Where are there discrepancies or gaps between the two data sources? With the scope solidified around these and other core questions, the prepared datasets were imported into Power BI Desktop, where additional data transformation steps and calculations were set up as needed to power the visuals that provide insights and answers to these questions.

Though much of the data preparation for analysis and visualization was completed in OpenRefine and through the Python scripts (as detailed above), Power BI includes tools to further refine data models. On ingest, datasets loaded into the application can be modified using the Power Query M Language.12 Additionally, the DAX (Data Analysis Expressions) formula language can be leveraged to create individual measures for populating visualizations.13 The project team utilized both of these query languages in building out the Power BI dashboard–Power Query for isolating a table containing only those constituent records that were reconciled to Wikidata, and DAX for creating a variety of measures. These measures were mainly filtered counts of how many pages or constituents fit a certain set of parameters (e.g., a count of reconciled constituents classified as individuals; a count of constituents with life date information in both Wikidata and the CMS).

Leveraging the check columns created in OpenRefine, the dashboard was built out so that users could quickly and clearly see instances where the data in the CMS and the data on Wikidata did not align. This is demonstrated by the central bar chart in Figures 6 and 7.

Figure 6

A screenshot from the interactive Power BI dashboard that depicts the nationality of IAD-related constituents across Wikidata and the National Gallery of Art’s CMS.

Figure 7

A screenshot from the interactive Power BI dashboard that depicts the birth years of IAD-related constituents across Wikidata and the National Gallery of Art’s CMS.

These comparisons present potential areas for future work–as the CMS is the system of record for data about the National Gallery’s collection, curators and registrars are regularly making updates to constituent records as new data and knowledge about these individuals and organizations becomes available. By identifying cases where there is information in Wikidata that is not present in the CMS, this dashboard provides staff researchers with a guide pointing them directly to Wikidata as a potential source for new data to be cataloged. In addition to identifying data to be added to the CMS, another area for future work that this dashboard opens up is where the National Gallery could make additional contributions to Wikidata. For example, the donut chart in the upper right corner of the Index Representation in Wikidata page of the dashboard (Figure 5) is interactive, functioning as a filter so that the constituents table on the page can display just those constituents in Wikidata that do not have the National Gallery of Art artist ID property (P2252). This is a helpful piece of information to have on the Wikidata record as it helps direct users back to the National Gallery’s website and search for information we have about that person/organization based on their Constituent ID. There are currently 110 Wikidata items for reconciled constituents that have this property, which was a pleasant surprise to the project team. As previously mentioned, the prior Wikimedia project undertaken at the National Gallery did not include constituents in scope. This suggests that the addition of these National Gallery of Art artist IDs to Wikidata items was completed by Wikidata editors not affiliated with the National Gallery. Coupled with the Engagement with Object Wikidata Pages (Figure 8) page of the dashboard, these community edits support the case to be made that Wikidata provides an important avenue for engagement with the National Gallery’s collections.

Figure 8

A screenshot from the interactive Power BI dashboard that depicts view counts and edit counts for Wikidata pages about Index of American Design renderings, including the page creator and the latest editor.

4. Implications and Ongoing Applications

Wikidata continues to grow in importance as a common ground for humanities research and for galleries, libraries, archives, and museums (GLAMs), as evidenced by recent surveys (Zhao, 2022; Candela et al., 2024) and by the contributions to this special issue. The reconciliation and visualization efforts described in this paper were motivated by a desire to know how IAD data is represented on Wikidata and how it is viewed, used, and enhanced by the community. Although focused here on the IAD specifically, the crosswalks established with this project open the door toward analyzing the wider impact of cultural heritage work with Wikidata at our institution. As stated by Sonoe Nakasone, sharing GLAM data to open platforms like Wikidata builds a web of knowledge, of “interconnected data points that link people to more and more related information” (Nakasone, 2022). By reconciling IAD constituents to Wikidata, we further connected our data to ULAN and VIAF identifiers, opening future pathways for digital humanities researchers to access and discover the Index of American Design artwork collections and archival materials as wider sets of data from National Gallery collections are opened to researchers.

4.1 The Value of Dashboards

Dashboard tools enable knowledge workers and managers to envision the labor required to monitor discrepancies and perform periodic data reconciliation efforts as a given museum’s local CMS data and Wikidata values change. Power BI was our preferred data visualization tool for this project, which was positioned within the museum and looking outward, but other tools might achieve different results. For example, the inteGraality tool [14] for querying and creating dashboards in Wikidata could invite a wider group of users to enhance Wikidata items with additional statements and media. Programs are also needed to equip researchers and cultural heritage professionals with the skills and knowledge to collaborate efficiently. Experts in the Wikimedia community engaged with GLAM data have developed training materials and workbooks, such as the Wikipedia Workbook for Cultural Institutions (Ockerbloom, 2024). In support of these efforts, the National Gallery hosted a GLAM Camp in 2025, organized by Wikimedia DC [15].

Another question that interested us, but which fell outside the scope of this project, was how these IAD artworks are being used in other pages across Wikidata and Wikipedia. Many IAD drawings are inserted on Wikipedia pages to illustrate obscure or obsolete crafts or furniture items for which photographs may not exist. Quantifying such uses of IAD drawings would be another way to gauge how IAD works are used by the community. Additionally, we could carry out a similar reconciliation for the pages of reconciled constituents and see how those pages are viewed and edited; however, that would be slightly less useful, as none of those pages were created by National Gallery knowledge workers, and many of the constituents are notable for reasons beyond their participation in the IAD.
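
One possible way to begin quantifying such uses is sketched below in Python. This was not part of the project. It assumes that the IAD images in question are hosted on Wikimedia Commons and that their file titles are known, and it relies on the `globalusage` property of the MediaWiki Action API. The function names and the User-Agent string are hypothetical, and continuation of long result sets is omitted for brevity.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def globalusage_url(file_title: str, limit: int = 500) -> str:
    """Build an Action API URL asking where a Commons file is used."""
    params = {
        "action": "query",
        "prop": "globalusage",
        "titles": file_title,
        "gulimit": str(limit),
        "format": "json",
    }
    return COMMONS_API + "?" + urllib.parse.urlencode(params)


def count_usage_by_wiki(response: dict) -> Counter:
    """Tally the pages using a file, grouped by wiki (e.g. en.wikipedia.org)."""
    counts = Counter()
    for page in response.get("query", {}).get("pages", {}).values():
        for use in page.get("globalusage", []):
            counts[use["wiki"]] += 1
    return counts


def fetch_usage(file_title: str) -> Counter:
    """Fetch and tally usage for one file (makes a network request)."""
    request = urllib.request.Request(
        globalusage_url(file_title),
        # Hypothetical User-Agent; replace with real contact details.
        headers={"User-Agent": "iad-usage-sketch/0.1 (example contact)"},
    )
    with urllib.request.urlopen(request) as reply:
        return count_usage_by_wiki(json.load(reply))
```

Calling `fetch_usage` with a Commons file title (for example, a hypothetical `"File:Example.jpg"`) would return per-wiki counts that could be aggregated across all IAD renderings and added to a dashboard alongside the page view and edit counts.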

4.2 Roundtripping, Long and Short Term

As mentioned previously (2.2), the CMS and Wikidata datasets have remained independent following the National Gallery’s preliminary Wikimedia projects. This separation has been valuable in enabling a clean comparative analysis, but it also underscores the next frontier for GLAM–Wikidata collaboration: the potential for “data roundtripping,” or the cyclical exchange of information between institutional systems and open knowledge platforms (Fauconnier, 2019). Establishing such a feedback loop would allow enhancements, corrections, and new relationships contributed through Wikidata to be evaluated and selectively re-integrated into a museum’s CMS or online collection interfaces. Institutions vary in their comfort levels with this process, with many simply displaying outbound identifiers and URIs, as the National Gallery currently does on artwork collection webpages under a “research resources” section. Others have begun embedding textual statements or incorporating values directly from Wikidata, Wikipedia, or other external knowledge bases. Each approach raises practical and ethical questions about data authority, curatorial oversight, and institutional trust in collaboratively maintained knowledge graphs. Future work in this direction may focus on exhibitions of IAD watercolors at the National Gallery and other museums, like the Museum of Modern Art, to interrogate the extent to which exhibitions contributed to IAD artists’ careers after the FAP, or the demographics of IAD artists compared to other artwork collections. Aggregating data on constituents who donated or gifted objects to museums that were rendered for the IAD may also reveal trends in museum development in the wake of the FAP.

In the long term, effective roundtripping will depend on clear governance models for when and how Wikidata content is incorporated, along with stronger confidence in the accuracy of individual statements. Wikidata features such as preferred statements and qualifiers have the potential to increase confidence in community-created content. Next steps towards roundtripping could involve designing processes to contribute National Gallery CMS data about IAD constituents to Wikidata supported by references and qualifiers, with preferred ranking of statements where other unreferenced or unqualified statements already exist for a given Wikidata item, rather than overwriting or deleting existing statements. Museum data about artists and artwork collections often needs to account for uncertainties and debated assertions in historical context, drawing on “weaker logical status” statements that can be expressed in Wikidata qualifiers (Di Pasquale et al., 2024). Conveying the nature of uncertainty of information sourced from our own archival records, including conflicting statements derived from the original handwritten and typewritten Data Report Sheets from the FAP held in Gallery Archives, may enable researchers beyond our museum to contribute their knowledge in a similarly qualified and ranked system. Knowledge workers at GLAM institutions charged with maintaining the authority of records in their institution’s CMS can then demonstrate to subject matter experts (like curators and historians) the affordances of preferred and deprecated statements in the Wikidata knowledge graph that we are seeking to roundtrip. These qualifications, alongside the quantified Power BI dashboards, offer actionable next steps.
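
The ranking behavior described above can be illustrated with a minimal Python sketch. It assumes statements are represented as plain dictionaries with a `rank` key, loosely following Wikidata's JSON. The helper names `best_statements` and `add_as_preferred`, along with the example values and sources, are hypothetical and are not drawn from the CMS or from Wikidata. The selection rule mirrors how Wikidata surfaces "best rank" statements: preferred statements if any exist, otherwise normal ones, and never deprecated ones.

```python
from typing import Dict, List

Statement = Dict[str, str]


def best_statements(statements: List[Statement]) -> List[Statement]:
    """Return preferred statements if any exist, otherwise normal ones."""
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    if preferred:
        return preferred
    # Deprecated statements stay on the item but are never surfaced here.
    return [s for s in statements if s.get("rank", "normal") == "normal"]


def add_as_preferred(statements: List[Statement], new: Statement) -> List[Statement]:
    """Append a statement at preferred rank without altering existing ones."""
    return statements + [dict(new, rank="preferred")]


# Hypothetical birth-year claims for one constituent (invented values).
existing = [
    {"value": "1905", "rank": "normal", "source": ""},
    {"value": "1950", "rank": "deprecated", "source": ""},
]
updated = add_as_preferred(
    existing, {"value": "1906", "source": "Data Report Sheet (hypothetical)"}
)

print([s["value"] for s in best_statements(existing)])
print([s["value"] for s in best_statements(updated)])
print(len(updated))
```

Run as written, the first call surfaces only the normal statement and the second surfaces only the newly added preferred one, while all three statements remain on the record. Nothing is overwritten or deleted, which is the property that makes this approach attractive for roundtripping.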

In the short term, projects like the one detailed in this paper point toward hybrid approaches that prioritize human oversight and curation while leveraging machine automation. Reconciliation and visualization tools can be created to identify discrepancies and opportunities for enrichment, but curatorial review remains essential to ensure data quality and interpretive integrity. Emerging infrastructures such as Wikibase may help bridge these environments, supporting traditional data management within GLAM systems while maintaining interoperability with Wikidata [16]. Taken together, these developments suggest a future in which Linked Open Data circulates through rather than out of institutions, enhancing both internal data stewardship and the shared public record of cultural heritage.
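
As a rough illustration of this hybrid approach, the Python sketch below compares one constituent's CMS values with the corresponding Wikidata values and labels each field for a human reviewer. It is a sketch under stated assumptions rather than the project's workflow: the field names, example values, and the `triage` function are hypothetical, and the code only flags differences without writing to either system.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def triage(cms: Record, wikidata: Record, fields: Iterable[str]) -> List[Tuple[str, str]]:
    """Label each field so that a curator can decide what, if anything, to change."""
    results = []
    for field in fields:
        local, remote = cms.get(field), wikidata.get(field)
        if local is None and remote is None:
            status = "missing in both"
        elif local is None:
            status = "in Wikidata only: candidate for CMS"
        elif remote is None:
            status = "in CMS only: candidate for Wikidata"
        elif local.strip().casefold() == remote.strip().casefold():
            status = "match"
        else:
            status = "conflict: needs curatorial review"
        results.append((field, status))
    return results


# Hypothetical records for a single reconciled constituent.
cms_record = {"birth_year": "1906", "death_year": None, "nationality": "American"}
wikidata_record = {"birth_year": "1905", "death_year": "1982", "nationality": "american"}

for field, status in triage(cms_record, wikidata_record, ["birth_year", "death_year", "nationality"]):
    print(f"{field}: {status}")
```

The automation stops at labeling. Every "candidate" or "conflict" row would still go to a curator or registrar, which keeps the interpretive decision with subject matter experts.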

Additional Files

The additional files for this article can be found as follows:

Supplementary File 1

PDF of selected Power BI viz(es): IAD Wikidata Exploration. DOI: https://doi.org/10.5334/johd.426.s1

Supplementary File 2

Python code: wikidataPageViews.txt and wikidataPageEdits.txt. DOI: https://doi.org/10.5334/johd.426.s2

Notes

[1] National Gallery of Art, Open Access Policy. Retrieved on December 1, 2025, from https://www.nga.gov/terms-and-notices#open-access.

[2] GitHub, National Gallery of Art Open Data. Retrieved on December 1, 2025, from https://github.com/NationalGalleryOfArt/opendata.

[3] Wikidata Query Service. Retrieved on December 1, 2025, from https://query.wikidata.org/.

[4] Wikidata reconciliation for OpenRefine. Retrieved on December 1, 2025, from https://wikidata.reconci.link/.

[5] OpenRefine, Reconciling. Retrieved on December 1, 2025, from https://openrefine.org/docs/manual/reconciling.

[6] Wikimedia API Portal. Retrieved on December 1, 2025, from https://api.wikimedia.org/.

[7] Wikimedia Analytics API, Page Metrics. Retrieved on December 1, 2025, from https://doc.wikimedia.org/generated-data-platform/aqs/analytics-api/examples/page-metrics.html.

[8] Wikimedia API Portal, Core REST API. Retrieved on December 1, 2025, from https://api.wikimedia.org/wiki/Core_REST_API.

[9] Shiny. Retrieved on December 1, 2025, from https://shiny.posit.co/.

[10] Plotly Dash. Retrieved on December 1, 2025, from https://dash.plotly.com/.

[11] Streamlit. Retrieved on December 1, 2025, from https://streamlit.io/.

[12] Microsoft Learn, Power Query M. Retrieved on December 1, 2025, from https://learn.microsoft.com/en-us/powerquery-m/.

[13] Microsoft Learn, Data Analysis Expression (DAX) Reference. Retrieved on December 1, 2025, from https://learn.microsoft.com/en-us/dax/.

[14] Wikidata Tools, inteGraality. Retrieved on December 1, 2025, from https://www.wikidata.org/wiki/Wikidata:Tools/inteGraality.

[15] Wikipedia, Wikimedia DC GLAM Camp 2025. Retrieved on December 1, 2025, from https://en.wikipedia.org/wiki/Wikipedia:Meetup/DC/Wikimedia_DC_GLAM_Camp_2025.

[16] Wikibase. Retrieved on December 1, 2025, from https://wikiba.se/.

Acknowledgements

Thanks to Keith Krut, Steven Nelson, and Rob Stein. We are also grateful to many colleagues at the National Gallery of Art who care for collections and archives, act as data stewards, support innovation, and contribute to research on the Index of American Design: Michele Willens, Shannon Morelli, Margaret Huang, Elizabeth Concha, Rebecca Mei, Amy Johnston, Julia Demarest, Adam Purvis, and Peter Lukehart.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

Abigail Foster: Data curation; Investigation; Software; Visualization; Writing – original draft; Writing – review & editing.

Samantha Norling: Data curation; Investigation; Software; Visualization; Writing – original draft; Writing – review & editing.

Matthew J. Westerby: Conceptualization; Writing – original draft; Writing – review & editing.

DOI: https://doi.org/10.5334/johd.426 | Journal eISSN: 2059-481X
Language: English
Submitted on: Oct 23, 2025 | Accepted on: Nov 22, 2025 | Published on: Jan 2, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Abigail Foster, Samantha Norling, Matthew J. Westerby, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.

Volume 12 (2026): Issue 1