Refining Wikidata’s Bibliographic Domain to Increase Reuse in GLAMs

Kalliopi Mathios; Ege Atacan Doğan; Jeremy Nelson

doi:10.5334/johd.454

(1) Context and motivation

(1.1) Introduction

Librarians and library workers contribute to Wikidata through developing data models, creating and editing existing data, adding references, or using automation and open source tools to enhance data. The LD4 Community’s Wikidata Affinity Group has held regular meetings since 2019, provides ongoing Wikidata training to the Galleries, Libraries, Archives, and Museums (GLAM) community, and is open to all, free of charge (LD4 Community site, n.d.). Wiki Education, a non-profit that supports researchers and librarians across Wikimedia projects, offers Wikidata courses and webinars. Librarian-designed workshops for catalogers provide instructions and guidance for adding faculty profiles to Wikidata (Tillman, 2019). This work contributes to the rich network of research data discoverable through Wikidata and used by tools like Scholia (Nielsen et al., 2017). Librarians’ work in Wikidata parallels the breadth of specialities within the field (Kent, 2019). When viewed holistically, these efforts shine a light on librarians as a distinct Wikidata user community, made up of smaller, more local efforts. Yet the focus on traditional bibliographic data and modeling seems less prevalent than working with people, names, institutions, and special collections. The complexities of navigating the Wikidata ontology and conceptual frameworks for bibliographic description can be challenging, and work done to model domains of knowledge within Wikidata can be less visible than projects like creating scholarly profiles for faculty, which can be queried and visualized to illustrate impact and increased discoverability instantly. While it may be less visible, improving the Wikidata ontology is critical to increasing Wikidata reuse in external, emerging applications that are aligned with professional metadata standards. 
In an effort to better understand Wikidata’s bibliographic domain, a project created as part of the Wikidata Ontology Course studied how the Wikidata ontology relates to the Bibliographic Framework (BIBFRAME) ontology, developed and maintained by the Library of Congress.

In the course, we discussed potential ways to improve the use of book (Q571) and its place within the Wikidata ontology. We suggested making book (Q571) a subclass of intellectual work (Q15621286), but at the time decided against making changes that might disrupt the work of the WikiProject:Books community (WikiProject Books, 2025). A related entity, document (Q49848), was deprecated as a subclass of intellectual work (Q15621286), leaving it as a subclass of information resource (Q37866906) and manifestation (Q286583); it shares no characteristic of version, edition or translation (Q3331189) other than being the manifestation of an artificial object (Q16686448). We related the audiobook (Q106833) class to the ebook (Q128093) class through a union of (P2737) property statement on the book (Q571) item. Further investigations led us to develop strategies for augmenting the Wikidata ontology to improve alignment with the BIBFRAME 2.0 ontology (Library of Congress, 2016) and other vocabularies used in library resource description. The method involves extracting Expression and Manifestation classes from Wikidata’s version, edition or translation (Q3331189) class. Generating a BIBFRAME Work from Wikidata becomes possible through the union of the Work and the newly created Expression. These entities are defined according to the IFLA Library Reference Model (LRM) (Riva et al., 2024). Using an IFLA LRM based schema as a source model for mapping to the BIBFRAME ontology is supported by Hahn and Dousa (2020), who apply a set-theoretical framework to mapping the BIBFRAME Work to the IFLA LRM Work and Expression for operational purposes. This discussion paper offers a new approach to modeling the Wikidata ontology’s bibliographic domain in an effort to increase interoperability with international, standards-based frameworks.

(1.2) Previous Explorations of BIBFRAME and Wikidata

Discussion on how to use BIBFRAME with Wikidata dates to at least 2018, when Matt Miller of the Library of Congress investigated how these two open data pools might benefit from each other (Miller, 2018). In the 2018 blog post titled “Mapping Wikidata to BIBFRAME”, Miller explores how bibliographic data is reflected in Wikidata while comparing it to BIBFRAME 2.0, the latest version of BIBFRAME at the time. BIBFRAME, at the time of this writing, is released in version 2.6, and will soon be released in version 3.0 (Lorimer and Williamschen, 2025). Similarly to our investigations, Miller examined Work entity metadata for monographs. Miller also investigated the use of Wikidata’s book (Q571) as we did in the Wikidata Ontology Course. We suggest that book (Q571) is routinely used not only because it aligns with common, natural language for discussing bibliographic material in English, but also because it does not require specialized knowledge of conceptual models like the Functional Requirements for Bibliographic Records (FRBR) or LRM. WikiProject:Books strongly discourages the use of book (Q571); nonetheless, at the time of this writing, nearly 20,000 instances of book (Q571) exist in Wikidata.

Miller approaches Wikidata’s bibliographic domain as a whole; instead of starting with WikiProject:Books’s guidelines or with the BIBFRAME ontology, Miller highlights the 311 properties used in practice and clusters them by frequency (Miller, 2018). His analysis reveals that only 11% of entities that are instances of book (Q571) or one of its subclasses had title (P1476) properties, exposing an example of data inconsistencies that discourage reuse, as titles are a key component even when describing a resource minimally. Title property inconsistency makes sense: it is easy to imagine Wikidata editors mistaking the Wikidata item’s label as an equivalent to the title property. Documentation, modeling, constraints and other strategies become critical in supporting editors and improving consistency across Wikidata; however, part of Wikidata’s appeal is its open and collaborative nature. Those who edit Wikidata are volunteers and community members. In this environment, it is not always easy or wise to encourage conformity to a strict set of modeling standards, and not all modeling decisions equally suit Wikidata’s diverse, international constellation of communities.
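The kind of coverage figure Miller reports can be illustrated with a minimal sketch. It assumes a simplified in-memory representation of items as dictionaries of property IDs; the sample data is invented for illustration and is not drawn from Wikidata, and the function name `title_coverage` is hypothetical.

```python
from typing import Dict, List

TITLE = "P1476"  # title property cited in the text


def title_coverage(items: List[Dict[str, List[str]]]) -> float:
    """Return the share of items carrying at least one title (P1476) statement."""
    if not items:
        return 0.0
    with_title = sum(1 for claims in items if claims.get(TITLE))
    return with_title / len(items)


# Invented sample data: each dict maps property IDs to lists of values.
sample = [
    {"P31": ["Q571"], "P1476": ["Example title"]},
    {"P31": ["Q571"]},                                 # label only, no title statement
    {"P31": ["Q571"], "P50": ["placeholder-author"]},  # author but no title
    {"P31": ["Q571"], "P1476": []},                    # empty list counts as missing
]

print(f"{title_coverage(sample):.0%} of sample items have a title statement")
```

An item whose label was entered in place of a title statement, as in the second sample entry, counts as missing a title under this measure.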

Before mapping BIBFRAME Work properties to equivalent properties in Wikidata, Miller describes the inconsistencies in Wikidata’s modeling. This work helped inform Wikidata to BIBFRAME Work property mapping (Brief Description for a BIBFRAME Work, 2025), created to aid in the automatic generation of BIBFRAME Works from Wikidata as part of experiments with the Blue Core project (Schreur et al., 2023) and Stanford University’s development of the Blue Core Graph Toolbox¹ prototype. What Miller’s post illustrates is two-fold: firstly, thought experiments around the use of Wikidata in more traditional, authoritative bibliographic spheres are not new to either community; secondly, it illustrates that many of the issues in reusing Wikidata have existed for over seven years, or the entirety of BIBFRAME 2.0’s version history.

(2) Discussion

(2.1) Wikidata’s bibliographic domain

Within Wikidata, different domains of knowledge can be identified within the graph. Editors from across a wide range of expertise contribute to Wikidata, including those with professional training, editors that lack technical training but possess a personal interest in a particular field of knowledge, and bots that execute a series of actions based on rules programmed by a human user that may or may not have expertise in a particular domain of knowledge or Wikidata and its community-developed practices. The convergence of these editor types means that particular areas of knowledge may be more consistent, and therefore more reliable than others, and may follow particular patterns. These patterns may not extend to other domains. For example, scholarly articles recently needed to be separated from the whole of Wikidata in what is casually referred to as the “Graph Split” (Wikidata:SPARQL Query Service/WDQS Graph Split, 2024). The routine use of the predicate instance of (P31) with the object value scholarly article (Q13442814) made the split possible, and will ensure that future articles are associated with the scholarly article Wikibase (Wikidata:SPARQL Query Service/WDQS Graph Split/Rules, 2024).
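To show why a consistently applied instance of (P31) value makes such a partition straightforward, the following is a simplified sketch. It is not the actual rule set used for the Graph Split, which is documented on the cited rules page; the entity identifiers and the function name `split_graph` are placeholders.

```python
from typing import Dict, List, Tuple

INSTANCE_OF = "P31"
SCHOLARLY_ARTICLE = "Q13442814"


def split_graph(
    entities: Dict[str, Dict[str, List[str]]]
) -> Tuple[List[str], List[str]]:
    """Partition entity IDs by whether they are typed as scholarly articles."""
    scholarly: List[str] = []
    main: List[str] = []
    for entity_id, claims in entities.items():
        if SCHOLARLY_ARTICLE in claims.get(INSTANCE_OF, []):
            scholarly.append(entity_id)
        else:
            main.append(entity_id)
    return scholarly, main


# Placeholder entities; the IDs are illustrative, not real Wikidata items.
entities = {
    "article-1": {"P31": ["Q13442814"]},
    "book-1": {"P31": ["Q571"]},
    "untyped-1": {},
}

scholarly, main = split_graph(entities)
print(scholarly, main)
```

A domain without an equally consistent typing pattern could not be separated this simply, which is the point the paragraph above makes about patterns not extending across domains.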

The bibliographic domain in Wikidata is monitored and influenced by the community project WikiProject:Books. The group’s thorough documentation guides new and experienced users alike through the description of bibliographic Works and Editions using an interpretation of the FRBR model first released by IFLA in 1998 (IFLA Study Group on the Functional Requirements for Bibliographic Records, 1998), and later incorporated into the IFLA LRM in 2017 (Riva et al., 2024). At the time of this writing, 127 participants in WikiProject:Books answer questions, help users interpret FRBR and their modeling instructions, and mitigate misuse of terms and properties. Their modeling provides predictable patterns and enables different communities to contribute bibliographic data to Wikidata. The documentation reveals careful thought and community engagement around modeling decisions, as well as long-standing, unresolved issues; however, it is clear that decisions are occasionally made without consensus in order to maintain operational continuity. The WikiProject:Books discussion board is an excellent resource for uncovering who contributes to and benefits from the WikiProject:Books modeling. Notably, the models are in use by Wikimedia’s sister project Wikisource and the Wikisource Reader App (Wikisource reader app, 2024).

(2.2) Use Case: Stanford University’s Blue Core Graph Toolbox

Wikidata and BIBFRAME interoperability experiments surface with the emergence of new library cataloging tools for creating Resource Description Framework (RDF) descriptions. The Blue Core Graph Toolbox is an open-source, single-page static web application developed for the Blue Core project using PyScript, an open-source project that allows Python packages and modules to run within the web browser using WebAssembly. RDF resources are loaded into a local RDF graph running in the user’s web browser, and the user can then query the loaded graph with SPARQL, as well as update the graph using SPARQL Update queries. The Graph Toolbox also provides import services for Concise Bounded Description RDF packages generated at the Library of Congress, URLs from Blue Core and Sinopia editor datastores, as well as Machine-Readable Cataloging (MARC) records. The Library of Congress marc2bibframe2 XSLT files convert MARC XML to BIBFRAME RDF XML, which is then loaded into the local graph. The Graph Toolbox can save the local RDF graph back to the Blue Core datastore using the Blue Core API. Users can download the local graph in a variety of RDF serializations or convert BIBFRAME Instance and Work data into a MARC record using the Library of Congress bibframe2marc XSLT. Finally, the Graph Toolbox provides a validation service for included BIBFRAME Works and Instances using Shapes Constraint Language (SHACL) graphs from the BIBFRAME Interoperability Group (BIBFRAME Interoperability Group, 2025).

The Blue Core Graph Toolbox allows users to interact with multiple AI Agents through the Blue Core API. When a user submits a search query, an AI Agent sends the query to multiple sources including the Blue Core datastore, the Sinopia API, the Library of Congress linked data service, and Wikidata. With the returned results, the user can selectively load RDF entities into the local RDF graph. For entities from Wikidata, the AI Agent attempts to transform the Wikidata RDF triples to derive BIBFRAME Work triples. Another AI Agent assists the user in generating SPARQL statements to run on the locally loaded graph. Using a third AI Agent, the Graph Toolbox allows users to check for duplicate BIBFRAME Works and Instances in the Blue Core datastore.

(2.3) Differences in bibliographic conceptual frameworks

Studies on the alignment of conceptual frameworks for library resources by Zapounidou et al. (2017), Taniguchi (2018), Hahn and Possemato (2023), and others provide thorough explanations of the foundational components of FRBR, LRM, Resource Description and Access (RDA), and BIBFRAME. This research also provides background information on mapping between source and target models, while aiming to retain semantics throughout the transformation process (Zapounidou et al., 2017; Hahn & Possemato, 2023). Tables 1, 2, and 3 provide a brief overview of the IFLA LRM (Table 1), the WikiProject:Books bibliographic model (Table 2), and BIBFRAME (Table 3), along with example entities. Because Wikidata’s bibliographic modeling is based on an interpretation of FRBR, LRM is the most appropriate high-level model for refining the Wikidata ontology today, as it “adopts the approach of the original FRBR study” (Riva et al., 2024).

Table 1

IFLA LRM (Riva et al., 2024).

ENTITY	DEFINITION	EXAMPLE
Work	The intellectual or artistic content of a distinct creation	Homer’s Odyssey
Expression	A distinct combination of signs conveying intellectual or artistic content	The English translation by Robert Fagles of Homer’s Odyssey, copyright 1996
Manifestation	A set of all carriers that are assumed to share the same characteristics as to intellectual or artistic content and aspects of physical form. That set is defined by both the overall content and the production plan for its carrier or carriers	The Odyssey of Homer/translated with an introduction by Richmond Lattimore, first Harper Colophon edition published in the Perennial library series, in New York by Harper & Row in 1967, ISBN 0-06-090479-8 [manifestation containing the complete text of Richmond Lattimore’s English translation of the Greek poem]
Item	An object or objects carrying signs intended to convey intellectual or artistic content	Library of Congress Copy 2 of Homer. The Odyssey/translated by Robert Fagles, Penguin Classics, Deluxe edition published in New York by Penguin Books in 1997, ISBN 0-670-82162-4

Table 2

WikiProject:Books (Mandal, 2023).

ENTITY	DEFINITION	EXAMPLE
Written work or one of its subclasses	Any work expressed in writing, such as inscriptions, manuscripts, documents or maps	Natural History (Q442)
Version, edition or translation	Specific version of a work, resulting from its edition, adaptation, or translation; set of substantially similar copies of a work	Pliny’s Natural History (Q123853189)
Individual copy of a book	Specific physical copy of a book	Pliny’s Natural History (Q123938582)

Table 3

BIBFRAME (Library of Congress, 2016).

ENTITY	DEFINITION	EXAMPLE
Hub*	An abstract resource that functions as a bridge between two or more Works	Great expectations (https://id.loc.gov/resources/hubs/9c9f5603-596b-9708-276c-0ab4d5c1bb45. Last accessed date: 1 November 2025)
Work	Resource reflecting a conceptual essence of a cataloging resource	Great expectations (https://id.loc.gov/resources/works/9398625. Last accessed date: 1 November 2025)
Instance	Resource reflecting an individual, material embodiment of a Work	Great expectations (https://id.loc.gov/resources/instances/9398625. Last accessed date: 1 November 2025)
Item	Single example of an Instance	Great expectations (https://lccn.loc.gov/65029850. Last accessed date: 1 November 2025)

*The BIBFRAME Hub is described as an aggregating or collating class that provides a bridge between two or more works (Ford, 2025), rather than an equivalent class to the LRM Work.

Table 4 illustrates differences across conceptual models. Both WikiProject:Books and the BIBFRAME ontology streamline FRBR, but in critically different ways: the Wikidata version, edition or translation (Q3331189) is loosely based on the FRBR Expression and Manifestation (Mandal, 2023). The BIBFRAME Work is loosely based on the FRBR Work and Expression (Zapounidou et al., 2017). Analyzing the Wikidata ontology alongside more traditional bibliographic models shows the difficulty in reusing Wikidata within the bibliographic domain, and reflects interoperability challenges within the larger library community. Experiments within the Blue Core project to generate a minimal set of statements for a BIBFRAME Work were promising; however, data associated with a BIBFRAME Work will be found in the Wikidata version, edition or translation (Q3331189) entities. The Wikidata version, edition or translation (Q3331189) entities will also contain data for a BIBFRAME Instance.

Table 4

A comparison between IFLA LRM, WikiProject:Books, and BIBFRAME.

IFLA LRM	WIKIPROJECT:BOOKS	BIBFRAME
Work	Written work or one of its subclasses	Work
Expression	Version, edition or translation or one of its subclasses	Work
Manifestation	Version, edition or translation or one of its subclasses	Instance
Item	Individual copy of a book	Item
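The comparison in Table 4 can be restated as a small lookup structure, which makes the asymmetry between the two streamlined models explicit. This is a sketch only: the rows are copied from Table 4, while the function names `bibframe_targets` and `lrm_sources` are hypothetical.

```python
from typing import List, Tuple

VET = "Version, edition or translation or one of its subclasses"

# Rows of Table 4: (IFLA LRM, WikiProject:Books, BIBFRAME).
CROSSWALK: List[Tuple[str, str, str]] = [
    ("Work", "Written work or one of its subclasses", "Work"),
    ("Expression", VET, "Work"),
    ("Manifestation", VET, "Instance"),
    ("Item", "Individual copy of a book", "Item"),
]


def bibframe_targets(wikiproject_class: str) -> List[str]:
    """BIBFRAME classes that a WikiProject:Books class may correspond to."""
    return sorted({bf for _, wp, bf in CROSSWALK if wp == wikiproject_class})


def lrm_sources(bibframe_class: str) -> List[str]:
    """LRM entities that are folded into a given BIBFRAME class."""
    return sorted({lrm for lrm, _, bf in CROSSWALK if bf == bibframe_class})


print(bibframe_targets(VET))
print(lrm_sources("Work"))
```

The first lookup returns two BIBFRAME classes for a single Wikidata class, and the second returns two LRM entities for a single BIBFRAME class, which is the mismatch that complicates direct reuse.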

(3) Implications

(3.1) Four distinct, disjoint classes

In ontology design, certain entities are mutually dependent. An example from Wikidata is the distinction between Stockholm (Q1754, city) and Stockholm (Q506250, administrative territory). Although these two entities describe inseparable aspects of the same geographical reality, they serve different conceptual roles. The city denotes an inhabited settlement; the administrative territory denotes a jurisdiction. This split is not applied universally: some city items combine the settlement and the jurisdiction in a single entity, which creates an inconsistency in modelling. Where it is applied, however, it separately models two mutually dependent entities that are normally conflated in natural language and referred to by the same label.

This same tension appears in bibliographic modeling. In frameworks such as LRM, the entities Work, Expression, Manifestation, and Item (WEMI) form a strictly disjoint hierarchy while being mutually dependent (Riva et al., 2024). Such structural dependency raises a practical question in both ontology and database design: in singleton sets, in which a Work only has one Expression, or an Expression only has one Manifestation, do we have to separate these into distinct entities? We think that even in this case, it makes the most sense to split the entities.

In many database systems, and even in natural language, mutually dependent notions are often merged. Doing so can simplify storage or expression, but risks collapsing meaningful conceptual boundaries. We advocate maintaining a three-level distinction among Work, Expression, and Manifestation, while only instantiating an Item when it possesses contextual significance.

On Wikidata, part of this structure exists: written work (Q47461344) is distinct from version, edition or translation (Q3331189). Yet the latter category combines what bibliographic theory separates, that is Expression and Manifestation. Meanwhile, the BIBFRAME model’s BIBFRAME Work elegantly streamlines two more abstract levels into a single entity. Since BIBFRAME and Wikidata have implemented a three-entity approach in different ways, we propose dividing the version, edition or translation (Q3331189) class into two, even when this seems redundant, to align with a common conceptual model. Conceptual precision should take precedence over parsimony.

A thought experiment illustrates the issue. Suppose Mary’s Diary is a unique handwritten volume. It simultaneously qualifies as a Work, Expression, Manifestation, and Item. Should we represent it as four entities or one? While collapsing the entities into a single node might seem convenient, doing so erases distinctions vital for reasoning and reuse. The diary’s Work represents its intellectual content, its Expression represents the specific linguistic form, and its Manifestation carries the artistic and conceptual characteristics as well as those of the published embodiment. The Item represents the object and, though present, adds no essential information to Wikidata unless the individual copy holds curatorial or rare value. For common circulating material that falls outside special collections or rare materials, this means that three different entities are sufficient within a Wikidata context; within a traditional library system, however, the Item holds particular value for acquisitions and circulation workflows.

Since a BIBFRAME Work carries features from both the LRM Work and Expression, this combined entity can be classified neither as a Work, nor as an Expression. This approach, which functions practically as a “Work-Expression” class, is too confusing when applied to Wikidata, and not worth the benefit of having fewer overall items. In a combined class, properties belonging to different conceptual levels, such as the ISBN (Manifestation-level) and the illustrator (Expression-level), are conflated. LRM avoids this issue by enforcing disjointness between an Expression and a Manifestation. Consequently, in our model, BIBFRAME Works will have to be classified into either Work or Expression to be represented in Wikidata. This does not preclude external tooling from making use of the distinct Work and Expression entities in service to generating a BIBFRAME Work downstream of Wikidata.
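One way to picture the proposed separation is to sort the statements of a conflated version, edition or translation item by conceptual level. The sketch below assumes the property placement given in this article's recommended property tables; the function name `split_conflated`, the sample values, and the decision to leave unlisted properties for manual review are illustrative assumptions, not an established procedure.

```python
from typing import Dict, List, Tuple

Claims = Dict[str, List[str]]

# Placement follows the article's recommended property tables.
EXPRESSION_ONLY = {"P98", "P110", "P655", "P407", "P282"}  # editor, illustrator, translator, language, writing system
MANIFESTATION_ONLY = {"P212", "P123", "P291", "P393"}      # ISBN-13, publisher, place of publication, edition number
SHARED = {"P1476", "P1680", "P50"}                         # title, subtitle, author appear at both levels


def split_conflated(claims: Claims) -> Tuple[Claims, Claims, Claims]:
    """Sort statements into Expression, Manifestation, and unassigned groups."""
    expression: Claims = {}
    manifestation: Claims = {}
    unassigned: Claims = {}
    for prop, values in claims.items():
        if prop in SHARED:
            expression[prop] = list(values)
            manifestation[prop] = list(values)
        elif prop in EXPRESSION_ONLY:
            expression[prop] = list(values)
        elif prop in MANIFESTATION_ONLY:
            manifestation[prop] = list(values)
        else:
            unassigned[prop] = list(values)  # needs human judgment
    return expression, manifestation, unassigned


# Placeholder values; none are real Wikidata statements.
conflated = {
    "P1476": ["Example title"],
    "P110": ["placeholder-illustrator"],
    "P212": ["placeholder-isbn"],
    "P577": ["placeholder-date"],
}

expression, manifestation, unassigned = split_conflated(conflated)
print(sorted(expression), sorted(manifestation), sorted(unassigned))
```

Publication date (P577) is deliberately left unassigned here, since the property tables describe it as conflated and in need of further analysis.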

(3.2) Redundancy in modeling

In modeling terms, redundancy refers to the repetition of information. In practice, while some redundancy is accepted when it serves a useful purpose, it is typically kept to a minimum. We use the term value-neutrally, recognizing both its benefits and drawbacks. A key benefit is that information repeated across independent sources increases reliability. Similarly, conflicting information becomes a useful signal for detecting errors, which would be more difficult under strict constraints against redundancy. Stating the same information multiple times may also enhance accessibility, since similar facts can be retrieved from different statements. The drawbacks are straightforward: seemingly repetitive data entry means a larger knowledge graph, requiring more storage and making queries slower. It can also obscure relevant facts by surrounding them with seemingly duplicate data.

At first glance, redundancy appears to be a factor when considering FRBR and LRM; however, in entity-relationship modeling, entities do not inherit attributes as taxonomic class hierarchies do (Renear & Choi, 2007). For example, a Work is created by an author. This Work is realized through one or more Expressions; one or more of these Expressions are embodied in one or more Manifestations, each of which is exemplified by many Items. Each entity may appear to repeat the same author statement, especially when multiple Expressions and Manifestations exist. Given the different meanings and uses of properties for different entities, even when the value appears to be equal, it may not carry the same meaning across entity types (Renear & Choi, 2007). In practical application of the model in Wikidata, a crisp solution is to record such information in the most appropriate entity (e.g. authorship represented in the Work entity) and infer it through entity relationships when appropriate. While this reduces direct accessibility, better tools can address that issue. Philosophically, the linked semantic web already supports indirect access: for example, one can easily retrieve someone’s grandmother’s grandmother, even if this relationship is never explicitly stated in a triple associated with an entity. Likewise, the Work associated with a particular Item and the attributes therein can be found by referencing its relationships within a given set. While multiple implementations seem plausible, we opt to include seemingly redundant statements such as title (P1476) across entity types to provide Wikidata users with additional context to disambiguate between entities within the larger Wikidata pool.
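The alternative of recording a fact once and inferring it through entity relationships can be sketched as a walk up the Item, Manifestation, Expression, Work chain. The graph below is an invented in-memory structure using the LRM relationship names as keys; the identifiers and the function name `find_inherited` are hypothetical, and this is not how Wikidata itself resolves statements.

```python
from typing import Dict, Optional

Node = Dict[str, str]

# LRM relationships leading from Item towards Work.
CHAIN = ("exemplifies", "embodies", "realizes")


def find_inherited(graph: Dict[str, Node], start: str, prop: str) -> Optional[str]:
    """Follow the WEMI chain upward until an entity states `prop`."""
    current: Optional[str] = start
    seen = set()
    while current is not None and current not in seen:
        seen.add(current)  # guards against accidental cycles
        node = graph.get(current, {})
        if prop in node:
            return node[prop]
        current = next((node[link] for link in CHAIN if link in node), None)
    return None


# Invented example: authorship is recorded on the Work only.
graph = {
    "copy-1": {"exemplifies": "edition-1"},
    "edition-1": {"embodies": "version-1", "publisher": "placeholder-publisher"},
    "version-1": {"realizes": "work-1", "translator": "placeholder-translator"},
    "work-1": {"author": "placeholder-author"},
}

print(find_inherited(graph, "copy-1", "author"))
```

The lookup succeeds without any author statement on the Item, at the cost of a traversal; repeating title (P1476) across entity types, as we opt to do, trades that traversal for easier disambiguation.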

(3.3) Relationships between the four classes

Figure 1 shows the four classes proposed in our model, based on the IFLA LRM categorization. These classes are interconnected, but the links should only connect adjacent classes rather than forming all possible pairwise connections. This results in three connections. A naive approach could be to use instance of (P31) links for all of them, as it makes some semantic sense to use the English word “is” between instances of each. This approach would create class order problems, and would be based on a non-formal understanding of instance relationships. This compels us to integrate a more technical model into Wikidata.

Currently Wikidata connects its three classes with two properties:

Items, individual copy of a book (Q53731850), are connected to the Expression-Manifestation class, version, edition or translation (Q3331189), through exemplar of (P1574). This is a direct analogue of the IFLA LRM exemplifies.
Expression-Manifestations, version, edition or translation (Q3331189), are connected to written work (Q47461344) through edition or translation of (P629). This property is a conflated property, with the union class of two disjoint classes given as argument.

Entity relationships are established according to the IFLA LRM (Riva et al., 2024) using the three following properties highlighted in the Property column in Table 5 below, analogous to the properties our model proposes on Wikidata.

Table 5

IFLA LRM (Riva et al., 2024).

SUBJECT	OBJECT	PROPERTY	INVERSE
Item	Manifestation	exemplifies	is exemplified by
Manifestation	Expression	embodies	is embodied by
Expression	Work	realizes	is realized through
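The restriction to adjacent classes can be expressed as a simple lookup over the rows of Table 5. This is a minimal sketch; the function name `lrm_link` is hypothetical, and returning `None` for non-adjacent pairs is an illustrative design choice.

```python
from typing import Dict, Optional, Tuple

# Rows of Table 5: (subject class, object class) -> LRM property.
LRM_PROPERTY: Dict[Tuple[str, str], str] = {
    ("Item", "Manifestation"): "exemplifies",
    ("Manifestation", "Expression"): "embodies",
    ("Expression", "Work"): "realizes",
}


def lrm_link(subject_class: str, object_class: str) -> Optional[str]:
    """Return the LRM property for an adjacent pair, or None if not adjacent."""
    return LRM_PROPERTY.get((subject_class, object_class))


print(lrm_link("Item", "Manifestation"))
print(lrm_link("Manifestation", "Work"))
```

A direct Manifestation-to-Work link has no entry, so a validator built on this lookup would flag it as a two-step relation that needs an intermediate Expression.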

Wikidata properties do not exist in a vacuum, but are vital parts of the ontology. Ideally, they have solid connections with other properties, good example usages, and documentation.

Exemplar of (P1574) is a well-suited connector between Items and Manifestations. This relationship is practically a subproperty of instance of (P31). Treating it this way is slightly controversial in frameworks that define classes as analogous to universals (Smith, 2013), but makes sense in a set-based understanding of class. Our model keeps this property, and defines it between the Item and the Manifestation.

The embodiment relationship between Manifestation and Expression cannot be properly identified using manifestation of (P1557), because that property is used in a much more general sense. It has many aliases, such as realization of, expression of, and embodiment of, further precluding a technical use. Its usage examples include banknote being a manifestation of money, and citizen being a manifestation of citizenship. Therefore, fully implementing our model would require finding or creating a property specific to books. Said property could be a subproperty of manifestation of (P1557), but has to be specific to this domain. It would be directly connected to the IFLA LRM embodies through an external identifier.

The realization relationship between Work and Expression is mostly what edition or translation of (P629) is used for, as the distinction between Works and Expressions is represented on Wikidata, even with Expressions being conflated with Manifestations. However, it is also used in connecting Manifestations, identifiable by ISBNs, directly to Works. This creates an indirect, two-step relation (analogous to grandmother of) which our model excludes. Consequently, edition or translation of (P629) statements cannot be uniformly mapped to a single property and require case-specific refinement during normalization. Both the embodiment relationship and the realization relationship would be subproperties of edition or translation of (P629), per the current usage on Wikidata. This makes any change to the current properties not a deletion, but a clarification.
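The case-specific refinement described above could begin with a heuristic that inspects the subject of each edition or translation of (P629) statement. The sketch below assumes that the presence of an ISBN-13 (P212) statement marks the subject as a Manifestation, following the observation that Manifestations are identifiable by ISBNs; the returned labels and the function name `refine_p629` are hypothetical, and a real normalization would need further signals.

```python
from typing import Dict, List

EDITION_OF = "P629"  # edition or translation of
ISBN_13 = "P212"


def refine_p629(claims: Dict[str, List[str]]) -> str:
    """Suggest how a subject's P629 statement should be read under the model."""
    if not claims.get(EDITION_OF):
        return "no P629 statement"
    if claims.get(ISBN_13):
        # Manifestation linked straight to a Work: a two-step relation
        # that the proposed model excludes.
        return "manifestation-to-work"
    # No embodiment-level identifier: treat as the realization relationship.
    return "expression-to-work"


# Placeholder statements, not real Wikidata data.
print(refine_p629({"P629": ["work-1"], "P212": ["placeholder-isbn"]}))
print(refine_p629({"P629": ["work-1"], "P655": ["placeholder-translator"]}))
```

Statements flagged as manifestation-to-work would then need an intermediate Expression to be found or created before the link can be re-expressed as two adjacent relationships.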

(3.4) Overview of the model

Our proposed model is not a rejection of the current Wikidata model, but a refinement. Using subclasses of version, edition or translation (Q3331189) or written work (Q47461344) is already allowed. In fact, many Work entities are stated as instances of literary work (Q7725634), a subclass of written work (Q47461344). Our main contribution to the model is the refinement between Manifestation and Expression, with an added subclass relationship between Translations and Expressions.

In practice, all four tiers are commonly identified as a “book” in English, resembling the English Lexeme “book” (L536), therefore precluding it from a distinct meaning. Naturally, users often tag any of the bibliographic entities, usually Works, as instance of Q571 (“book”). While this is a constraint violation, deletion of said statements results in data loss. Instead, it should be refined by replacing “book” with the appropriate subclass. In other words, assigning instance of book is not strictly incorrect but represents an overly general classification. The appropriate response is refinement into a more specific subclass. Further community discussion is needed to weigh alternative methods for addressing book (Q571) other than discouraging its use. Incorporating it into the model as shown in Figure 1 conceptually defines a “book” as the sum of its WEMI parts.

Figure 1

Proposed model for Wikidata’s bibliographic domain. The empty arrows are subclass links, the full arrow is an instance link (more specifically, exemplifies). The dashed lines describe the aforementioned specific relationships, which are omitted from the graph for simplicity.

(3.5) Recommended properties for core classes

Tables 6, 7, 8, and 9 below recommend properties for each class. These lists consider the WikiProject:Books modeling, while rearranging property use among separate entities. In Table 6, the Work class captures the intellectual or artistic content of a distinct creation (Riva et al., 2024), at a level of abstraction close enough to the IFLA LRM Work entity to align with it without disrupting existing modeling. The version entity (Table 7) aims to align with the LRM Expression and includes all properties of the edition (Q286583) except for those statements that carry data related to its embodiment such as ISBN-13 (P212) and publisher (P123). Further study is needed to examine Wikidata properties that are suitable for the version class, especially when reviewing models for other creative and intellectual outputs like music or film.

Table 6

Minimum set of Work item properties.

PROPERTY LABEL	PROPERTY ID	WIKIBASE DATA TYPE	DESCRIPTION
instance of	P31	item	relation of type constraints; value should be written work or subclasses
title	P1476	monolingual text	published name of a work
subtitle	P1680	monolingual text	for works, when the title is followed by a subtitle
author	P50	item	main creator(s) of a written work
author name string	P2093	string	stores unspecified author or editor name for publications
language of work or name	P407	item	language associated with this creative work
inception	P571	point in time	time when an entity begins to exist
main subject	P921	item	primary topic of a work or act of communication
genre	P136	item	creative work’s genre
has version (property proposal required)	TBD	item	defines a relationship between the Work entity and its Expression (Version)

Table 7

Minimum set of Expression item properties.

PROPERTY LABEL	PROPERTY ID	WIKIBASE DATA TYPE	DESCRIPTION
instance of	P31	item	relation of type constraints; value should be version
title	P1476	monolingual text	published name of a work
subtitle	P1680	monolingual text	for works, when the title is followed by a subtitle
author	P50	item	main creator(s) of a written work
publication date	P577	point in time	date or point in time when a work or product was first published or released; this property is currently conflated and requires further analysis
editor	P98	item	person who checks and corrects a work (such as a book, newspaper, academic journal, etc.) to comply with a rules of certain genre
illustrator	P110	item	person drawing the pictures or taking the photographs in a book or similar work
author of foreword	P2679	item	person who wrote the preface, foreword, or introduction of the book but who isn’t an author of the rest of the book
author of afterword	P2680	item	person who wrote the postface, afterword, or conclusion of the book but who isn’t an author of the rest of the book
translator	P655	item	agent who adapts any kind of written text from one language to another; only applicable if relevant
language of work or name	P407	item	language associated with this creative work
writing system	P282	item	alphabet, character set or other system of writing used by a language, word, or text, supported by a typeface
has edition (property proposal required)	TBD	item	defines a relationship between Expression and Manifestation
has translation (property proposal required)	TBD	item	defines a relationship between the original Expression and its translations

Table 8

Minimum set of Manifestation item properties.

PROPERTY LABEL	PROPERTY ID	WIKIBASE DATA TYPE	DESCRIPTION
instance of	P31	item	relation of type constraints; value should be edition
title	P1476	monolingual text	published name of a work
subtitle	P1680	monolingual text	for works, when the title is followed by a subtitle
author	P50	item	main creator(s) of a written work
contributor	P767	item	person or organization that contributed to a subject: co-creator of a creative work or subject
publisher	P123	item	organization or person responsible for publishing a work, such as a book, periodical, printed music, podcast, game or software
place of publication	P291	item	geographical place of publication of the edition
issue date (property proposal required)	TBD (currently handled under P577)	point in time	date a manifestation is released
edition number	P393	string	number of an edition (first, second, … as 1, 2, …) or event
part of the series	P179	item	series which contains the subject
printed by	P872	item	organization or person who printed the creative work
ISBN-10	P957	external identifier	former identifier for a book (edition), ten digits
ISBN-13	P212	external identifier	identifier for a book (edition), thirteen digits

Table 9

Minimum set of Item item properties.

PROPERTY LABEL	PROPERTY ID	WIKIBASE DATA TYPE	DESCRIPTION
instance of	P31	item	relation of type constraints; value should be individual copy of a book
exemplar of	P1574	item	links an individual copy of a work to the item for that work or edition
location	P276	item	location of the object
owned by	P127	item	owner of the subject
catalog code	P528	string	catalog name of an object
inventory number	P217	string	identifier for a physical object or a set of physical objects in a collection
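As an illustration only, the minimum property sets in Tables 6 to 9 could be encoded as a simple lookup and used to report which listed properties an entity does not yet carry. The sketch below is not part of the proposed modeling: the function name, the class keys, and the simplified statement mapping are hypothetical, properties marked TBD in the tables are omitted because they have no ID yet, and an absent property is not necessarily an error (for example, translator applies only when relevant).

```python
# Property IDs per class, copied from Tables 6-9.
# Rows marked "TBD" (property proposal required) are left out: they have no ID yet.
MINIMUM_PROPERTIES = {
    "work": ["P31", "P1476", "P1680", "P50", "P2093", "P407", "P571", "P921", "P136"],
    "version": ["P31", "P1476", "P1680", "P50", "P577", "P98", "P110",
                "P2679", "P2680", "P655", "P407", "P282"],
    "edition": ["P31", "P1476", "P1680", "P50", "P767", "P123", "P291",
                "P393", "P179", "P872", "P957", "P212"],
    "item": ["P31", "P1574", "P276", "P127", "P528", "P217"],
}


def absent_properties(entity_class, statements):
    """Return the listed property IDs that have no value in `statements`.

    `statements` is a hypothetical, simplified mapping of property ID to a
    list of values; it is not the Wikibase JSON format.
    """
    expected = MINIMUM_PROPERTIES[entity_class]
    return [pid for pid in expected if not statements.get(pid)]


# Illustrative item-level record; the values are placeholders, not real data.
copy_record = {
    "P31": ["individual copy of a book"],
    "P1574": ["<edition item>"],
    "P217": ["<inventory number>"],
}
print(absent_properties("item", copy_record))  # → ['P276', 'P127', 'P528']
```

A report of this kind could help a contributor decide which statements to add next, while leaving the judgment about which properties apply to a given resource with the cataloger.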

The Manifestation is represented by the Wikidata class edition (Q286583) and aligns with the IFLA LRM Manifestation and BIBFRAME Instance, as outlined in Table 8. These properties aim to capture the physical or digital embodiment of a resource. Entities of this class include statements capturing publication details for a particular resource and, when joined with the version entity, reflect the qualities of the version, edition or translation (Q3331189). By joining the statements of an entity belonging to written work (Q47461344) or its related subclasses with the statements of an entity belonging to the edition class, we are able to generate a BIBFRAME Work. Further analysis is needed on how best to use the existing Library of Congress BIBFRAME Hub ID (P11859) and BIBFRAME Work ID (P13714) properties; however, including these identifiers within the Wikidata ecosystem may prove helpful when a definitive match can be identified.
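The join of written work and edition statements described above might be sketched as follows. This is a minimal illustration under stated assumptions, not our conversion pipeline: the function name and the simplified statement mappings are hypothetical, the set of embodiment-specific properties is our reading of Table 8, and dropping the edition's instance of (P31) value is a simplifying choice made only for this example.

```python
# Assumption: properties treated as embodiment-specific, taken from Table 8
# (publisher, place of publication, edition number, printed by, ISBN-10, ISBN-13).
EMBODIMENT_PROPERTIES = {"P123", "P291", "P393", "P872", "P957", "P212"}

# Assumption: the edition's class membership (P31) is not carried into the join.
SKIPPED_EDITION_PROPERTIES = EMBODIMENT_PROPERTIES | {"P31"}


def join_work_and_edition(work_statements, edition_statements):
    """Combine written-work statements with non-embodiment edition statements.

    Both arguments are hypothetical mappings of property ID to a list of
    values. The inputs are not modified.
    """
    joined = {pid: list(values) for pid, values in work_statements.items()}
    for pid, values in edition_statements.items():
        if pid in SKIPPED_EDITION_PROPERTIES:
            continue  # stays with the manifestation-level description
        merged = joined.setdefault(pid, [])
        for value in values:
            if value not in merged:  # avoid repeating shared values
                merged.append(value)
    return joined


# Placeholder values only; these are not real Wikidata statements.
work = {"P31": ["written work"], "P50": ["<author item>"], "P407": ["<language item>"]}
edition = {
    "P31": ["edition"],
    "P50": ["<author item>"],
    "P655": ["<translator item>"],
    "P123": ["<publisher item>"],
    "P212": ["<ISBN-13>"],
}
joined = join_work_and_edition(work, edition)
print(sorted(joined))
```

In this sketch the publisher and ISBN-13 remain with the edition, while the translator, which describes the version rather than its embodiment, is carried into the joined record.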

The properties selected for the Item class in Table 9 support object-related attributes; however, when considering the practical use of BIBFRAME Items in Wikidata, there may not be great value in instantiating these entities unless they are of special note, as stated earlier. Rare materials and special collections items challenge the notion that item-level description need not be more extensive, as these resources provide immense value to Wikidata and its contribution to modeling open, global knowledge. The Art and Rare Materials (ARM) BIBFRAME Extension (Art and Rare Materials (ARM) BIBFRAME Ontology Extensions, 2018) showcases possibilities for using BIBFRAME with rare materials that reinterpret traditional MARC-based descriptive practices and extend beyond the proposed Wikidata properties listed in Table 8. Considerations and application profiles for rare materials and special collections deserve further review and exploration.

(4) Conclusion

At Wikimania 2025 in Nairobi, Kenya, Wikimedia Deutschland’s Lydia Pintscher presented a talk titled “Wikidata: We Want Our Data To Be Reused. But Do We Really?” (Pintscher, 2025). In it, she describes the benefits and challenges of reusing Wikidata’s large knowledge graph of millions of statements, freely available under a CC0 public domain dedication. She groups challenges to Wikidata reuse into complex technical, social, and data categories (Pintscher, 2025). Her description of data reuse issues centers on the Wikidata ontology and its inconsistent modeling, conceptual ambiguity, and the difficulty inherent in describing the vast universe reflected in Wikidata (Pintscher, 2025). Our work to address Wikidata reuse within GLAMs is an effort to help alleviate ambiguity by identifying key ontology issues within the bibliographic domain, aligning Wikidata’s modeling with international, standards-based conceptual frameworks, and making clear the conflations that exist within Wikidata classes and properties. Building community consensus across Wikidata means creating a model of Wikidata’s bibliographic domain that works for more users while minimizing community disruption. Further work is needed to establish new properties, identify unintended consequences of this work, and examine other creative and intellectual domains as they relate to the Wikidata ontology as a whole.

Notes

[2] Blue Core Project. (n.d.). Home. Retrieved December 15, 2025, from https://dev.bcld.info.

Acknowledgements

The authors would like to thank Peter F. Patel-Schneider and the participants of the Wikimedia Foundation-funded Wikidata Ontology Course, 2025.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

Kalliopi Mathios: Writing – original draft, Writing – Review & Editing

Ege Atacan Doğan: Writing – original draft, Writing – Review & Editing

Jeremy Nelson: Writing – original draft