Communicating Biodiversity Data Restriction Rationales: Balancing Specificity with Practical and Ethical Considerations

Open Access | Dec 2025


Introduction

Data sharing has accelerated biodiversity research, with web-accessible, open-access databases playing a central role (Feng et al. 2022; Sterner et al. 2023). Biodiversity databases vary in scope, covering specific taxa, regions, or data types. Databases curate data to support discovery and reuse. Shared standards facilitate interoperability, data flows between databases, and global-scale harmonized datasets. As an example, the Global Biodiversity Information Facility (GBIF) aggregates more than 3.5 billion species occurrence records, primarily from smaller databases (GBIF 2025). Databases link data that would otherwise remain siloed, and facilitate research and conservation by enabling searches across a wide evidence base.

Biodiversity is one domain where public participation in scientific research has increased the availability of data. There is exponential growth in data as a result of the public contributing species observations to participatory science platforms (Feng et al. 2022). Participatory science data support scientific research, education, conservation, natural resource management, and public recreation (Chandler et al. 2017; McKinley et al. 2017). In this paper, when we use the term participatory science, we are referring to many kinds of public engagement research, including citizen science (Eitzel et al. 2017).

However, in some cases, open data sharing can harm individual organisms, populations or species, habitats, ecosystems, or humans. For example, publishing locations of commercially valuable species can lead to increased exploitation and population decline (Lindenmayer and Scheele 2017). Biodiversity databases often mitigate harm by designating certain species as “sensitive” and limiting public access to those species data (Chapman 2020). Such restrictions honor databases’ ethical commitments by decreasing the likelihood that others use the data to endanger the environment or humans. However, data restrictions are in tension with open science and can inhibit research and conservation by increasing barriers to data access (Contreras-Díaz et al. 2023). On the one hand, from a participatory science perspective, restrictive data sharing practices may run against project participant desires to view data about certain species, but on the other hand, lack of data restrictions may run against participant concerns about data sharing and harm to sensitive species.

It is important that databases provide rationales, or explanations, for data restrictions. If databases do not explain data restrictions, data may not be as findable, accessible, or usable. For example, some databases withhold metadata records for sensitive species from the public and disable queries even if records exist for the species. In such cases, documentation is crucial for clarifying that records exist. More generally, documentation helps convey why records are restricted, what treatments (e.g., generalization) have been applied, and whether requesting access is possible. Moreover, if participatory scientists and other data contributors are concerned with species protection and ethical data use (Ganzevoort et al. 2017), it is important to publicly document data restrictions to support informed participation (Bowser and Wiggins 2015; Thuermer et al. 2023).

Herein, we describe how 43 biodiversity databases that restrict access to participatory science data explain their data restriction rationales. We contribute to the conversation about data restrictions by exploring how databases currently justify restrictions. We describe variation in restriction rationale use among databases, contextualize rationales in light of communications and information quality theories, and suggest different approaches that databases could consider using to expand their explanations.

Background

Communicating data practices is a foundational principle of responsible data stewardship. Data management documentation and data policy language can reveal how data are collected, stored, processed, shared, and preserved throughout the data lifecycle (the steps from data collection to publication or archiving) (Bowser et al. 2020; Thuermer et al. 2023). Open science stresses that scientific knowledge ought to be accessible to all, and data are increasingly expected to be FAIR (findable, accessible, interoperable, and reusable) owing to research policies and community norms (Wilkinson et al. 2016). Data documentation and policy language are key elements of making data FAIR, even if those data are access restricted.

Within open, participatory science settings, well-developed data policies provide clear benefits. Policies support informed decision-making, including by enabling contributors to assess whether a project’s values align with their own before contributing (Bowser and Wiggins 2015, p. 34). This is critical in biodiversity projects, where data sharing has the potential to increase risks to species. Policies may also have educational properties. For example, policies may improve participants’ ability to analyze or reuse project data or inform participants about threats to species. However, research has identified a “work-in-progress narrative” about data management documentation in participatory science, finding that data management plans, data policies, and dataset licenses are often absent or incomplete (Bowser et al. 2020, p. 10; Bowser and Wiggins 2015; Groom, Weatherdon, and Geijzendorffer 2017; Roman et al. 2021; Thuermer et al. 2023; Suter, Barrett, and Welden 2023). This project focuses on documentation in which databases justify restrictions placed on data access.

There are legitimate reasons to withhold or restrict access to participatory science datasets. Many biodiversity projects restrict sensitive species data, such as exact geolocations of rare nesting birds or commercially valuable plants. Biodiversity databases are readily accessible through the Internet and frequently publish data immediately. As such, their data are prone to misuse. Data distribution can facilitate the exploitation, disturbance, or abuse of scarce, charismatic, valuable, or persecuted species, and contribute to population decline, extirpation, or extinction (Lindenmayer and Scheele 2017). Moreover, individuals seeking to view or collect sensitive species may degrade fragile habitats or trespass on private property. These harms are well-documented. For example, publishing reptile and orchid locations has led to collection and extirpation (Stuart et al. 2006; Averyanov et al. 2014; Auliya et al. 2016). To mitigate harm, journals may not require locations for species descriptions if there is a risk of exploitation (Yang and Chan 2015). In addition, some journals and governments offer exceptions to mandatory data disclosure for endangered species (Lindenmayer and Scheele 2017).

In such instances, data managers weigh the benefits of openness against the potential for harm and may elect to restrict public access to data about species they designate as “sensitive” (Chapman 2020). In our analyses, we compare databases in terms of their use of different types of lists: conservation lists, protected species lists, or sensitive species lists. For full operational definitions of these list types, see Supplemental File 6: Appendix F. Briefly, conservation lists depict species and their conservation statuses (descriptions of extirpation or extinction risk, e.g., Endangered, Least Concern) in a region. Protected species lists outline species and their legal protections in a region. Conservation lists and protected species lists often overlap. Sensitive species lists identify species for which data access should be restricted to mitigate harm. The creation of sensitive species lists requires a species-by-species analysis of threats related to data sharing and the restrictions needed to mitigate threats. Relatively few sensitive species lists are publicly available.¹

Databases take multiple approaches to designating species as sensitive for data restriction purposes, two of which relate to list usage. First, some databases import existing conservation lists or protected species lists and restrict access to data of all endangered or protected species on those lists; however, GBIF suggests this is not best practice (Chapman 2020, p. 37). A second approach, which sensitive species data management guidelines recommend (Chapman 2020), is to apply or create sensitive species lists to keep data restrictions as narrow as possible. A third approach, which does not involve lists, is to mask data about all species covered by the database for public display. This is a popular approach for reptile and amphibian databases, given a history of exploitation following data sharing in that community, and camera trap projects, which can raise concerns for both species protection and human privacy (Anhalt-Depies et al. 2019).

Some sensitive species data management guidelines prioritize openness and recommend restricting data access only if there is a clear threat to the survival of a taxon that is not sufficiently mitigated by other measures (e.g., physical access restrictions), or if there is likely to be a severe impact on the environment or humans (Tulloch et al. 2018; Chapman 2020). These guidelines argue that the benefits of information disclosure, such as preventing accidental damage through development or recreation, outweigh the risks for many endangered species. Other guidelines emphasize protection, and suggest that data restrictions are increasingly necessary given a lack of effective legal or physical protections for at-risk species, and because digital technologies heighten information access and exploitation and disturbance of species (Lindenmayer and Scheele 2017). Government-administered biodiversity databases are sometimes required to disclose data to the public with limited exceptions (see, e.g., Miljødirektoratet 2016). Nongovernmental databases may have more flexibility to release or restrict data.

Our project adds to this discussion and describes how biodiversity databases communicate their rationales, or justifications, for restricting data. Some have argued that restrictions should be explained to ensure that decisions to restrict are justifiable (Atlas of Living Australia 2023). Working from the FAIR Guiding Principles of supporting accessibility and reuse, some have argued that rationales for data restrictions should be communicated to the public, and that they should be specific enough to support further action by an interested user (“… be as specific as possible” in order to enable potential users to “understand the reasons why and plan next steps in terms of either attempting to access the data, accepting the data at the scale provided, or not proceeding,” TDWG 2025a). This paper explores how much justification of data restrictions databases choose to provide. Further, we consider what makes a good justification in terms of the FAIR Principles, communications and information quality theories, and the day-to-day functioning of biodiversity databases. While some databases may be subject to specific communications requirements, many organizations may have flexibility to decide (within the bounds of resources) whether and how to communicate data restrictions they choose to implement.

In addition, Darwin Core, the metadata standard widely used by biodiversity databases, currently lacks a structured vocabulary for data restriction rationales. Databases currently use several open-ended metadata fields to express reasons for restrictions (Chapman 2020; Astorga, Rodrigues, and Waller 2024, pp. 10–12, 22–23). As a result, database metadata tends to cite different rationales at different levels of abstraction. The Biodiversity Information Standards (TDWG) Sensitive Species Extension Task Group (henceforth referred to as TDWG SSETG) is currently working to generate a vocabulary for restriction rationales (TDWG 2025c). Until the standard is complete and widely adopted, there is not a standard way to express restrictions. Herein, we contribute to this conversation by giving recommendations for term development informed by database practices.
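To illustrate why open-ended fields make rationales hard to compare, the following is a minimal sketch. It assumes that restriction reasons are recorded as free text in the Darwin Core terms informationWithheld and dataGeneralizations; the source does not name the fields that databases use, and the species names and field values here are invented for illustration.

```python
from collections import Counter

# Hypothetical occurrence records. informationWithheld and dataGeneralizations
# are Darwin Core terms; every value below is invented for illustration.
records = [
    {
        "scientificName": "Hypothetical species A",
        "informationWithheld": "Sensitive species",
        "dataGeneralizations": "Coordinates generalized to a 10 km grid",
    },
    {
        "scientificName": "Hypothetical species B",
        "informationWithheld": "Location withheld: risk of illegal collection",
        "dataGeneralizations": "",
    },
    {
        "scientificName": "Hypothetical species C",
        "informationWithheld": "location withheld - collection risk",
        "dataGeneralizations": "",
    },
]

# Group records by the literal free-text rationale. Without a controlled
# vocabulary, related rationales do not collapse into a shared category.
rationale_counts = Counter(r["informationWithheld"] for r in records)
for text, count in rationale_counts.items():
    print(f"{count} record(s): {text}")
```

In this sketch, the second and third records express the same concern with different wording, and the first is stated at a higher level of abstraction, so a literal grouping treats all three as distinct rationales.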

In summary, communication about data restrictions is especially important for biodiversity databases sharing participatory science data because contributors should be able to make informed decisions about their involvement in projects. However, decisions about whether to restrict access to sensitive species data, and decisions about how to describe or justify those restrictions, require balancing ideals such as those of open science, biodiversity conservation, and the legal and ethical obligations of researchers.

Methods

This study of data restriction rationale communication practices included two types of web-accessible, public biodiversity databases: participatory science platforms (e.g., eBird) and data aggregators (e.g., Atlas of Living Australia). While our analysis treats participatory science platforms and aggregators as distinct categories, they can blur. In cases where aggregators collected data directly from participatory scientists, we still classified them as aggregators (see Supplemental File 1: Appendix A). We generated our list of databases (30 participatory science platforms and 13 data aggregators, from 20 countries) from a larger list of 107 databases that share biodiversity participatory science data generated for a related research project. We then narrowed this list to those databases that systematically restricted access to sensitive species data (see Supplemental File 1: Appendix A and Supplemental File 2: Appendix B). To qualify, a database had to state in public, textual documentation that it automatically generalizes or withholds data because of species sensitivity concerns. To identify documentation, for each database, we reviewed links in website menus or footers, searched sites for terms related to sensitive species through website search bars and the Google site search operator, and searched Google Scholar for publications from database teams. Furthermore, we queried databases for observations of probable sensitive species based on lists of endangered or exploited species.

We then created a collection of documents representing the databases and stored them in a shared drive. If documentation was in a language other than English, we created a separate document containing the output of machine translation via Google Translate. We excluded documents that provided repetitive information to simplify coding. This resulted in a sample of 137 documents of varying lengths (from single sentences to hundreds of pages). We limited our scope to text describing rationales for data restrictions (or lack thereof) that automatically applied to all observations of designated sensitive species (in certain regions during specified periods). We did not code text describing rationales for discretionary restrictions applied at the dataset or individual record level, such as research embargoes or optional contributor privacy protections.

First, we analyzed each database for demographic variables including database size, type, taxonomic scope, policy formality, and rationale description level. See Supplemental File 3: Appendix C for a list and definitions of variables that we report in our results. To explore the rationales used for justifying data restriction, and to compare them with discussions in the literature, we inductively developed thematic justification codes using a smaller subset of our sample (nine databases). We then deductively coded the rationales in all 43 database policies. See Supplemental File 1: Appendix A for coding details including intercoder reliability reports.

Results

Our findings about rationales, or justifications for data restrictions, are similar to those of previous studies, confirming their range of concerns about potential negative impacts of sensitive species data sharing (Chapman 2006; Astorga, Rodrigues, and Waller 2024). This suggests that we may have achieved a shared understanding of why people restrict access to sensitive species data. Herein, we focus on the practices of communicating data restriction rationales. We first distinguish between general and specific theme rationales and describe the observed variation in how databases use them. We then describe to what extent databases use references to authority lists to justify restrictions. We describe differences in policy documentation level and formality, and we compare these elements by demographic subgroup to explore what types of databases tend to provide more (or less) justification of data restrictions through their inclusion of different rationales on web-accessible materials.

General theme and specific theme rationales for data restrictions

Table 1 shows the frequency of use of each of the 31 rationales for data restrictions across the 43 databases. In compiling this frequency data, we distinguished between general theme rationales and specific theme rationales. This distinction is based on the meaning of the text provided as justification of a data restriction. General theme rationales are justifications that plausibly extend to many restricted-access species. Specific theme rationales are justifications that likely apply to fewer species or instances. We found that a small number of general theme rationales were commonly used. Five general theme rationales shown in Table 1 were used by over 50% of our databases, including “location sharing increasing threats” (74%), “species protection” (67%), “exploitation” (63%), “abuse or disturbance” (58%), and “life stage or breeding” (53%). Another example of a general theme rationale is justifying data restriction in terms of an existing conservation list or protected species list (42%). Some general theme rationales were not commonly used. For example, “ecological significance” (2%) could apply to many species.

Table 1

Frequency of rationale use across the 43 databases.

| Rationale | Frequency | Percentage |
| --- | --- | --- |
| 1. Location sharing increasing threats | 32 | 74% |
| 2. Species protection | 29 | 67% |
| 3. Exploitation | 27 | 63% |
| 4. Abuse or disturbance | 25 | 58% |
| 5. Life stage or breeding | 23 | 53% |
| 6. Wildlife or environmental crime | 20 | 47% |
| 7. Conservation status (without specifying a list) | 19 | 44% |
| 8. Habitat or ecosystem protection | 19 | 44% |
| 9. Population size or stability, regeneration potential, rarity | 18 | 42% |
| 10. Conservation list or protected species list: state, regional, or national | 18 | 42% |
| 11. Sensitive species list: non-governmental in-house | 13 | 30% |
| 12. Attractiveness or interest to humans | 12 | 28% |
| 13. Sensitive species list: state or national | 11 | 26% |
| 14. Persecution | 11 | 26% |
| 15. Nativity, introduction or reintroduction, problematic or invasive species | 10 | 23% |
| 16. Association with sensitive species | 10 | 23% |
| 17. Conservation list or protected species list: global | 9 | 21% |
| 18. Physical ease of locating, detecting, or accessing | 9 | 21% |
| 19. Dormancy, roosting, shelters, or site use | 8 | 19% |
| 20. Range expansion or contraction | 8 | 19% |
| 21. Distinctiveness, taxonomic status, or uncertainty about species | 8 | 19% |
| 22. Harms to humans | 7 | 16% |
| 23. Extirpation or extinction | 7 | 16% |
| 24. Ease of capture or collection | 7 | 16% |
| 25. Disease, pathogen transfer | 7 | 16% |
| 26. Restricted range or endemism | 6 | 14% |
| 27. Documented harm | 6 | 14% |
| 28. Environmental protection | 6 | 14% |
| 29. Sensitive species list: non-governmental external | 2 | 5% |
| 30. Ecological significance of species | 1 | 2% |
| 31. Individual animal welfare | 0 | 0% |

Many of the specific theme rationales in Table 1 were used by a small percentage of databases. For example, “extirpation or extinction” (whether a species was extirpated, extinct in the wild, or presumed extinct) (16%) would apply to a narrower range of species. In another example, the low-use rationale of “documented harm” (14%) reflected a particular data policy employed by a few databases that required evidence of documented harm to a species before restricting data.

For definitions and examples of the rationales in Table 1, see Kaehrle and Eschenfelder (2025) and Supplemental File 4: Appendix D.
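The percentages reported for Table 1 follow from dividing each frequency by the 43 databases in the sample. As a minimal sketch of that arithmetic, the following uses the frequencies reported for the five most-used rationales; the function and variable names are our own illustration and not part of the study's tooling.

```python
# Frequencies reported in Table 1 for the five most-used rationales.
N_DATABASES = 43

frequencies = {
    "Location sharing increasing threats": 32,
    "Species protection": 29,
    "Exploitation": 27,
    "Abuse or disturbance": 25,
    "Life stage or breeding": 23,
}


def share_of_databases(count, total=N_DATABASES):
    """Return the share of databases as a whole-number percentage."""
    return round(100 * count / total)


percentages = {name: share_of_databases(n) for name, n in frequencies.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The resulting values (74, 67, 63, 58, and 53) match the percentages cited in the text for these five rationales.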

Authority lists to justify restrictions

We found that databases commonly justified data restrictions with reference to an authority list: either a conservation list, protected species list, or sensitive species list. As seen in Table 1, 42% of databases referred to a state or national conservation list or protected species list, and 21% referred to a global conservation list or protected species list (codes 10, 17). Recall that sensitive species lists are lists of species for which data sharing ought to be restricted. Of the databases, 26% explained data restrictions in terms of sensitive species lists created by a state or national government body (code 13), and 5% explained restrictions in terms of sensitive species lists created by external non-governmental organizations (code 29). Some nongovernmental databases created their own internal sensitive species lists and referred to their own classifications in their rationales (30%; code 11).

Policy documentation describing restrictions: formality

We examined each database to see whether restriction rationales were presented to a reader in a formally named policy document or document section, or if restriction rationales appeared more informally, mixed in with other user-facing text and without any identifying titles or headers. As shown in Table 2, most databases included restriction rationales in easily identifiable formal policies. However, 11 databases included only informal policy text, and two of these databases provided information about restrictions only in external articles that were not included in the database itself.

Table 2

Formality of policy related to data restrictions.

| Formal policy (Y/N) | Number of databases | Percentage of databases |
| --- | --- | --- |
| Formal policy yes | 32 | 74% |
| Formal policy no (informal only) | 11 | 26% |

Furthermore, some restriction policy texts were easily accessible from website menus or footers (e.g., from a broader data landing page), while others required searches, or extensive exploration, to find. Many databases offered several different text sources justifying data restrictions, spanning both formal and informal policies. If a database offered at least one formal policy text, we classified it as having a formal policy.

Database- and species-level rationales for data restrictions

We distinguished between two approaches to rationale documentation, database-level and species-level rationales, which differ in where rationales are placed within a database’s information architecture. Database-level rationales appear higher in the information architecture and are presented as possible explanations for restriction decisions for many species. In contrast, species-level rationales appear lower in the information architecture and are presented as explanations for single restricted-access species. Databases using species-level rationales presented rationales specific to every restricted-access species. As shown in Table 3, 86% of our databases provided only database-level rationales. Only 6 (14%) provided species-level rationales. All databases that provided species-level rationales also offered database-level rationales. In the Discussion, we examine the interaction between general and specific theme rationales, and database- and species-level rationales.

Table 3

Level of documentation provided for data restrictions.

| Documentation level | Number of databases | Percentage of databases |
| --- | --- | --- |
| Database-level rationales only | 37 | 86% |
| Species-level rationales | 6 | 14% |

Of the six databases that provided species-level rationales, five provided spreadsheets with rows for species and columns for restriction rationales. One provided a searchable database of species sensitivity assessments, including restriction rationales. We found that three of the six databases that provided species-level rationales also published rationales for not including data restrictions for species designated “not sensitive.”

How much justification is enough? Subgroup variation

How much explaining is necessary? We looked at the volume of explanation provided, operationalized as the number of different rationales used by each database. The highest number of rationales provided by a database was 27 (out of a maximum of 31). The mean number of rationales provided was 9. Out of our 43 databases, 9 (21%) provided 0–2 rationales for data restrictions (see Figure 1).

Figure 1

Rationales cited by database size.

We were interested in exploring whether databases with certain demographic characteristics might provide more rationales than others, so we examined rationale use among subgroups based on database size and other demographic features. Table 4 and Figure 1 present these results. As shown in Figure 1, while larger databases generally cited more rationales, several large or very large databases included relatively few rationales. Further, two smaller databases (17 and 20) provided a high number of rationales. We generally found lower rationale use among smaller participatory science platforms (as opposed to aggregators) run by nonprofits or hybrid/other organizations, with limited taxonomic scopes.

Table 4

Variation in mean rationales cited and database size by subgroup.

| Subgroup | Mean rationales cited | Mean database size by rank order (1 smallest, 43 largest) |
| --- | --- | --- |
| Database type | | |
| Biodiversity data aggregator (n = 13) | 13.85 | 29 |
| Participatory science platform (n = 30) | 6.9 | 18.97 |
| Host institution type | | |
| Government science agency (n = 7) | 14.43 | 25.29 |
| Nonprofit (n = 19) | 8.21 | 22.74 |
| Hybrid or other (n = 17) | 7.65 | 19.82 |
| Taxonomic scope | | |
| All taxa (n = 24) | 10.83 | 26.54 |
| Birds (n = 8) | 9.25 | 24.62 |
| Reptiles and amphibians (n = 4) | 8.25 | 10.25 |
| Flora (n = 2) | 3.5 | 16 |
| Arthropods (n = 4) | 3 | 7.75 |
| Policy type | | |
| Formal policy (n = 32) | 11.25 | 24.84 |
| Informal policy (n = 11) | 2.45 | 13.73 |
| Documentation level | | |
| Species-level rationales (n = 6) | 22.67 | 34.67 |
| Database-level rationales (n = 37) | 6.78 | 19.95 |
| List creation | | |
| Created internal sensitive species list (n = 21) | 14.48 | 30.9 |
| No evidence of list creation (n = 22) | 3.77 | 13.5 |

Due to the high skew in size measures based on number of records, we use mean rank, a measure of average database size by subgroup, following rank order of 1–43.²
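The mean rank measure can be sketched as follows. This is a minimal illustration under stated assumptions: the six databases, their subgroup labels, and their record counts are hypothetical, the function name is our own, and ties in size are assumed not to occur. The final check uses the database type values reported in Table 4.

```python
from collections import defaultdict

# Hypothetical databases: (name, subgroup, number of records).
databases = [
    ("A", "aggregator", 3_500_000_000),
    ("B", "aggregator", 120_000_000),
    ("C", "platform", 900_000),
    ("D", "platform", 45_000),
    ("E", "platform", 2_000_000),
    ("F", "aggregator", 15_000),
]


def mean_rank_by_subgroup(rows):
    """Rank databases by size (1 = smallest) and average ranks per subgroup."""
    ordered = sorted(rows, key=lambda row: row[2])
    ranks = defaultdict(list)
    for rank, (_name, subgroup, _size) in enumerate(ordered, start=1):
        ranks[subgroup].append(rank)
    return {group: sum(values) / len(values) for group, values in ranks.items()}


mean_ranks = mean_rank_by_subgroup(databases)
print(mean_ranks)

# Consistency check on Table 4: weighting the two database type means by
# subgroup size should recover the mean of ranks 1 to 43, which is 22.
overall_mean_rank = (13 * 29 + 30 * 18.97) / 43
print(round(overall_mean_rank, 2))
```

Because ranks replace raw record counts, a single very large database shifts its subgroup's mean by at most a few rank positions rather than dominating it.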

We suspected that databases that offered formal policy documentation might provide more rationales. Data show that the mean number of rationales provided by formal policy databases was 11.25 (n = 32), and the mean rationales for databases providing informal policies was 2.45 (n = 11). It is worth noting that formal policy texts did not always contain the most rationales. Recall that many databases included both formal and informal policy texts. In several cases, databases’ informal policy texts included more rationales than their formal policy texts.

We expected that the mean number of rationales cited might vary based on whether a database had created its own sensitive species list because the process of making a list requires reflection on criteria. As we expected, we found that databases that created their own sensitive species lists (n = 21) used a mean of 14.48 rationales, compared with 3.77 rationales for databases that solely referenced existing lists (n = 22).

We expected that databases providing species-level rationales would provide more rationales because discussing each species individually requires more explanation. As expected, databases that offered species-level rationales (n = 6) cited a higher mean of 22.67 rationales compared with databases that provided only database-level rationales (n = 37), which cited a mean of 6.78 rationales.

We also suspected that databases that provided only database-level rationales would be more likely to use general theme rationales. If one were to explain rationales once, at the database level, it would make sense to use justifications that likely apply to many species. We found that this was the case. Databases that provided only database-level documentation were more likely to use general theme rationales, while databases that provided species-level rationales frequently cited specific theme rationales (see Supplemental File 5: Appendix E for figures).

Discussion

In this section we discuss factors any database might consider as it decides how much justification to provide for its data restrictions. We explore what makes a good rationale, considered in terms of information quality and communications theories, the FAIR Principles, and the day-to-day functioning of biodiversity databases. Consideration of rationale communication practices is relevant for all projects including those with little or no rationale documentation as well as those with extensive documentation that might seek additional ideas.

What makes a good rationale?

In assessing what makes a good rationale, we should consider both message-based and user-based criteria. Message-based criteria could include the number of rationales, their length, their cogency, the ease of obtaining the information, how well the information is presented, and its completeness, accuracy, and timeliness (Petty, Brinhol, and Priester 2002; Stvilia and Twidale 2008). User-based approaches assess quality within the context of the information’s relevance to the user’s task or situation (Stvilia and Twidale 2008). Differences in individual characteristics of the perceiver also impact the success of a message (Petty, Brinhol, and Priester 2002). Different stakeholders may have different experiences; for example, some databases may provide extensive and easy-to-find rationales to credentialed data users such as conservation partners, but not to the general public. We continue by considering how biodiversity database stakeholders’ perception of the goodness of a data restriction rationale may vary.

Recall that we have roughly distinguished between general theme rationales, which can apply to many species, and specific theme rationales, which apply to a narrower range of species. Importantly, there is a distinction between our rationale categories and the actual text instances on database websites from which we derived those categories. Website text indicated by those categories differed in the amount of context provided. We distinguished between text instances providing lower context and instances providing higher context, which we refer to as low- and high-context rationales. For example, text we tagged with the general theme rationale of “Exploitation” might include little additional context. Two examples from our data include instances where the justification consisted of one verb, “collection,” or two words, “excavation risk.” Text can, however, include significant context. Examples from our data included descriptions of ongoing exploitation that drew on published studies, unpublished statistics, or anecdotes of past damages. The amount of context provided may influence readers’ perceptions of the quality or persuasiveness of the rationales.

From a potential data requester perspective, arguably a general theme rationale such as “Species protection” (justifying data restriction in terms of the protection of species) provides limited information about the specific features of the threat(s) to the species. As a result, it provides little guidance for users seeking to develop informed data access requests. Recall that some have asserted that rationales for restrictions should be “as specific as possible” (TDWG 2025a). Arguably, specific theme rationales and high-context rationales assist users requesting data access and promote accountability by allowing users to verify that databases apply restrictions according to their stated criteria.

However, justifying data restrictions using general theme rationales or low-context rationales may have practical advantages for other stakeholders and situations, and our data show that general theme rationales were more frequently used than specific theme rationales. For resource-constrained databases, general theme rationales can be applied once at the database level to explain data restriction for many species. General theme rationales still signal to external stakeholders that data exist that they can request. Further, data stewards may choose general theme rationales or low-context rationales to reduce social, political, or informational risks. Use of these rationales may avoid difficult conversations with external stakeholders about activities related to the species where there is no consensus. Providing specific theme rationales or high-context rationales may open room for dispute and counterclaims. In some instances, providing specific theme rationales or high-context rationales may increase threats to species by enabling inferences about locations or the re-identification of sensitive records (TDWG 2025b).

We noticed that some databases stated that they reviewed species for data restrictions individually and developed internal sensitive species lists, yet opted not to share species-level rationales publicly. These decisions may be aimed at mitigating harm to species or retaining strategic ambiguity.

Resource constraints likely drive decisions about how to justify data restrictions, and most databases in our sample (86%) provided rationales at the database level. Communicating rationales at the species level is resource-intensive because it requires formulating justifications specific to each species. Some databases may be obliged by legislation or funding requirements to communicate species-level rationales for restrictions, but many have more leeway. Another less resource-intensive method of communicating rationales we observed was reference to conservation lists or protected species lists. We found that many databases referred to such lists in explaining restriction decisions (rationale codes 10 [42%] and 17 [21%]).

In sum, assessments of rationale quality should consider several factors, and must acknowledge that perceptions of quality may differ by the individual, role, or setting. Inclusion of specific theme, high-context, or species-level rationales aligns with open science ideals. Provision of general theme, low-context, or database-level rationales may help to minimize species risk, maintain strategic ambiguity, or navigate resource constraints. Databases may provide different rationales to conservation partners than to the public.

Although we coded rationale use only for databases that restricted data access, the question of communicating rationales also applies to databases that do not restrict access to any species data. We posit that databases that do not restrict access also vary in the degree to which they justify their lack of restrictions.

We recommend deeper consideration of data restriction rationale communication practices and acknowledgement of variation in project values (e.g., risk aversion), obligations, and resources (Cooper, Rasmussen, and Jones 2022, p. 12). We list recommendations below: three that are specific to restriction rationales and four more general communication recommendations.

Recommendations for data restriction communication practices

Following from the FAIR Principles and theoretical perspectives above, but acknowledging data stewards’ needs for flexibility, we have the following recommendations for data restriction communication practices:

  • 1. Databases should provide rationales for data restrictions. While we only included databases that referred to data restrictions in our sample, we know that some databases neither describe their restrictions nor provide rationales, and thus these databases could improve their practices by adding rationales. Decisions about whether to use general or specific theme rationales or high- or low-context rationales will depend on project values, obligations, and resources. Decisions about the number of rationales or length or completeness of rationale documentation may influence perceptions of the goodness of the rationale. Further, perceptions of goodness may vary based on the stakeholder and their information needs or relationship with the information.

  • 2. As a best practice, databases should provide species-level rationales, if they have sufficient resources to do so and if doing so does not increase threats to species. Species-level rationales educate users about threats facing species, help users understand how the database weighs benefits and costs of restricting data access, and may help users develop more effective petitions for both legitimate data access requests and the implementation of additional data restrictions.

  • 3. Databases that collect or aggregate participatory science data should ensure their restriction rationales and policy texts are easy to find. Databases that provide more detailed restriction rationales to credentialed users could review these rationales to determine whether it would be appropriate to provide some to the public, if doing so does not increase threats. Databases should describe data restrictions through their websites, rather than solely through external publications such as journal articles. Databases should provide well-labeled restriction policies in intuitive locations, such as broader data policy landing pages linked from website headers or footers. One example of accessible policy documentation is that of NDFF, a national data aggregator in the Netherlands (https://ndff.nl/). It describes data restrictions through several pages tied to its header and footer, as well as dedicated PDF documents and spreadsheets.

While we did not report data on these attributes, we also suggest:

  • 4. Databases should describe any technical measures applied to protect data (e.g., location generalization). This transparency helps contributors make informed decisions about submitting data and enables data users to assess fitness for use and determine whether to request higher-resolution access.

  • 5. Databases should describe procedures for requesting access to restricted data.

  • 6. Databases should offer tools, such as lists or search filters, to help users identify restricted-access species and (potentially, depending on the technical measures applied to protect data) metadata records for those species.

  • 7. Databases should report the dates of policy revisions and data restriction decisions.
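As an illustration of recommendation 4, the following is a minimal sketch of one possible technical measure, location generalization by coordinate rounding, paired with a plain-language description of the measure in the public record. The function name `generalize_record`, the one-decimal default, and the wording of the note are assumptions made for illustration; the field names follow Darwin Core terms (`decimalLatitude`, `decimalLongitude`, `dataGeneralizations`), but no database in our sample is known to implement generalization this way.

```python
def generalize_record(record: dict, decimals: int = 1) -> dict:
    """Return a public copy of an occurrence record with coarsened coordinates.

    Hypothetical helper: rounding is only one of several possible
    generalization methods, and the precision is an illustrative default.
    """
    public = dict(record)  # leave the full-resolution record untouched
    public["decimalLatitude"] = round(record["decimalLatitude"], decimals)
    public["decimalLongitude"] = round(record["decimalLongitude"], decimals)
    # Describe the measure so users can assess fitness for use and decide
    # whether to request higher-resolution access.
    public["dataGeneralizations"] = (
        f"Coordinates rounded to {decimals} decimal place(s); "
        "higher-resolution data may be available on request."
    )
    return public


# Illustrative record; the species name and coordinates are made up.
original = {
    "scientificName": "Example species",
    "decimalLatitude": 52.37403,
    "decimalLongitude": 4.88969,
}
public = generalize_record(original)
print(public["decimalLatitude"], public["decimalLongitude"])
```

Keeping the description of the measure inside the record itself, rather than only in a policy page, is one way to let downstream users of aggregated data see that a generalization was applied.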

Taken together, our findings show that practices for communicating data restriction rationales vary widely and do not meet current best practice guidelines that assume species-level rationales (Chapman 2020). We are concerned that some projects may become discouraged by their inability to meet this ideal, and may delay implementing less ideal, but still effective, data restriction communication approaches, such as database-level, general theme, or low-context rationales.

Metadata standards efforts

Metadata is another tool through which projects may choose to communicate data restriction rationales. Recall that Darwin Core currently lacks a structured vocabulary for data restriction rationales, and practitioners are working to create consensus terms. Field experts suggest both ambitious and more conservative approaches for conveying restriction rationales in metadata. In one ambitious approach, GBIF’s best practice guidelines recommend documenting the type of harm, the risk to the species of the harm occurring, and the likelihood that data sharing would contribute to harm (Chapman 2020). Other frameworks are more conservative. For example, the National Framework for the Sharing of Restricted Access Species Data in Australia does not state requirements for restriction rationales, referring to “sensitivity reasons” more broadly (Atlas of Living Australia 2023, pp. 59–61). TDWG SSETG’s current draft restriction rationales include a mixture of general and specific theme rationales, but the standards process is ongoing (TDWG 2025a).

Based on the variation in rationale use that we documented among databases, we suggest that controlled vocabularies include a mixture of general and specific theme rationales for “type of harm.” Inclusion of general theme rationales in controlled vocabularies will provide discretionary space for data stewards and give databases a way to accomplish communication goals using fewer resources. Inclusion of specific theme rationales will increase transparency and allow for better assessment of reasons for data restrictions. While we conducted this research independently of the TDWG standardization effort, MK provided suggestions for the work-in-progress restriction rationale vocabulary.
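To make this suggestion concrete, the following is a minimal sketch of how a controlled vocabulary mixing general and specific theme terms might be represented alongside the three elements named in the GBIF guidelines (type of harm, risk to the species, and likelihood that data sharing contributes to harm). It is not the TDWG draft vocabulary or any existing standard. The class and field names are hypothetical, “Species protection” and “Exploitation” are taken from our examples above, and “Collection for trade” is an invented specific theme term included only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HarmType(Enum):
    """Hypothetical 'type of harm' vocabulary with a theme level per term."""

    # General theme terms (labels drawn from examples in the text).
    SPECIES_PROTECTION = ("Species protection", "general")
    EXPLOITATION = ("Exploitation", "general")
    # Invented specific theme term, for illustration only.
    COLLECTION_FOR_TRADE = ("Collection for trade", "specific")

    def __init__(self, label: str, theme_level: str) -> None:
        self.label = label
        self.theme_level = theme_level


@dataclass(frozen=True)
class RestrictionRationale:
    """One rationale; only the harm type is required."""

    harm_type: HarmType
    risk_to_species: Optional[str] = None          # e.g. "high"
    likelihood_from_sharing: Optional[str] = None  # e.g. "moderate"
    context: Optional[str] = None                  # free-text supporting detail

    def has_added_context(self) -> bool:
        """True if anything beyond the bare harm type is documented."""
        return any(
            [self.risk_to_species, self.likelihood_from_sharing, self.context]
        )


# A low-resource, database-level rationale using a general theme term.
database_level = RestrictionRationale(HarmType.SPECIES_PROTECTION)

# A fuller, species-level rationale using a specific theme term.
species_level = RestrictionRationale(
    HarmType.COLLECTION_FOR_TRADE,
    risk_to_species="high",
    likelihood_from_sharing="moderate",
    context="Illustrative text describing documented collection pressure.",
)
```

Making every field except the harm type optional reflects the discretionary space discussed above: a steward can publish a bare general theme term, or add detail where resources allow and where doing so does not increase threats.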

Limitations

The results of this study are subject to several limitations. First, there is a possibility that we may have missed some relevant documentation. Second, a reliance on machine translation for foreign-language sources led to some errors in interpretation, which took additional language checking to correct. However, the consistency of translated terminology across documents from individual databases gave us confidence that Google Translate generally performed well. Moreover, our use of thematic-style analysis made the use of particular words less important than the expression of broader ideas, which we are confident that we were able to ascertain despite differences in term use by database and language (for additional discussion, see Supplemental File 1: Appendix A: Methods). Third, our codebook does not include all rationales for biodiversity data restrictions. Some that we encountered, such as Indigenous data sovereignty and data provider license terms, fell outside our focus on automatic, species-based data restrictions. Lastly, because our study is limited to publicly available written documentation, it does not fully capture databases’ motivations for restricting access to sensitive species data, or their communications of their restriction decisions. Further research will include interviews with database staff to better understand decision-making about data restrictions and rationale communication.

Conclusion

Sensitive species data stewardship balances data findability and accessibility with competing values such as species protection and habitat and ecosystem protection. Current community best practices suggest providing documentation of data restrictions, to ensure transparency and enable users to develop informed data access requests (Chapman 2020; Atlas of Living Australia 2023; TDWG 2025a). This study found wide variation in the rationale communication practices of a global set of 43 biodiversity databases that share participatory science data. Some offered expansive justifications for data restrictions, while others provided little explanation. In some cases, rationales were easy to locate; in others, they were difficult to find. We distinguished between the following types of rationales. General theme rationales provide explanation for many species restriction decisions, while specific theme rationales likely apply to fewer decisions. High-context rationales provide significant detail, while low-context rationales provide less detail. Database-level rationales appear high in a database’s information architecture and are presented as possible explanations for restriction decisions for many species. In contrast, species-level rationales appear lower in the information architecture and are presented as explanations for single restricted-access species. Finally, we distinguished between formal restriction policies designated by titles or headers about data restrictions, and informal restriction policies, which lack titles or headers or appear external to a database.

In our Discussion section, we explain how different decisions about rationale use impact accountability, resource demands, species protection, and potentially relationships with key stakeholders. Drawing on the FAIR Principles and communications and information quality theories, we then suggest seven best practices for data restriction communication that account for differences in project values, obligations, and resources.

Our findings advance conversations about participatory science data stewardship and data documentation standards by describing variations in biodiversity data restriction communication practices. Recognizing that many databases may have flexibility to decide whether and how they communicate restrictions, this work aims to support projects navigating communications decisions.

Data Accessibility Statement

Supplemental File 7 contains the results of our deductive coding. For this article, we present results by database category (e.g., aggregators) rather than by database. Additional data are available upon request; please contact the corresponding author, Martin Kaehrle, at kaehrle@wisc.edu.

Supplementary Files

The Supplementary files for this article can be found as follows:

Supplemental File 1

Appendix A. Methods. DOI: https://doi.org/10.5334/cstp.899.s1

Supplemental File 2

Appendix B. Sample. DOI: https://doi.org/10.5334/cstp.899.s2

Supplemental File 3

Appendix C. Variable definitions. DOI: https://doi.org/10.5334/cstp.899.s3

Supplemental File 4

Appendix D. Codebook. DOI: https://doi.org/10.5334/cstp.899.s4

Supplemental File 5

Appendix E. Rationale use differences by documentation level. DOI: https://doi.org/10.5334/cstp.899.s5

Supplemental File 6

Appendix F. List definitions. DOI: https://doi.org/10.5334/cstp.899.s6

Supplemental File 7

Notes

[1] See, for example, Column C, Sensitive species list: non-governmental in-house, in Supplemental File 7: Coding data. Of the 13 non-governmental databases that justified data restrictions on the basis of having created sensitive species lists, only five made those lists publicly available. Astorga, Rodrigues, and Waller (2024) also reported relatively few publicly available sensitive species lists.

[2] In terms of the number of species occurrence records, our sample is highly skewed, with a range of 4,022 to 1.9 billion records and a median of 6,002,463. For this reason, we ranked databases from 1 (smallest) to 43 (largest) and used five categories of size from “tiny” to “very large” (see Figure 1). We also report mean rank, a measure of average database size by subgroup, based on the rank order of 1–43.
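The ranking approach in note [2] can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Figure 1: the ten database names and record counts are made up, the three intermediate size labels are assumed, and the equal-sized rank bands stand in for category boundaries that are not specified here.

```python
from statistics import mean

# Hypothetical record counts for ten made-up databases.
record_counts = {
    "db01": 5_000, "db02": 40_000, "db03": 300_000, "db04": 1_000_000,
    "db05": 5_000_000, "db06": 7_000_000, "db07": 30_000_000,
    "db08": 120_000_000, "db09": 600_000_000, "db10": 1_500_000_000,
}

# Only "tiny" and "very large" come from the text; the rest are assumed.
SIZE_LABELS = ["tiny", "small", "medium", "large", "very large"]


def rank_by_size(counts: dict[str, int]) -> dict[str, int]:
    """Rank databases from 1 (smallest) upward; ties are not handled."""
    ordered = sorted(counts, key=counts.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


def size_category(rank: int, n: int) -> str:
    """Map a rank to one of five equal-sized bands (an assumed rule)."""
    return SIZE_LABELS[(rank - 1) * len(SIZE_LABELS) // n]


def mean_rank(ranks: dict[str, int], members: list[str]) -> float:
    """Average rank of a subgroup, a skew-resistant measure of its size."""
    return mean(ranks[name] for name in members)


ranks = rank_by_size(record_counts)
aggregators = ["db08", "db09", "db10"]  # hypothetical subgroup
print(mean_rank(ranks, aggregators))
print(size_category(ranks["db01"], len(ranks)))
```

Working with ranks rather than raw counts keeps a single very large database from dominating subgroup averages, which is the motivation given in the note.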

Competing Interests

The authors have no competing interests to declare.

Author Contributions

CRediT (Contributor Roles Taxonomy): First author: conceptualization, data curation, formal analysis, investigation, methodology, project administration, visualization, writing – original draft, writing – review and editing. Second author: writing – original draft. Third author: conceptualization, formal analysis, investigation, methodology, supervision, writing – original draft, writing – review and editing.

DOI: https://doi.org/10.5334/cstp.899 | Journal eISSN: 2057-4991
Language: English
Submitted on: Aug 10, 2025 | Accepted on: Nov 18, 2025 | Published on: Dec 22, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Martin Kaehrle, Corey Jackson, Kristin Eschenfelder, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.