On the Darwin Core Term dwc:habitat, and the Need to Adopt a European Vocabulary Based on NATURA2000 and EUNIS Classifications, with a Comment on International Applicability

Open Access | May 2026

Introduction

Sharing and integrating biodiversity data among different disciplines requires common encoding, and this represents a complex challenge for the global scientific community. Ecological and biological data are intrinsically heterogeneous, gathered using a wide array of methodologies and in various formats (Zimmerman, 2008; Michener, 2015). This variability makes large-scale aggregation and comparative analysis extremely difficult, hindering the ability of researchers to understand global biodiversity patterns and provide effective information for natural resource conservation decisions.

In this scenario, the Darwin Core (DwC) standard (https://dwc.tdwg.org) has emerged as the most widely used method for managing and exchanging biodiversity information. Maintained by Biodiversity Information Standards (TDWG), DwC provides a common language and a glossary of terms with identifiers, labels, and reference definitions, whose main purpose is to facilitate the sharing of biological diversity information worldwide (Wieczorek et al., 2012; Baskauf and Sachs, 2018). Its adoption has greatly simplified the data publishing process and allowed data to adhere to the principles of findability, accessibility, interoperability, and reusability (FAIR principles, https://force11.org/info/the-fair-data-principles/) (Wilkinson et al., 2016). Globally relevant platforms like the Global Biodiversity Information Facility (GBIF) (www.gbif.org) and the Ocean Biogeographic Information System (OBIS) (https://obis.org/) use DwC extensively for publishing and synthesizing biodiversity data, attesting to its effectiveness and strategic importance (De Pooter et al., 2017). DwC’s lasting value comes from its simple, open format and clear term definitions, independent of specific technologies (Svenningsen and Schigel, 2024).

Although Darwin Core is widely adopted and offers many strengths, it has certain gaps that can limit the detail and semantic interoperability of specific fields (Franz and Sterner, 2017). One of these critical issues emerges with the term dwc:habitat, defined as a ‘category or description of the habitat in which a dwc:Event occurred’ (see https://dwc.tdwg.org/list/#dwc_habitat). Its current implementation allows for the entry of any literal string value, which cannot capture the complex ecological heterogeneity of, e.g., pre-cordilleran steppe or beech forest ecosystems. Unlike other DwC terms, such as dwc:relationshipOfResourceID (explicitly linked to the Open Biological Ontologies (OBO) Relations Ontology (RO), https://www.ebi.ac.uk/ols4/ontologies/ro (Smith et al., 2007)), dwc:habitat lacks a controlled reference vocabulary. This discrepancy between flexibility in accepting free-text input for dwc:habitat and the general goal of DwC standardization and interoperability highlights a potential flaw in the design of standards for complex domains like biodiversity.

The purpose of this document is i) to propose and discuss the adoption of consolidated European habitat classifications, specifically those of the NATURA2000 network and the European Nature Information System (EUNIS) classification, as reference vocabularies for the term dwc:habitat within the European Union regions, and ii) to briefly comment on the dwc:habitat issue at the international level.

The Darwin Core Standard and the Need for Controlled Vocabularies

Darwin Core (DwC) is a fundamental data standard for biodiversity, using a common language and a set of precisely defined terms. These terms, which in other contexts may be considered properties, elements, fields, or attributes, are accompanied by Uniform Resource Identifiers (URI), labels, and reference definitions (Darwin Core Task Group, 2009). The operational structure of DwC is based on the Darwin Core Archive (DwC-A), a standardized file format consisting of a compressed package (a ZIP file) containing interconnected text files. Using controlled vocabularies and clear term definitions is essential for consistent data interpretation and sharing (Nitta and Iwasaki, 2024). Without them, data risk becoming isolated and unusable for large-scale analysis (Reichman, Jones, and Schildhauer, 2011). Ontologies, as structured forms of controlled vocabularies, significantly increase the interoperability and reusability of datasets, making data interpretable by both humans and machines (Golbreich et al., 2007; Mungall et al., 2010). Such standardization through DwC not only simplifies the data publishing process but also makes it easier for users to discover, search, evaluate, and compare datasets (Svenningsen and Schigel, 2024).

A key advantage of linking DwC terms to external ontologies is the ability to express data in the Resource Description Framework (RDF) format, a crucial step for allowing the biodiversity informatics community to participate in broader Linked Data and Semantic Web initiatives. This overcomes the limitations of text-string-based formats, promoting more normalized and interconnected data representation (Baskauf et al., 2016).

The success and effectiveness of using RO (Smith et al., 2005) demonstrate that Darwin Core is intrinsically capable of integrating external controlled vocabularies to improve semantic precision and interoperability. The challenge, therefore, is not technical in nature; it lies in reaching a consensus on, and then adopting, specific vocabularies for habitats.

The Ambiguity of the dwc:habitat Term in Darwin Core

It is important to note that, although an equivalent term (dwciri:habitat) exists in the dwciri: namespace, intended for use in RDF with non-literal objects (IRI), the dwc:habitat term, which accepts literal strings, remains widely used in practice. Darwin Core was originally conceived to meet the needs of species data exchange, and the dwc:habitat field is recognized as one of the least standardized within the standard itself (Jomier, Poncet, and Michez, 2019). This situation creates significant data heterogeneity, compromising its usefulness for comparative and aggregated analyses.
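The difference between the literal form (dwc:habitat) and the IRI form (dwciri:habitat) can be illustrated with a minimal Python sketch that writes N-Triples statements by hand. The two namespace IRIs are those of the Darwin Core terms and dwciri: namespaces; the event and habitat IRIs are placeholders under example.org, invented purely for illustration, and do not point to any official vocabulary entry.

```python
# Namespaces of the Darwin Core terms and of their IRI-valued analogues.
DWC = "http://rs.tdwg.org/dwc/terms/"
DWCIRI = "http://rs.tdwg.org/dwc/iri/"


def literal_triple(subject_iri, text):
    """N-Triples statement using dwc:habitat with a free-text literal."""
    # Escape backslashes and double quotes so the literal stays well formed.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_iri}> <{DWC}habitat> "{escaped}" .'


def iri_triple(subject_iri, habitat_iri):
    """N-Triples statement using dwciri:habitat with an IRI object."""
    return f"<{subject_iri}> <{DWCIRI}habitat> <{habitat_iri}> ."


# Hypothetical identifiers, used only for this example.
EVENT = "https://example.org/event/e1"
HABITAT = "https://example.org/habitat/EUNIS/T17"

print(literal_triple(EVENT, "beech forest"))
print(iri_triple(EVENT, HABITAT))
```

In the first statement a consumer receives only a string to interpret; in the second, the object is an identifier that can be dereferenced or joined with other data, provided a stable IRI for the habitat class exists.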

Although Fagus sylvatica dominates beech forests across Europe, these forests do not constitute a uniform habitat. Reflecting the species’ adaptability to varied climates and soils (Knapp, 2011; Houston, de Rigo, and Caudullo, 2016), beech forms different plant associations depending on specific abiotic and biotic factors (Packham, Hobson, and Norris, 2013). A generic description like ‘beech forest’ fails to capture this ecological complexity and the diversity of plant associations (Pizzolotto, 2022). In official habitat classifications, such as Annex I of the Habitats Directive underlying NATURA2000, different types of beech forest are clearly distinguished, e.g., within the group ‘9100: Forests of temperate Europe’, each with precise ecological characteristics:

  • 9110: Woodrush-beech forest (Luzulo-Fagetum)

  • 9120: Atlantic acidic beech forest with undergrowth of holly and occasionally yew (Quercion robori-petraeae or Ilici-Fagenion)

  • 9130: Woodruff-beech forest (Asperulo-Fagetum)

  • 9140: Central European subalpine beech forest with maple and Rumex arifolius

  • 9150: Central European orchid-calcareous beech forest (Cephalanthero-Fagion)

It is therefore evident that the simple label ‘beech forest’ lacks detail and specificity, making it difficult to distinguish between ecologically different habitats that support different biological communities. This problem is particularly acute when attempting to conduct large-scale analyses, such as species distribution modelling, conservation status assessment, or the identification of ecological trends at a continental or global level (Villero et al., 2017; Costa et al., 2018). The lack of standardization in this field is recognized as one of the main challenges for biodiversity data interoperability at a global level (Jomier, Poncet, and Michez, 2019).

The problem with dwc:habitat begins when querying the GBIF database: it is impossible to select records on the basis of their dwc:habitat content, because dwc:habitat is not an ‘indexed’ Darwin Core field in GBIF. GBIF only allows server-side filtering on fields that it has standardized and indexed (such as coordinates, taxon keys, or basis of record).

The ‘habitat’ field is notoriously messy in biodiversity data. Because it is free text, one person might write ‘Forest’ while another writes ‘Oak woodland’. At the time of writing, a direct query in GBIF to select datasets with ‘habitat’ in the title gave 319 datasets, which likely accounted for millions of records, without the possibility to filter them on the basis of the dwc:habitat content before downloading them.

For example, among the 20 datasets dealing with marine habitats (identified from the dataset title), 12 were based in Europe, and only those eight published by the ‘Aqua kompetanse’ publisher (https://www.gbif.org/publisher/56d7759a-a2cd-4585-a5f3-7fbbca931eaf) contained a non-null dwc:habitat field with information, written in Norwegian, such as ‘solid rock and large blocks’.

Moreover, because the dwc:habitat field is not findable, the above datasets had to be selected by trial and error. For example, only after downloading the dataset ‘The UK Archive for Marine Species and Habitats Data’ (https://doi.org/10.15468/qym7wp) does one find that it contains no dwc:habitat field, contrary to what its title suggests.

This situation leads to a lack of both findability, because it is impossible to know if there is a non-null dwc:habitat field, and interoperability, due to the large range of different texts describing similar contents in dwc:habitat. It is therefore necessary to encode the dwc:habitat field so that it can be indexed in GBIF, thus increasing the findability and interoperability of the data in GBIF.
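Once a dataset has been downloaded, the check described above can be sketched in a few lines of Python. The example below is only illustrative: it assumes a tab-delimited data file with a header row in which the column is named `habitat`, and the sample rows are invented for the demonstration.

```python
import csv
import io
from collections import Counter


def summarize_habitat(tsv_text, column="habitat"):
    """Return (total rows, rows with a non-empty habitat, Counter of values).

    Values are trimmed and case-folded, so 'Forest' and 'forest' count as one.
    A missing column is treated the same as an empty value.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    total = 0
    values = Counter()
    for row in reader:
        total += 1
        value = (row.get(column) or "").strip()
        if value:
            values[value.casefold()] += 1
    return total, sum(values.values()), values


# Invented sample data; the column name 'habitat' is an assumption.
SAMPLE = (
    "eventID\thabitat\n"
    "e1\tForest\n"
    "e2\tOak woodland\n"
    "e3\t\n"
    "e4\tforest\n"
)

total, non_empty, values = summarize_habitat(SAMPLE)
print(total, non_empty, dict(values))
```

Even after trivial normalization, ‘forest’ and ‘oak woodland’ remain distinct strings with no machine-readable relation between them, which is the interoperability gap that a controlled vocabulary is meant to close.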

Habitat Classifications as a Solution for Standardization

The NATURA2000 Network and its habitat classification

NATURA2000, based on the Habitats Directive (92/43/EEC) and the Birds Directive (2009/147/EC), represents the most extensive coordinated network of protected areas globally, with the primary goal of safeguarding Europe’s most valuable and threatened species and habitats. An interpretation manual for the Annex I habitats was produced in 1993, and was then updated as new countries joined the EU. Currently, Annex I of the Habitats Directive selectively lists 233 natural habitat types of Community interest (hereafter, NATURA2000 habitats), hierarchically classified and identified by specific numerical codes, accompanied by descriptive names in the current version of the interpretation manual (European Commission, 2013).

As an example, Figure 1 shows the position of the 13 habitats included in the ‘9200: Mediterranean deciduous forests’ class, hierarchically nested within the ‘9: Forests’ group. A complete description of each habitat is given in the official interpretation manual (European Commission, 2013), and can be found by browsing the classification at https://eunis.eea.europa.eu/habitats-annex1-browser.jsp.

Figure 1

Hierarchical structure of NATURA2000 habitat classification. Visit https://eunis.eea.europa.eu/habitats-annex1-browser.jsp to browse the classification.

The benefit of adopting NATURA2000 codes for the dwc:habitat term is not limited to standardizing the habitat description; doing so semantically enriches the occurrence data, intrinsically linking it to a conservation context, a legal obligation at the European level, and a monitoring system. This adds a dimension of political relevance and conservation status to the raw data, improving the overall quality of DwC data.

The EUNIS Classification

The EUNIS (European Nature Information System) classification is a comprehensive, pan-European system for habitat identification, covering 331 terrestrial and marine habitat types in Europe and the surrounding seas in a common and easily understandable language that can be used for describing all habitats in Europe, allowing habitat data to be reported comparably for inventories, monitoring, assessment, and biodiversity indicators (Moss, 2008; Davies and Moss, 2020; see also https://eunis.eea.europa.eu/index.jsp). The EUNIS classification is intrinsically hierarchical (Figure 2), structured as a dichotomic key (Figure 3), and organized into levels ranging from broad categories to more specific habitat types. It is a dynamic system, subject to a continuous review process with the goal of improving hierarchical consistency, eliminating ambiguities and overlaps in type definitions, and extending the typology to the entire European continent and its seas (Davies, Moss, and Hill, 2004; Evans, 2012).

Figure 2

Hierarchical structure of EUNIS habitat classification. Visit https://eunis.eea.europa.eu/habitats-annex1-browser.jsp to browse the classification.

Figure 3

The starting point of the EUNIS dichotomic key (from Davies and Moss, 2020).

EUNIS provides ‘crosswalks’ (mappings) between its habitats and other classifications, including NATURA2000 and the European Red List of Habitats, facilitating interoperability between different systems (Moss and Davies, 2002; Moss, 2008; Schaminée et al., 2012). Crosswalks are organized so that the level of equivalency between two classifications is represented by symbols, as in Table 1.

Table 1

Symbols used in crosswalk tables linking EUNIS classification to NATURA2000.

| Symbol | Meaning |
|---|---|
| = | The revised EUNIS habitat is equal to the NATURA2000 habitat type |
| # | The revised EUNIS habitat overlaps with the NATURA2000 habitat type |
| < | The revised EUNIS habitat is narrower than the NATURA2000 habitat type |
| > | The revised EUNIS habitat is wider than the NATURA2000 habitat type |
| blank | The revised EUNIS habitat is not linked to a NATURA2000 habitat type |

For example, the EUNIS class ‘T17: Fagus forest on non-acid soils’ is broadly inclusive of the NATURA2000 class ‘9210: Apennine beech forests with Taxus and Ilex’ (i.e., T17 > 9210; see Table 2). Thanks to the continuous revision process, EUNIS classification gives the possibility of more detail by applying the class ‘T1764: Sila Fagus forests’ (see Figure 2), which is narrower than the 9210 class (i.e., T1764 < 9210).

Table 2

Example of crosswalk between EUNIS and NAT2000 classifications of Fagus forests (symbols as in Table 1).

| EUNIS NAME | SYMBOL | NAT2000 NAME |
|---|---|---|
| T17 Fagus forest on non-acid soils | > | 9130 Asperulo-Fagetum beech forests |
| | > | 9140 Medio-European subalpine beech woods with Acer and Rumex arifolius |
| | > | 9150 Medio-European limestone beech forests of the Cephalanthero-Fagion |
| | > | 9210 Apennine beech forests with Taxus and Ilex |
| | > | 9220 Apennine beech forests with Abies alba and beech forests with Abies nebrodensis |
| | # | 9270 Hellenic beech forests with Abies borisii-regis |
| | > | 9280 Quercus frainetto woods |
| | > | 91K0 Illyrian Fagus sylvatica forests (Aremonio-Fagion) |
| | > | 91S0 Western Pontic beech forests |
| | > | 91V0 Dacian Beech forests (Symphyto-Fagion) |
| | # | 91W0 Moesian beech forests |
| | > | 91X0 Dobrogean beech forests |

Official crosswalks are available at https://sdi.eea.europa.eu/data/bfe4c237-e378-4a83-ab21-b3807f96c2e2.
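As a hedged illustration of how such a crosswalk could be used programmatically, the following Python sketch encodes the T17 rows of Table 2 with the symbols of Table 1. The data structure and function names are assumptions made for this example and do not reflect the format of the official crosswalk files.

```python
# Symbols from Table 1, phrased from the EUNIS side of the relation.
RELATION_PHRASE = {
    "=": "is equal to",
    "#": "overlaps with",
    "<": "is narrower than",
    ">": "is wider than",
}

# Excerpt of the crosswalk for EUNIS T17, transcribed from Table 2.
T17_TO_NAT2000 = {
    "9130": ">", "9140": ">", "9150": ">", "9210": ">",
    "9220": ">", "9270": "#", "9280": ">", "91K0": ">",
    "91S0": ">", "91V0": ">", "91W0": "#", "91X0": ">",
}


def describe(eunis_code, nat2000_code, crosswalk):
    """Describe the relation between a EUNIS class and a NAT2000 class."""
    symbol = crosswalk.get(nat2000_code)
    if symbol is None:
        # Absence here only means the code is not in this excerpt.
        return f"no entry for NAT2000 {nat2000_code} in this excerpt"
    return f"EUNIS {eunis_code} {RELATION_PHRASE[symbol]} NAT2000 {nat2000_code}"


def codes_with_relation(crosswalk, symbol):
    """List the NAT2000 codes that have the given relation symbol."""
    return sorted(code for code, rel in crosswalk.items() if rel == symbol)


print(describe("T17", "9210", T17_TO_NAT2000))
print(codes_with_relation(T17_TO_NAT2000, "#"))
```

A lookup of this kind would let a record coded with one classification be matched, with an explicit degree of equivalence, against records coded with the other.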

The possibility of improving EUNIS on a scientific basis, while maintaining the focus on hierarchical consistency and formal definitions supported by an expert system, makes it an extremely robust and versatile candidate for standardizing the DwC term dwc:habitat.

Advantages of Adopting NATURA2000 or EUNIS for dwc:habitat

One of the main advantages of adopting standardized classifications is the substantial increase in data interoperability. When habitat data are mapped using a common, structured vocabulary, they become easier to combine and compare across different datasets, institutions, and countries. This is particularly relevant for Darwin Core, whose primary goal is to facilitate the sharing and integration of heterogeneous biodiversity information (Wieczorek et al., 2012). For example, a research project on carabid distribution along a heavy metal gradient in Poland and England (Skalski et al., 2011) produced two datasets (https://doi.org/10.15468/voq4f7, https://doi.org/10.15468/x3monw), and the term ‘meadows’ was used to record habitats in both regions. These datasets would greatly benefit from narrowing the habitat description to whichever of the four classes within the EUNIS class ‘R22: Low and medium altitude hay meadow’ (overlapping with the NATURA2000 6270 and 6510 classes) best describes the sampled sites.

The information entered in the dwc:habitat field should indicate the classification of origin and always follow the same format, e.g., ‘NAT2000: 9210 Apennine beech forests with Taxus and Ilex’, or ‘EUNIS: T17 Fagus forest on non-acid soils’ (or, with more detail, ‘EUNIS: T1764 Sila Fagus forests’). The habitat code should be accompanied by information in dwc:eventRemarks, giving the URI pointing to the official vocabulary. A new DwC term, dwc:habitatID, could be introduced for storing URI information (as with the OBO Relations Ontology for dwc:relationshipOfResourceID).
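The proposed format lends itself to simple validation. The Python sketch below, offered only as one possible reading of the examples given above, builds and parses strings of the form ‘SCHEME: CODE name’; the regular expression and the function names are assumptions, and only the two scheme labels used in the examples (NAT2000 and EUNIS) are accepted.

```python
import re

# Scheme label, colon, code without spaces, then the descriptive name.
HABITAT_PATTERN = re.compile(r"^(NAT2000|EUNIS):\s*(\S+)\s+(.+)$")


def format_habitat(scheme, code, name):
    """Build a habitat string in the format used by the examples."""
    return f"{scheme}: {code} {name}"


def parse_habitat(value):
    """Split a coded habitat string into its parts, or return None
    when the value does not follow the format (e.g. legacy free text)."""
    match = HABITAT_PATTERN.match(value.strip())
    if match is None:
        return None
    scheme, code, name = match.groups()
    return {"scheme": scheme, "code": code, "name": name}


print(parse_habitat("EUNIS: T1764 Sila Fagus forests"))
print(parse_habitat("meadows"))
```

A check like this could be run by data publishers before submission, or by an aggregator at ingestion, to separate coded values from legacy free text.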

Adopting standards like NATURA2000 or EUNIS, which are already widely used and recognized in Europe, would facilitate the expression of data in formats like RDF, promoting participation in Linked Data and Semantic Web initiatives (Stucky et al., 2014; Baskauf et al., 2016). This would make the data not only interpretable by humans but also machine-processable, unlocking new opportunities for automated analysis and integration with other sources of information (Golbreich et al., 2007; Mungall et al., 2010).

The standardization of the dwc:habitat term is fundamental for advancing both scientific research and the effectiveness of conservation policies. The ability to aggregate and compare data from different regions and time periods with a clear understanding of the associated habitat is crucial for identifying significant trends and patterns (Reichman, Jones, and Schildhauer, 2011; von Wettberg and Khoury, 2022). For conservation policies, using recognized European classifications like NATURA2000 or EUNIS would provide a solid basis for monitoring the conservation status of habitats and species. This would allow policymakers to formulate more targeted and effective management strategies, evaluate the success of conservation measures, and fulfil reporting obligations at the national and European levels. In this sense, integrating these standards into DwC would not only improve data quality but also elevate their relevance and impact for biodiversity protection.

Concluding Remarks

To resolve ambiguity in dwc:habitat and enhance biodiversity data, the scientific community should adopt an ecologically grounded, updatable habitat vocabulary. Given its completeness, detailed hierarchical structure, and scientific basis, the EUNIS classification seems a good candidate to be promoted as the primary vocabulary for dwc:habitat. The use of a controlled vocabulary for mapping data falling under the DwC term dwc:habitat should be a mandatory, or at least strongly recommended, practice within the Darwin Core standard. This transition from a free-text field to a controlled field is fundamental to ensuring consistency and semantic precision, especially to prevent one habitat that is present in different regions from being described with disparate terminology.

Backward compatibility with existing free-text data could be preserved by means of, for example, dwc:eventID, which provides a unique identifier for the single ‘event’ object (where the free text was originally introduced). In this context, geographical coordinates can help assign the free-text information to the new habitat code. The free-text details, which controlled codes cannot express, could be moved to the dwc:eventRemarks field, which is specifically intended for comments or notes about the dwc:Event.
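A minimal sketch of this migration step is given below in Python. It assumes that a curated lookup from dwc:eventID to the new habitat code already exists (how that lookup is built, for instance with the help of coordinates, is outside the scope of the example); the record layout, the lookup, and the wording of the remark are all hypothetical.

```python
def migrate_event(record, code_by_event_id):
    """Return a copy of the record with a coded habitat.

    The original free text is preserved in eventRemarks. Records whose
    eventID has no entry in the lookup are returned unchanged.
    """
    new_code = code_by_event_id.get(record["eventID"])
    if new_code is None:
        return dict(record)
    updated = dict(record)
    free_text = (record.get("habitat") or "").strip()
    if free_text:
        note = f"original habitat text: {free_text}"
        existing = (record.get("eventRemarks") or "").strip()
        updated["eventRemarks"] = f"{existing} | {note}" if existing else note
    updated["habitat"] = new_code
    return updated


# Hypothetical lookup, curated by the data provider.
lookup = {"e1": "EUNIS: T17 Fagus forest on non-acid soils"}
old = {"eventID": "e1", "habitat": "beech forest", "eventRemarks": ""}

new = migrate_event(old, lookup)
print(new["habitat"])
print(new["eventRemarks"])
```

Because the function returns a copy, the original record remains available for auditing the assignment.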

The transition to a controlled field would also be highly beneficial at the international level, as classifications similar to EUNIS exist for other regions. From an international perspective, the International Union for Conservation of Nature (IUCN) Global Standard for the Identification of Key Biodiversity Areas (KBA Committee, 2020) aims to identify Key Biodiversity Areas (KBAs) through specific criteria, leading to the qualification of a site as a KBA. These criteria are based on a broad habitat classification (see https://www.iucnredlist.org/resources/habitat-classification-scheme), where habitat types have been grouped into wide classes, mainly giving information about vegetation physiognomy. Such a habitat classification is useful if scientists from very different regions need to share preliminary information about habitat or species data.

A three-level classification of world formation types was proposed by Faber-Langendoen et al. (2016) in which identification is driven mainly by physiognomy (e.g., 1. Forest & Woodland → 1.B. Temperate & Boreal Forest & Woodland → 1.B.1. Warm Temperate Forest & Woodland). Conversely, the classification proposed by Loidi (2025) offers a more informative approach to the classification of world vegetation, utilizing a nested structure ranging from the broad level of Domains and Ecozones to the finer level of plant associations.

In North America (the US and Canada) the EcoVeg hierarchical approach is applied, classifying plant associations within an eight-level hierarchical structure, organized by ecological units (Faber-Langendoen et al., 2016). Similarly, the Australian habitat classification was developed based on the National Vegetation Information System (NVIS), a hierarchical framework structured into six levels, from Class (Level 1) to sub-association (Level 6). The NVIS initially focuses on structural information (Levels 1–3) before reaching the species-level detail at Levels 5 and 6 (NVIS Technical Working Group, 2017).

Due to its diverse territorial features, the biogeographical mosaic of Asia lacks a single ‘continental’ system like EUNIS. In China a complex vegetation classification was developed based on either vegetation physiognomy or phytosociological principles, creating a hierarchical system in which features of plant communities play a fundamental role for the finer levels of classification (Guo et al., 2020).

Hierarchical structure is a characteristic shared by international habitat classifications, which at the regional level are strongly influenced and characterized by the peculiarity of habitats. Therefore, in order to establish an international vocabulary, it is necessary to construct a dichotomous key based on ecoregions. Within each ecoregion, the habitat classification scheme adopted for that geographical area should be followed, with individual habitat units identified by a unique regional prefix prepended to the code, such as EU for the European EUNIS classification, so that the class ‘T1: deciduous broadleaved forest’ becomes ‘EU-T1: deciduous broadleaved forest’.
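The prefixing convention can be sketched in Python as follows. Only the ‘EU’ prefix and the hyphen separator come from the example above; the function names are hypothetical, and prefixes for other regional systems are deliberately left undefined.

```python
def prefixed_label(region, code, name):
    """Build a label such as 'EU-T1: deciduous broadleaved forest'."""
    return f"{region}-{code}: {name}"


def split_prefixed_code(prefixed_code):
    """Split 'EU-T1' into ('EU', 'T1'), using the first hyphen only."""
    region, sep, code = prefixed_code.partition("-")
    if not sep or not region or not code:
        raise ValueError(f"not a region-prefixed code: {prefixed_code!r}")
    return region, code


print(prefixed_label("EU", "T1", "deciduous broadleaved forest"))
print(split_prefixed_code("EU-T1"))
```

Keeping the regional prefix separable from the local code means that each regional classification can continue to evolve under its own maintainers while records stay unambiguous at the global level.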

Despite the technical feasibility of habitat standardization, a gap remains that is worth closing through future research: to prepare an actionable framework for the transition from the current non-FAIR habitat information to a structured one. This means that a coordinated strategy is needed to engage the scientific community, conservation institutions, and data providers in a collaborative process, in order to build a culture of data sharing and standardization. The main steps are, at least, to identify the primary stakeholders, their needs and technical constraints, and to determine if there is resistance to shifting from free text to structured information. Where possible, there should be harmonization of ecoregional crosswalks, to provide a technical bridge at the higher levels of the regional systems (like EcoVeg, NVIS, EUNIS) for global ‘readability’ of the unified habitat information. Authors of scientific outputs should be incentivized to use standardized habitat reporting. The whole transition process should be managed under the umbrella of an international governing body (like TDWG or GBIF), which would oversee future developments of the habitat vocabularies.

Language: English
Page range: 17 - 17
Submitted on: Oct 14, 2025
Accepted on: Apr 21, 2026
Published on: May 8, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Roberto Pizzolotto, Fabiola Durante, Wouter Dekoninck, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.