1. Introduction
Research data management (RDM) is ‘a set of practices for handling and organising research data to make it easier to find, understand, and use’ (Higman, Bangert and Jones, 2019). RDM is ‘a process that involves various key stakeholders, such as researchers, executive management, and staff’ (Ashiq, Usmani and Naeem, 2022). The implementation of RDM is ‘tailored to the research cycle itself and involves all activities, including planning, managing, processing, organising, analysing, preserving, accessing, reusing, and creating data’ (Higman, Bangert and Jones, 2019; Ashiq, Usmani and Naeem, 2022). RDM has several aims: to facilitate efficient research processes and to ensure the accuracy, reliability, and replicability of research data and the security of research resources (Higman, Bangert and Jones, 2019); to keep researchers’ data collection processes organised, understandable, and transparent (Surkis and Read, 2015); to make research efficient, sustainable, and of high quality, with maximum impact and reach, including publication and accessibility (Finkel et al., 2020); and to save research time (Ashiq, Usmani and Naeem, 2022).
In environmental research, it is essential to do RDM well, given that the data generated can be extensive, have various formats, come from various sources, be used for various purposes, and be utilised by various professions and occupations. Finkel et al. (2020) stated, ‘RDM is essential for long-term environmental research collaborations because of its role in efficiency, continuity, quality, and maximum impact, as well as providing access to valuable long-term data sets’. In addition, Beagrie (2003) mentioned that data serves as a tool in monitoring and enforcing compliance with environmental regulations or standards and policies and assisting in informed decision-making.
The ‘current state’ of RDM research in environmental studies is still unclear, even though RDM is a suitable data management approach to implement in an organisation, particularly a research institution. Establishing the ‘current state’ of research requires reviewing the previous literature on the topic and how it relates to other topics. A literature review aims to analyse and synthesise the work done in a particular area and identify what we know and do not know about the research question posed (Bradbury-Jones et al., 2022). For this reason, this research searches, collects, analyses, and reports the latest data on RDM research related to environmental studies, based on previous literature, using the scoping review framework proposed by Bradbury-Jones et al. (2022). Alongside the scoping review, the analysis is supported by statistical tools, specifically bibliometrics, to identify diverse patterns in the literature on RDM within environmental studies. Bibliometric analysis examines a set of literature, usually a publication dataset on a particular subject area (Napitupulu and Yakub, 2021). The technique of combining scoping review and bibliometric analysis is called ScoRBA, which was introduced by Wijaya et al. (2023).
Based on this background and problem formulation, this study seeks to answer the following research questions:
RQ1. What publication trends lead to the literature on RDM in environmental studies?
RQ2. What are the main themes that emerge on RDM in environmental studies?
2. Methods
2.1. Scoping review framework
This study follows the scoping review framework by Bradbury-Jones et al. (2022), a widely used methodology in social and health sciences, which structures literature analysis using the PAGER framework (Patterns, Advances, Gaps, Evidence for Practice, and Research Recommendations). Scoping reviews are used because they can be conceived as a method of reviewing research evidence for specific reasons: to examine the extent and reach of research activity in a particular field; as a pre-cursor to a full systematic review; to summarise and disseminate research findings and to identify gaps in the evidence base (Arksey & O’Malley, 2005).
According to Arksey & O’Malley (2005), a scoping review consists of five stages. Additionally, an optional sixth component, the ‘consultation exercise’, can be included to validate findings and enhance the study’s robustness (Arksey and O’Malley, 2005). This study used the five stages without adding the optional component (Table 1).
Table 1
Stages of the scoping review framework.
| NO. | STAGE | IMPLEMENTATION IN THIS STUDY |
|---|---|---|
| 1 | Identification of research questions | RQ1. What publication trends lead to the literature on RDM in environmental studies? RQ2. What are the main themes that emerge on RDM in environmental studies? |
| 2 | Identification of relevant studies | A preliminary review of pertinent studies was conducted to identify the keywords and concepts. |
| 3 | Study selection | Literature selection was performed on five databases and one journal using the steps in PRISMA. |
| 4 | Charting the data | Visualising and grouping data to facilitate reporting of findings. |
| 5 | Collating, summarising, and reporting results | Findings were reported with PAGER. |
| 6 | Consult with stakeholders regarding findings (optional stage) | Not conducted in this study. |
2.2. Identification of research question and relevant studies
To structure the process of searching, selecting, and determining the literature, this research adapted the guidelines for systematic reviews and meta-analyses (PRISMA) from Moher et al. (2009), as updated by Page et al. (2021).
The scoping review began with determining the research questions. Two main terms were taken from these research questions and became the key terms for the literature search: ‘research data management’ and ‘environment’. The online thesaurus Power Thesaurus (https://www.powerthesaurus.org/) was used to check the synonyms and related terms of these search terms, and the relevant ones were included. This produced a set of query terms that fit the concepts and purpose of this research (Table 2), namely four terms that cover the concept of RDM and 18 terms that fall into the scope of ‘environment’.
Table 2
Search terms for the ‘Research Data Management’ and ‘Environment’ keywords.
| RESEARCH DATA MANAGEMENT | ENVIRONMENT |
|---|---|
| research data management; scientific data management; data stewardship; research data services | environment; environmental science; ecology; ecological; earth science; geoscience; physical science; nature; natural science; conservation; preservation; agricultural; forestry; forest; ecosystem; climate; biology; biodiversity |
The search was conducted in five databases (Scopus, EBSCO, Science Direct, Sage Journals, and Emerald) and one journal, Nature. More than one database was used to widen the search area and retrieve more publications on RDM in environmental studies. The Nature journal was added because its scope covers natural science and applied technology, which are widely related to the environmental field (Nature.com).
2.3. Study selection
2.3.1. Search for articles through the database
The relevant literature was searched from the selected databases using a search query based on the chosen terms (Table 2) with Boolean operators ‘AND’ and ‘OR’: (‘research data management’ OR ‘scientific data management’ OR ‘data stewardship’ OR ‘research data services’) AND (environment OR ‘environmental science’ OR ecology OR ecological OR ‘earth science’ OR geoscience OR ‘physical science’ OR nature OR ‘natural science’ OR conservation OR preservation OR agricultural OR forestry OR forest OR ecosystem OR climate OR biology OR biodiversity). However, due to the different search systems in each database, adjustments were made to the search strings used in each database and journal (Appendix 3).
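As an illustration only, the generic search string above can be assembled from the two term lists in Table 2. The sketch below is not the tooling used in this study; the function names are hypothetical, and the use of double quotes for phrases is an assumption, since each database has its own phrase and field syntax (Appendix 3).

```python
# Term lists copied from Table 2.
RDM_TERMS = [
    "research data management",
    "scientific data management",
    "data stewardship",
    "research data services",
]
ENVIRONMENT_TERMS = [
    "environment", "environmental science", "ecology", "ecological",
    "earth science", "geoscience", "physical science", "nature",
    "natural science", "conservation", "preservation", "agricultural",
    "forestry", "forest", "ecosystem", "climate", "biology", "biodiversity",
]


def quote_if_phrase(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms: list[str]) -> str:
    """Join the terms of one concept with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(quote_if_phrase(t) for t in terms) + ")"


def build_query(rdm_terms: list[str], env_terms: list[str]) -> str:
    """Combine the two concept groups with AND (hypothetical helper)."""
    return f"{or_group(rdm_terms)} AND {or_group(env_terms)}"


query = build_query(RDM_TERMS, ENVIRONMENT_TERMS)
print(query)
```

Keeping the term lists separate from the query builder makes it easier to regenerate an adjusted string for each database while the underlying concepts stay fixed.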
2.3.2. Filtering
The first filter eliminated duplicate articles and limited the source type to articles only (Criterion 1/C1). Filtering was done using the reference manager software Mendeley. The second filter was language: only articles written in English were selected (Criterion 2/C2). As with the search string, filtering by source type and language was adjusted according to the filtering system in each database and journal (Appendix 3).
The third filter eliminated articles that did not have an abstract (Criterion 3/C3). The last filter examined the abstract and keywords of each article to ensure that they matched the search terms determined during the identification of research questions and relevant studies: only articles containing at least one term from the ‘RDM’ set and at least one term from the ‘environment’ set were selected (Criterion 4/C4).
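The four criteria can be read as a sequence of filters over bibliographic records. The following is a minimal sketch under stated assumptions, not the procedure actually carried out in Mendeley: the `Record` fields, the title-based duplicate check, and the plain substring matching for C4 are all hypothetical simplifications (substring matching, for example, would also match ‘nature’ inside ‘signature’).

```python
from dataclasses import dataclass

RDM_TERMS = ["research data management", "scientific data management",
             "data stewardship", "research data services"]
ENVIRONMENT_TERMS = ["environment", "environmental science", "ecology",
                     "ecological", "earth science", "geoscience",
                     "physical science", "nature", "natural science",
                     "conservation", "preservation", "agricultural",
                     "forestry", "forest", "ecosystem", "climate",
                     "biology", "biodiversity"]


@dataclass
class Record:
    """Hypothetical minimal bibliographic record."""
    title: str
    source_type: str
    language: str
    abstract: str
    keywords: str


def contains_any(text: str, terms: list[str]) -> bool:
    """True if at least one term occurs in the text (case-insensitive substring)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def apply_criteria(records: list[Record]) -> list[Record]:
    """Apply C1 to C4 in order and return the records that pass all of them."""
    seen_titles: set[str] = set()
    selected = []
    for record in records:
        # C1: keep only articles and drop duplicates (matched on normalised title).
        if record.source_type.lower() != "article":
            continue
        title_key = " ".join(record.title.lower().split())
        if title_key in seen_titles:
            continue
        seen_titles.add(title_key)
        # C2: English only.
        if record.language.lower() != "english":
            continue
        # C3: an abstract must be present.
        if not record.abstract.strip():
            continue
        # C4: at least one RDM term and one environment term in abstract or keywords.
        text = f"{record.abstract} {record.keywords}"
        if contains_any(text, RDM_TERMS) and contains_any(text, ENVIRONMENT_TERMS):
            selected.append(record)
    return selected
```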
2.4. Charting the data
Study selection, using a search strategy adapted from PRISMA and run on November 22nd, 2023, against five databases and one journal, yielded 248 articles on RDM in environmental studies (Figure 1). To answer the research questions, the data were charted and visualised with Bibliometrix, run through RStudio, and with VOSviewer 1.6.19. The articles’ metadata, in RIS format, were imported into VOSviewer. Before the data were imported into VOSviewer and Bibliometrix, the keywords were cleaned as follows: (1) identify keywords that have the same meaning; (2) sort all the identified keywords from A to Z; (3) standardise the keywords to be used; and (4) re-insert the standardised keywords into Excel format. This stage was needed to improve the quality of the keywords; as Bjarkefur et al. (2020) state, a structured workflow for preparing newly acquired data for analysis is essential for efficient, transparent, and reproducible data work.

Figure 1
Adapted PRISMA flow diagram of search strategy.
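The keyword cleaning steps listed in Section 2.4 can be sketched as a small normalisation routine. This is an illustrative sketch only: the synonym map below is hypothetical and does not reproduce the mapping used in this study, and step (4), writing the result back to Excel, is omitted.

```python
# Hypothetical synonym map for step (1): variant form -> preferred form.
SYNONYMS = {
    "rdm": "research data management",
    "research data management (rdm)": "research data management",
    "meta-data": "metadata",
    "fair data principles": "fair principles",
}


def standardise(keyword: str) -> str:
    """Step (3): lower-case, collapse whitespace, then map variants to one form."""
    cleaned = " ".join(keyword.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


def clean_keywords(raw_keywords: list[str]) -> list[str]:
    """Steps (1) to (3): standardise each keyword, drop duplicates, sort A to Z."""
    return sorted({standardise(keyword) for keyword in raw_keywords})


print(clean_keywords(["RDM", "Meta-data", "metadata", " Research  Data Management "]))
```

Holding the mapping in a single table keeps the cleaning decisions explicit and repeatable, which is the point of the structured workflow cited above.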
2.5. Collating, summarising, and reporting results
In the last stage of the scoping review, the results were reported using the PAGER approach (Bradbury-Jones et al., 2022) to describe emerging RDM research themes in environmental studies. Before reporting with PAGER, bibliometric analysis was conducted to complete the review of the scope of RDM issues in environmental studies, using VOSviewer to build and visualise co-occurrence networks. Keyword co-occurrence analysis is a quantitative method that examines co-occurrence patterns in specific documents, enabling researchers to identify patterns, correlations, and concealed themes in the literature that may not be apparent through conventional narrative analysis (Wijaya et al., 2023). Because it combines scoping review and bibliometric analysis, this method is called ScoRBA (Wijaya et al., 2023). The ScoRBA approach aims to systematically outline the themes within a specific topic, contributing to a deeper understanding of the research landscape and guiding future investigations in the field (Zakaria et al., 2024).
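To make the idea of keyword co-occurrence concrete, the sketch below counts how often each keyword occurs across documents, keeps keywords that meet a minimum occurrence threshold, and counts how often each remaining pair appears in the same document. It is a simplified illustration, not VOSviewer’s implementation; the function name and the toy data are assumptions, and clustering and layout are not covered.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents: list[list[str]], min_occurrences: int):
    """Return (occurrences, links) for keywords meeting the occurrence threshold.

    occurrences: number of documents in which each keyword appears.
    links: number of documents in which each pair of kept keywords appears together.
    """
    occurrences = Counter(k for doc in documents for k in set(doc))
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}
    links: Counter = Counter()
    for doc in documents:
        present = sorted(set(doc) & kept)  # sorted so each pair has one canonical order
        for first, second in combinations(present, 2):
            links[(first, second)] += 1
    return occurrences, links


# Toy example with three documents and a threshold of two occurrences.
docs = [
    ["metadata", "data sharing", "ecology"],
    ["metadata", "data sharing"],
    ["metadata", "climate"],
]
occurrences, links = cooccurrence(docs, min_occurrences=2)
print(links)
```

In the toy data only ‘metadata’ and ‘data sharing’ pass the threshold, so the network has a single link between them.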
After visualisation, the clusters in the co-occurrence network become the Patterns. Next, related papers in each cluster are reviewed to find the Advances, Gaps, Evidence for practice, and Research recommendations (PAGER). The related papers were obtained using the corresponding keywords taken from the overlay visualisations of average publication year (Avg. pub. year) and average normalised citation number (Avg. norm. cit) in VOSviewer (Wijaya, 2024). The Avg. pub. year identifies the most recent papers, and the Avg. norm. cit identifies the most influential or most cited papers. However, in this research, only the Avg. pub. year could be used (Appendix 1); citation data were instead taken from the cleaned records of the five databases and one journal (Appendix 2). This approach was necessary because VOSviewer cannot display citation data when the raw RIS data are derived from multiple databases; as stated by Lim, Kumar, and Donthu (2024), ‘to date, combining bibliometric data from multiple databases for VOSviewer remains elusive’.
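As a rough illustration of how an average publication year per keyword can be used to rank keywords by recency, the sketch below assumes that the score is the mean publication year of the documents in which a keyword occurs. The function name and the toy records are hypothetical, and the values are not taken from this study’s data.

```python
from collections import defaultdict


def average_publication_year(records: list[tuple[int, list[str]]]) -> dict[str, float]:
    """Mean publication year of the documents in which each keyword occurs.

    Each record is a (publication_year, keywords) pair.
    """
    years_by_keyword: dict[str, list[int]] = defaultdict(list)
    for year, keywords in records:
        for keyword in set(keywords):  # count each keyword once per document
            years_by_keyword[keyword].append(year)
    return {k: sum(v) / len(v) for k, v in years_by_keyword.items()}


# Toy records: (publication year, keywords).
toy_records = [
    (2015, ["metadata"]),
    (2021, ["metadata", "fair principles"]),
    (2023, ["fair principles"]),
]
scores = average_publication_year(toy_records)
most_recent_first = sorted(scores, key=scores.get, reverse=True)
print(most_recent_first)
```

Keywords at the top of such a ranking point to the most recent papers, which is how the overlay was used here to select literature for each cluster.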
In addition to PAGER reporting, a thematic map was created with Bibliometrix. The thematic map categorises topics based on their degree of development (density) and relevance (centrality) (D. Kumar, 2024). The y-axis denotes density, and the x-axis denotes centrality (Nasir et al., 2020; Adharsh et al., 2023). Centrality measures the importance of a theme, and density measures its development (Nasir et al., 2020). The thematic map is divided into four quadrants (Nasir et al., 2020). The bottom right quadrant (High Centrality, Low Density) contains the basic themes of the study. The driving and influencing themes are in the upper right quadrant (High Centrality, High Density). Specific themes are in the upper left quadrant (Low Centrality, High Density). The bottom left quadrant (Low Centrality, Low Density) contains the emerging or declining themes (Aria et al., 2022; D. Kumar, 2024). In a thematic map, keywords appear in clusters, and only the top three keywords in each cluster are displayed.
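The quadrant logic described above can be expressed as a simple classification rule. The sketch below is illustrative only: it takes centrality and density values as given rather than computing them, and the cut points that separate ‘high’ from ‘low’ are passed in as parameters because how Bibliometrix places these lines is not specified here. The function name and example values are hypothetical.

```python
def classify_theme(centrality: float, density: float,
                   centrality_cut: float, density_cut: float) -> str:
    """Assign a theme to a thematic-map quadrant.

    centrality_cut and density_cut are assumed thresholds separating
    high from low values on the x-axis and y-axis respectively.
    """
    high_centrality = centrality >= centrality_cut
    high_density = density >= density_cut
    if high_centrality and high_density:
        return "driving and influencing theme"  # upper right
    if high_centrality:
        return "basic theme"                    # bottom right
    if high_density:
        return "specific theme"                 # upper left
    return "emerging or declining theme"        # bottom left


# Hypothetical values on a 0 to 1 scale with cut points at 0.5.
print(classify_theme(centrality=0.8, density=0.2, centrality_cut=0.5, density_cut=0.5))
```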
3. Results
3.1. The growth of RDM literature in environmental studies
The earliest article on RDM in environmental studies was published in 1985, after which nothing was published until 1999. From 2000 to 2010, RDM articles were published almost every year, although the number fluctuated. From 2011 onwards, there was a significant increase. Article production peaked in 2020 and 2021 (n = 31), then decreased slightly in the following two years (Figure 2). However, output was still relatively high compared to the previous period. Apart from these fluctuations, the importance of RDM as a topic has generally increased since 2012.

Figure 2
Annual scientific production of RDM in environmental studies (1985–2023).
3.2. Main themes and trends of research data management in environmental studies
3.2.1. Keyword Analysis
The themes of RDM in environmental studies can be seen from the keywords in the literature. The most common theme is RDM itself, with 114 occurrences. The second most common occurrence is data management, with 90 occurrences. Other keyword occurrences, such as information management, research data, metadata, data sharing, and information processing, can be seen in Figure 3.

Figure 3
Keyword list of RDM themes in the environmental studies.
3.2.2. Thematic Maps
Thematic maps give a more precise picture. They are strategic diagrams used to find the main emerging themes (trends) and future research topics of RDM in environmental studies. This study has thirteen theme clusters spread across the quadrants (Figure 4). The bottom right quadrant, which holds the study’s basic themes, contains cluster one (life cycle, data life cycle, and decision making) and cluster six (research data, data sharing, and data curation). Cluster three, placed between the basic and the influencing themes, contains research data management, data management, and information management. The upper right quadrant, which holds the driving and influencing themes of RDM in environmental studies, contains cluster five (information processing, human, and software), cluster four (data acquisition, biodiversity, and climate change), and cluster seven (data handling, quality control, and climatology). Cluster nine, placed between the influencing and the specific themes, contains artificial intelligence, digital transformation, and open scholarship.

Figure 4
Thematic map of RDM themes in environmental studies.
The upper left quadrant holds the specific themes; the higher a theme’s position, the more frequently it appears or the more publications it has. This quadrant contains cluster 12 (archaeological data, agricultural data, and biomedical data) and cluster two (biodiversity informatics, data assimilation, and data protection). Finally, the bottom left quadrant holds the emerging or declining themes: cluster 11 (academic libraries, South Africa, and bibliometrics), cluster 10 (agricultural productivity and technology), cluster eight (research data sharing, Tanzania, and agricultural research), and cluster 13 (conceptual model) (Figure 4). A Scopus search conducted on May 29th, 2024, for the keywords in the lower left quadrant indicated that ‘conceptual model’ is an emerging theme: its most recent publications appeared in 2023 and 2024, although there are only six publications and they have not yet been cited.
3.2.3. The PAGER reporting
As the final stage of the scoping review, this section reports the search results for articles on emerging themes in RDM within environmental studies. As stated in the Methods section, the reporting uses the PAGER method of Bradbury-Jones et al. (2022).
Of the 1460 keywords in the 248 selected articles, 108 met the threshold of at least four occurrences. This threshold was chosen so that the relationships between the displayed keywords would be strong. As a result, VOSviewer displays four clusters (Figure 5), each with a different focus of study (Table 3).

Figure 5
Co-occurrence of keywords: 108 out of 1460 keywords met the threshold of a minimum of four occurrences.
Based on the clusters in Table 3 and Figure 5, PAGER reporting was conducted (Table 4), along with literature sources derived from the corresponding keywords mentioned in the Methods section (Appendix 4–7).
Table 3
Distribution of research themes and keywords across RDM clusters in Environmental Studies.
| CLUSTER | THEME | DESCRIPTION |
|---|---|---|
| Red | Data management | The red cluster consists of 41 keywords focusing on ‘data management’ in environmental studies, as shown by the 10 highest co-occurrences of data management, information management, metadata, digital storage, ecology, open data, FAIR data, information system, and research. In this cluster, the relationship between RDM and the environmental fields is more visible than in the others. The relationship can be seen in ecology, biodiversity, climate change, ecosystems, plants, climatology, earth science, environmental management, and environmental science (Appendix 1). |
| Green | Information processing and biomedicine | The green cluster encompasses 26 keywords discussing RDM in information processing that focus on biology and medicine research, as can be seen from the keywords information processing, human, software, medical research, big data, bioinformatics, biomedical research, biology, biomedicine, computational biology, and human experiments (Appendix 1). |
| Blue | RDM practices | The blue cluster consists of 25 keywords discussing various RDM activities, as shown by the 10 highest co-occurrence keywords: data sharing, data curation, data repository, data preservation, digital preservation, FAIR principles, research management, big data, data management plan, institutional repositories, and data reuse (Appendix 1). |
| Yellow | RDM and supporting system | The yellow cluster comprises 16 keywords that indicate the supporting system of RDM, in which librarians in academic libraries support research through RDM services. The keywords supporting this reading are research data management, research data, academic libraries, librarian, open access, universities, collaboration, bibliometrics, library services, and research support (Appendix 1). |
Table 4
PAGER analysis results for RDM in environmental studies.
4. Discussion
The topic of RDM in environmental studies is divided into four clusters: the red cluster on data management, the green cluster on information processing and biomedicine, the blue cluster on RDM practices, and the yellow cluster on RDM and its supporting systems. Based on Tables 3 and 4 and Appendix 4, the themes discussed across the clusters are largely the same, so this discussion addresses PAGER as a whole regardless of cluster.
The advances widely discussed in RDM in environmental studies are RDM practices, Data sharing and open science, Collaboration and integration, FAIR principles and data reusability, Technological innovations and tools, and Standardisation and interoperability. These labels were assigned because the literature found in this study showed numerous scientific communities in the environmental field developing data repositories, websites, and databases by adopting technological innovations, collaborating and integrating among researchers and funding agencies, and complying with FAIR principles in order to implement and improve RDM practices. The repositories, databases, tools, and projects studied include: PANGAEA, The Ocean Carbon and Acidification Data System, PLANTdataHUB, PalMod II, Tephra Data, The 2021 NFDI4BIOIMAGE, Adacta, Earth System Model, SynBio2Easy-on SBOL, The German Network for Bioinformatics Infrastructure (de.NBI), Kadi4Mat, SMM-CD and SMM-CD_NRP (Dunn et al., 2021), The Ira Moana Project, BEXIS2, Menoci, The Long Term Ecological Research (LTER), The ISCCP B1, PGP Repository, E!DAL, and GOBLET.
Furthermore, because they allow for increased transparency and data accessibility, open science and data sharing methods have emerged as a crucial basis for improving ecological research and tackling global environmental concerns (Hampton et al., 2015). Cooperation among academics, institutions, and funding agencies is also essential for developing RDM infrastructure and innovation, including the use of cloud-based platforms and sophisticated data management tools (Persaud et al., 2021). In addition, data standardisation and interoperability initiatives are essential to guarantee data quality and long-term sustainability (Thomas & Martin, 2020). The basis for more comprehensive and long-lasting development of environmental research has therefore been established by combining various data sources, embracing cutting-edge technology, and committing to sound data governance.
The advances of RDM in the environmental domain are grounded in the evidence for practice reported by various environmental organisations. In line with the advances, evidence for practice across all clusters also shows collaboration and community engagement, RDM practices and integration, data sharing policies and practices, and the early use of various tools and infrastructure for RDM. Cooperation and community involvement have been essential for creating the infrastructure, tools, and common policies that facilitate data integration and sharing (Persaud et al., 2021). These organisations have also implemented strict RDM procedures, such as long-term preservation and data management strategies with metadata standards (Michener, 2015). Furthermore, open science methods and data sharing regulations have been widely embraced to promote cooperation and openness (Hampton et al., 2015). These practices are also supported by advanced tools and infrastructure, such as smart energy and water management and IoT-enabled smart environments (Curry et al., 2019), the integration of genetic data (Liggins et al., 2022), and the development of a feature-constraint mesh generation algorithm within a GIS (Avdis et al., 2018). Thus, this evidence suggests that advances in RDM in the environmental field are driven by collaboration, standardised practices, data sharing policies, and infrastructure innovations.
Despite these advances and the evidence for practice, many gaps remain in RDM in environmental studies. The most discussed gaps are Training and capacity building in RDM, Challenges in data sharing and FAIR compliance, Interoperability and integration issues, Adoption and implementation of advanced tools and practices, and Data sharing and reusability. Lack of training and capacity building in RDM is one of the primary problems, as many researchers cannot efficiently manage, distribute, and protect data (Tenopir et al., 2014). Even with the growing adoption of the FAIR principles, data access and reuse are still hampered by problems with data sharing and FAIR compliance, mainly because of limited data accessibility and metadata quality (Roche et al., 2022), inconsistent metadata, and inadequate infrastructure (Wilkinson et al., 2016). Interoperability is also essential because, in the absence of standardised protocols and tools, it can be challenging to integrate data from various sources and formats (Michener, 2015).
Furthermore, interoperability and integration issues continue to hamper environmental research due to the lack of standardised data formats and uniform metadata practices (Astell et al., 2018). The adoption of advanced tools and practices, such as machine learning and cloud computing, also remains limited, often due to a lack of expertise, funding, and infrastructure (Hampton et al., 2015).
Based on the gaps listed in Table 4, future research in the environmental field needs to focus on several critical areas to address gaps in RDM: Interoperability and integration, Data sharing and collaboration, Training and capacity building, Implementation of FAIR principles and open science, Standardisation and best practices, Collaboration and community engagement, and Validation and performance assessment. Apart from PAGER, themes recommended for future research can also be identified from the thematic map (Section 3.2.2), namely in the lower right quadrant (basic themes). In this quadrant, the themes that still offer opportunities for RDM research in environmental studies are those related to research data, data sharing, data curation, data life cycle, information management, and RDM itself. These themes have high centrality, meaning a high level of relevance, but low density, meaning they have not been explored much.
5. Conclusion
This research identifies trends and developments in research data management (RDM) within environmental studies. Through a scoping review and bibliometric analysis of 248 papers from various databases, it was found that interest in RDM in environmental studies has increased significantly since 2012. The most dominant keywords in this study are RDM, data management, and information management, which reflect the focus of the researchers. The main themes that are widely discussed include the application of FAIR principles, open data management, infrastructure development, and innovations in data sharing and data stewardship. On the other hand, the research also revealed areas that still need further exploration, such as research data, data sharing, data curation, data life cycle, information management, and data management.
As such, this research provides an in-depth understanding of the development of RDM in environmental studies and highlights gaps and future research opportunities. The findings are expected to guide researchers and practitioners in developing strategies and innovations in environmental research data management.
This study has several limitations. First, it used five databases and one journal with different literature search features, resulting in a non-uniform search strategy across the databases and the journal. As a consequence, not all relevant documents in each database and journal could be retrieved. Second, the metadata in the five databases and one journal do not all have the same level of accuracy, which limited the mapping and visualisation of the bibliometric network and introduced inconsistencies and errors during analysis, including in the keywords. However, we tried to clean them before putting them into the analysis tools. Finally, bibliometric analysis can only partially reveal the content and context of research, which may be important for scoping reviews that seek to pinpoint knowledge gaps and guide future research paths. The PAGER methodology was therefore used to ensure a more thorough scoping review of RDM research in the environmental field.
Data Accessibility Statement
Data supporting the findings of this study, including a list of the publications analysed, are available through the Figshare Repository at https://doi.org/10.6084/m9.figshare.28765952.
Additional File
The additional file for this article can be found as follows:
Acknowledgements
The authors are grateful to Fathia Febrianti, who assisted in gathering the literature to analyse.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
Conceptualisation, Rosini; data curation, Ida Fajar Priyanto; formal analysis & review, Mudiyati Rahmatunnisa and Sunardi; writing & editing draft, Rosini; all authors have read and agreed to the published version of the manuscript.
