Table 1
Challenges identified from literature and categorised into findability, accessibility, interoperability, reusability and research data management and infrastructures.a
| REFERENCE | AIM | FINDABILITY | ACCESSIBILITY | INTEROPERABILITY | REUSABILITY | RESEARCH DATA MANAGEMENT AND INFRASTRUCTURES |
|---|---|---|---|---|---|---|
| Adriaens et al, 2015 | Apps for recording invasive species are presented and issues of data interoperability, openness and harmonisation discussed. Recommendations are provided. | Compilation of invasive species data from different regions is a growing challenge. Recommendation: “Ensure that applications generate data in a standardized format and feed into central record collection systems.” Such a system could be GBIF. Also, developing a possibility of creating alerts about new datasets to rapid-response stakeholders is encouraged. | Data sharing is important for managing biological invasion strategies. If shared pictures do not have a license, then linking to accompanying data is hampered. Recommendation: “Inform users about issues of intellectual property rights of records and associated media files so that this does not restrict further usage.” | Apps may have overlapping functions (recording same species) which may cause confusion and competition. Long-term DM and technical updates need secure funding, also if data are used for policymaking and regulation. Recommendation: “Ensure sustainable funding or think of alternative solutions for technical updates and data verification.” | ||
| August et al, 2015 | Describes how new technologies are changing the study of the biological world. | UUIDs such as DOIs will secure findability. A standard for tracing editions of a dataset should be developed. | Data curation to secure access is necessary. Moving from data sharing to data publication with possibility to get cited may motivate to make data open access. | Interoperability allows integration with other datasets and to future-proof the data against technological changes. Use of UUIDs for taxon names. | UUIDs ensure citeability and crediting. | Data warehouses hosting a range of different projects are a solution to secure F and A and to avoid data duplication. |
| Bastin, Schade and Schill, 2017 | The chapter discusses how VGI data in CS and crowd-sourcing projects may be of value for individuals, institutions and decision-makers. Based on the FAIR principles, VGI and generic DM principles are discussed. | Metadata for VGI are very heterogeneous, but standards do exist that can help VGI datasets achieve good quality and become machine-readable. Community-used terminologies require semantic mapping before they can be used across domains. VGI data can only be fully appreciated if accompanied by a use license. The authors describe the applicability of the FAIR principles to VGI data management. The example of GBIF is used to illustrate that cross-domain strategic thinking sustains data curation and discovery, the use of PIDs for datasets and citing, standards and taxonomies for metadata and data provenance documentation etc. | Active RDM of VGI data may ensure the reproducibility necessary for data to be used for scientific and decision-making purposes. Tools to document e.g. how data are packaged and what information describes accuracy are currently lacking. Funding for RDM is often not considered or present in CS projects. | |||
| Borda, Gray and Fu, 2020 | A scoping review on RDM practices in biomedical CS projects and an analysis of selected platforms. | Information on long-term curation and findability is not addressed publicly in scrutinized platforms. | Some, but not all, platforms state how participant data are stored and secured: e.g. genetic information is stored separately from personal and health information. Some platforms may provide third parties with aggregated data. | Ensuring data quality by using standards is not addressed in scrutinized platforms. | Data processes and use of standards across data life cycle are not transparent or openly available for evaluation rendering reuse opaque, conflicted or untraceable. | |
| Chimbari, 2017 | Lessons learned from implementing ecohealth CS projects in South Africa | Copies of data (non-electronic and electronic) should always be transferred to PI. PI manages access rights. Community feed-back on findings is important to sustain trust and engagement. Recommendation: A strategy including musicians or artists is recommended. | General recommendation: Develop clear RDM policies. Tension on authorship often occurs. | |||
| Clements et al, 2017 | Conclusions from a workshop on low-cost air monitoring sensors. | Deployment of a variety of sensors has not been followed by standardisation of data formats, units or metadata. Currently, data transformation is necessary for integration. Data and metadata (e.g. time and date) format standardisation is recommended. | There are huge prospects for saving resources and creating new knowledge by creating a large-scale data management system. Currently, data are not openly shared e.g. for communities to compare. The Air Sensor Workgroup works to make air sensor data FAIR: create metadata standards, software and tools in open source, and develop a data platform. | |||
| Crall et al, 2010 | A survey of CS projects working with invasive species observation is performed and obstacles for getting the most out of data are discussed. | Access to data is hampered by concern over privacy or sensitive data (personal data, private property, and threatened, endangered species). In general, data sharing before scientific publication is wanted by survey participants. | Projects lack database resources and skills to share data. Recommendation: Standardised data collection, quality assurance protocols and a national data infrastructure could improve invasive species distribution maps and detection. Solutions: The initiative “Global Invasive Species Information Network” aims to link online data sources. Citsci.org can accommodate invasive species CS projects’ data, privacy concerns and data sharing. | |||
| Groom, Weatherdon and Geijzendorffer, 2017 | The paper examines openness assessed from data licensing of GBIF datasets. The relative openness of citizen science data is evaluated. | CS data access is most often determined by CS organisations or PIs. Data for GBIF are often obfuscated to accommodate privacy concerns. Data sharing may depend on funding or authorship possibilities for academic researchers. | Of 1264 CS datasets only 33 have a data license. In general, usage licenses were more restrictive than for non-CS datasets. Datasets without a license can’t be used openly. Recommendation: Organisations must implement clear licensing policies. Projects could let the volunteers choose a license for their own data. | 10% of datasets are from CS but constitute 60% of all observations in GBIF. Citizen scientists may wish for recognition from the community. Recommendation: Recognition of contributions from citizen scientists should be supported by data users. Recommendations: Organisations must implement clear RDM policies. Funders should recognise that quality data requires sustainability. | ||
| Higgins et al, 2016 | The EU funded project, COBWEB, has researched the requirements for developing a platform for sharing environmental data from CS projects. Different solutions have been developed or suggested to accommodate the largest challenge for CS data; to make data interoperable and fit for re-use. | The level of public access/data security is regulated i.e. to protect endangered species. User privacy is also addressed. | Existing open standards for metadata and data should be implemented. Ontologies for individual projects should match existing ontologies. | Open source tools are developed to facilitate data collection for non-experts. The platform should offer the structure to facilitate CS data collection and improve environmental monitoring. | ||
| Hunter and Hsu, 2015 | RDA’s Dynamic Data Citation Working Group suggests an approach to dynamic data citation. The authors of this paper developed a testbed that can be used for citing sub-sets of dynamic CS datasets and also recognises the volunteers who contributed the data. | CS datasets containing observations of the environment are often dynamic. Recommendation: – The underlying database must be versioned and support time stamping of changes or additions. – The PID to the citable data comprises a query to the dataset and a timestamp. | Volunteers are rarely cited for their contribution. Recommendation: The Dynamic Citation Approach should allow contributors to the specific dataset to be recognised. | |||
| Kissling et al, 2018 | A WG addresses the need for creating a set of Essential Biodiversity Variables, when collecting biodiversity data, not only in CS projects. To enable CS data to contribute to scientific species monitoring, CS projects also need data and workflow harmonisation. The applicability of the FAIR principles is underscored. | Datasets must be findable and citable. | Data access restrictions may severely hamper quality control, data aggregation and reuse. | CS data need rich metadata to assure quality and reuse. Data must be machine-readable. | Documentation and licensing information must accompany published data. Legal interoperability is required for automated workflows and is necessary for data aggregation, which is widely used in biodiversity monitoring. However, different licenses for different datasets may restrict use of aggregated datasets; therefore, CC0 and CC BY are endorsed. | |
| Owen and Parker, 2018 | The authors describe how CS data can be used for EPAs and other policy-making bodies. | Metadata are necessary for data quality and for use by EPAs. | EPAs can use CS data of certain quality. | Authors encourage EPAs to offer infrastructure to CS projects. | ||
| De Pourcq and Ceccaroni, 2018 | The blogpost describes the advantages of and organisations behind creating a data and metadata standard for CS projects. | Incompatible data handling hampers data reuse. Reuse of project structures and methods overall is also unlikely if not transparent and following minimum standards. The International Data and Metadata Working Group and the CS COST Action will launch a standard on key elements and concepts of CS projects. Guidelines for its implementation will be provided. | Authors encourage good RDM practices in CS projects to facilitate better data quality. | |||
| Pulsifer, Huntington and Pecl, 2014 | This editorial introduces a special issue of Polar Geography on the challenges and prospects for better inclusion of local and indigenous observations in environmental knowledge. | Observations may contain sensitive information about a people or region that they may not want to share openly. | Access to RDM systems in remote communities may be difficult. But they can link observations from different stakeholders. RDM is not only a question of technical and methodological aspects, but must encompass local culture and economy. | |||
| Runnel and Wijers, 2019 | The WG report addresses issues about managing natural history collections data used in CS projects. | CS portals rarely allow searching for collection-based projects. Metadata standards should facilitate this. | Metadata standards should be adapted to contain information describing natural history collection data. CS project metadata should reveal if data originated from a collection. This will aid transparency for policy makers and recognition of participants. | |||
| Schade, Tsinaraki and Roglia, 2017 Schade and Tsinaraki, 2016 | Survey report and related publication on data management in CS projects. | Observation: Interest to share data is large, but several projects do not provide immediate access. Many projects cannot guarantee sustained or any access to data. Funding for this may be insufficient. | Observation: Data and metadata standards are not applied in many projects. Funding for managing this may be insufficient. | Observation: Licensing is often determined only late in projects and may cause confusion. | Identified DM needs: Promotion of Open Data, Open Science, data preservation, existing infrastructures, development of standards through guidelines and best practices in relevant communities. | |
| Sheppard, Wiggins and Terveen, 2014 | Proposes a model for data provenance/workflow in field sampling and processing. | To make data reusable, documentation and metadata are necessary to track changes to data (provenance), e.g. cleaning, re-entry, new/changed protocol for task definition/sampling. | ||||
| Simonis, 2018 | Proposes a standard model for describing CS data, so they become interoperable and reusable. | The model builds on existing standards. Model is based on resolvable URLs for semantics/identifier to make raw data meaningful for all and machine-readable. | ||||
| Williams et al, 2018 – Refer to Table 2 for more data from this reference. | The chapter addresses which factors should be considered to maximize the use and impact of CS data. | Data accessibility should be considered early in the project. | Few CS projects adopt standards for web services or data encodings, because the benefits of sharing data are unclear or because resources to do it are lacking. Interoperability is not only important for machine-interaction, but also for human-machine and community interactions. Specific metadata standards can be useful for different organisations, e.g. DCAT for open governmental data. Semantic interoperability represents the highest level of interoperability for data exchange, quality and sharing. | Preparing CS data for reuse secures the long-term value; therefore consider: – which contributions are subject to IPR – data ownership – data use license. Contextualising data with metadata, including descriptions of their purpose and methods of creation, allows users to evaluate the reuse and possibility to integrate with other datasets. Data provenance/processing can be difficult to document and therefore difficult for other users to understand. | ||
a Abbreviations: CS, citizen science; DCAT, Data Catalog Vocabulary; DMP, data management plan; DOI, digital object identifier; EPA, environmental protection agency; GBIF, Global Biodiversity Information Facility; PID, persistent identifier; PI, principal investigator; RDA, Research Data Alliance; RDM, research data management; UUID, universally unique identifier; VGI, Volunteered Geographic Information; WG, working group.
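
The Hunter and Hsu row above recommends a versioned, time-stamped database in which a citation comprises a query plus a timestamp, and in which the contributors to the cited subset can be recognised. The following is a minimal sketch of that idea, not the authors' testbed: the names `RecordVersion`, `VersionedStore` and `cite`, the in-memory storage and the checksum field are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class RecordVersion:
    """One version of an observation (hypothetical schema)."""
    record_id: str
    species: str
    contributor: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still current"


class VersionedStore:
    """Toy store that never overwrites: a change closes the old version and appends a new one."""

    def __init__(self) -> None:
        self._versions: List[RecordVersion] = []

    def upsert(self, record_id: str, species: str, contributor: str, when: datetime) -> None:
        # Close the currently open version of this record, if any.
        for i, v in enumerate(self._versions):
            if v.record_id == record_id and v.valid_to is None:
                self._versions[i] = replace(v, valid_to=when)
        self._versions.append(RecordVersion(record_id, species, contributor, when))

    def as_of(self, when: datetime) -> List[RecordVersion]:
        # Return the versions that were current at the given time.
        return [v for v in self._versions
                if v.valid_from <= when and (v.valid_to is None or when < v.valid_to)]


def cite(store: VersionedStore, species: str, when: datetime) -> dict:
    """Build a citation record: the query, the timestamp, and who contributed the subset."""
    rows = sorted((v for v in store.as_of(when) if v.species == species),
                  key=lambda v: v.record_id)
    fingerprint = "|".join(f"{v.record_id}@{v.valid_from.isoformat()}" for v in rows)
    return {
        "query": {"species": species},
        "timestamp": when.isoformat(),
        # Assumption: a checksum lets a reader verify that re-running the query gives the same subset.
        "checksum": hashlib.sha256(fingerprint.encode()).hexdigest()[:12],
        "contributors": sorted({v.contributor for v in rows}),
    }


t1, t2, t3 = (datetime(2020, 1, d, tzinfo=timezone.utc) for d in (1, 2, 3))
store = VersionedStore()
store.upsert("r1", "koala", "alice", t1)
store.upsert("r2", "koala", "bob", t2)
before = cite(store, "koala", t2)
store.upsert("r1", "wombat", "alice", t3)  # a later correction to record r1
```

Because old versions are closed rather than overwritten, re-running `cite(store, "koala", t2)` after the correction still resolves to the same subset and the same contributors, which is the property a persistent identifier for dynamic data would rely on.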
Table 2
Ethical and legal challenges identified in literature.a
| REFERENCE | AIM | CONTENT SUMMARY |
|---|---|---|
| Anhalt-Depies et al, 2019 | A framework is conceptualised in which tension in CS is discussed. Privacy policies of 20 projects are reviewed and recommendations offered. | CS data may contain private or sensitive information, e.g. landownership, personal information or pictures of persons, location of endangered species. Privacy-related policies were very different in content and not always project-specific. Recommendations: – During project development, identify potential tensions between data quality, privacy protection, resource security, transparency, and trust in consultation with stakeholders. – Develop a privacy policy or volunteer agreement that addresses these tensions and is consistent with existing guidelines – Develop a data sharing policy that clearly states any restriction on data sharing; consider impacts on resource security and volunteer privacy in determining restrictions, and plan for what to do if a difficult scenario should arise (i.e. detection of illegal activity) – Practice iterative evaluation of policies and practices in use to assess their impact on the ability to achieve program goals – Develop a process for soliciting regular feedback from participants |
| Bowser et al, 2014 | Through examples, the article addresses legal and policy considerations that protect participant privacy in CS. US law and policy are the article's primary point of departure. | Five recommendations are provided: – Determine which data points you can and cannot compromise on in terms of precision, public visibility, and data sharing; clearly state these decisions, and implement the supporting technologies (fuzzing locations, anonymizing identities, etc.). – Give ample notice of privacy choices. Explain the circumstances under which normal participation could be a risk to personal privacy. Inform volunteers who will review their data for quality control. – Give volunteers the option to hide certain data points and locations from public view, or have data publicly visible but attributed anonymously. – Allow volunteers to delete and modify their data, both traditional personal information and submitted data that may contain information “about” the volunteer. – Require only minimum personal data about volunteers. Demonstrate the value of the data you collect, and explain who will be able to see it. Multilevel access control that considers different stakeholders’ roles and needs may be appropriate. |
| Bowser et al, 2017 | A qualitative study of the privacy concerns of CS study managers and volunteers. It is suggested how to design data and information flow and design supporting technologies in CS projects. | Participants evaluate privacy risk in the context of the project. They focus on openness and sharing for personal and collective benefits. Current research regulations may not sustain the culture in CS projects, where concern for privacy is sometimes outweighed by incentives for data sharing. Recommendations: – Minimise personal data collection to sustain trust of volunteers. – Support privacy through design: built-in notifications, filter data upon submission. – Teach volunteers about the data flow. |
| Ganzevoort et al, 2017 | A questionnaire survey of CS biodiversity volunteers’ motivation for collecting data and their views on data sharing and ownership. | Half the respondents view data as a public good, but only a few support unconditional sharing. Data should be used for nature protection and with great respect. 69% would like insight into the use of their data. Ca. 40% would like to be cited by name when their data are used. |
| Guerrini et al, 2018 | The article discusses issues around intellectual property rights, research integrity and participant protection in CS projects. These issues are not always or not clearly regulated by laws or institutional policies. | Intellectual property: Volunteers retain the IPR to any copyrightable work they produce. Recommendation: Use CC licenses and make copyright agreements in the projects. Patent assignment as known from employer-employee discoveries rarely occurs in CS. Thus, CS inventors can exclude projects from using the CS invention. Disagreement on license or patent may occur. An obstacle is that CS organisations often don’t have funding to negotiate IPR control. One-way material transfer agreements could be adapted to promote CS sharing, but may be complex to handle. Transparency and clear IPR terms are recommended in CS collaborations. Recommendation: Contracts with volunteers can be made that assign the patent rights to project leaders or that share the patent rights between project leader and CS inventor(s). Research integrity: May be challenged in CS projects if e.g. purpose is biased towards promoting or preventing a community intervention. US federally sponsored CS data must be made openly available to increase transparency. Such laws are not widespread in other countries. Research integrity often relies on peer-reviewing when publishing articles. CS volunteers cannot disclose conflicts of interest. Recommendation: Making protocols and data openly available promotes research integrity. Giving volunteers the possibility to stay anonymous is more important than their disclosure of conflicts of interest. Participant protection: Volunteers are not protected by laws normally regulating research subjects. Projects may not be reviewed by institutional boards if founded outside academia. Participant risks may not be disclosed in terms of participation. Recommendations: Community advisory committees may review studies. If funding is available for projects outside academia, IRB evaluation could be obtained. Further efforts are necessary to evaluate if laws can be extended to CS or if specific policies should be created together with citizen scientists. |
| Oberle et al, 2019 | From the example of a Canadian CS project, ethical review of CS projects is discussed. | The responsibility of IRB review is to protect subjects from harm, but generally citizen scientists are “research assistants” rather than “research subjects” and do not fall under IRB reviews. It is suggested that CS projects are reviewed by the legal or public relations department rather than the IRB. However, an initial evaluation of harm from an ethical perspective before deciding for an IRB review could also be a solution. |
| Patrick-Lake and Goldsack, 2019 Wiggins and Wilbanks, 2019 | A connected editorial and article. The complexity of issues that CS projects in health and biomedicine need to consider is discussed and concerns are exposed. | The definition of what CS encompasses is often blurred. The current technology facilitates new possibilities of data collection, which is “CS-like”. Thus, in several projects, participants act more as research subjects than active citizen scientists. Concerns about participant ethics and protection are valid, because the risks to participants delivering health data are not necessarily addressed. Projects focussing on intervention rather than observation may raise more ethical issues and pose larger risks for participants. CS projects originating from outside academic institutions do not always follow academic regulations and policies. Informed consent can be obscured for participants engaging in data collection that is CS-like. Non-researchers may initiate research where data are delivered to third-parties. Direct publication of non-academic CS data without peer-review and quality control can lead to misinformation. Current ethical frameworks are aimed at evaluating risks and protecting participants, and are not fit for helping autonomous and engaged co-researchers (citizens). |
| Resnik, Elliot and Miller, 2015 | The authors discuss the ethical challenges occurring in CS as a collaboration between laypeople and scientists. | Research integrity: Research integrity could be compromised in CS projects, where data collectors or project initiators are aiming to address a community-issue of particular concern. Projects may also be funded by organisations or corporate funds with e.g. lobbying, legal or political interests. Both financial and non-financial conflicts of interest should be addressed in the project, both in the beginning and when publishing data and results. Disclosure of conflict of interest could be performed individually or as a group. Access: Data sharing will allow others to evaluate data independently. Potential policies for CS projects on conflicts of interest should, however, not prevent communities from engaging in research that may help them fight e.g. environmental injustice. Data sharing allows others to reuse, discuss and give feedback. Data must be de-identified if containing information on human research subjects. Citizens should be clearly informed of the expected sharing of data (who, when, why). Data ownership and IPR issues may arise if communities expect to have some control over the gathered data. Agreements should be clear and updated regularly with the volunteers. Sharing of culturally-embedded knowledge should be handled with respect. Exploitation of volunteers could occur if the volunteers do not receive a share of benefits potentially obtained by the research they participated in. The scientist should aim at sharing IPR, authorship, formal recognition, education or monetary value. Safety of volunteers should be considered. Co-authorship should be considered for volunteers providing substantial contributions to the study, but may often fall outside the recommendations of ICMJE. The authors encourage credit in the acknowledgment section and sharing of results. The concept of CS may be used misleadingly, e.g. volunteers may serve more as data collectors or research subjects than active participants. |
| Riesch and Potter, 2014 | Qualitative study of CS researchers on methodological, epistemological and ethical issues. | There is consensus that a CS project should at least be transparent about the data it collects, what they are being used for, and how to keep citizens updated on the process. The question of how citizens should be credited is raised. Data are produced by the public, so ownership is a question to consider. |
| Rothstein, Wilbanks and Brothers, 2015 | The article discusses how newly emerging, technology-enabled, unregulated CS health research poses a substantial challenge for traditional research ethics. In the US, CS projects set up by private persons are not regulated as is company- and academic-driven research. | A: There are no data sharing or publication obligations for private CS projects. R: Without review, the validity of data and results may not be scrutinized or assessed. Projects may not have institutional review, and ethical approval, which can oversee recruitment procedures, participant eligibility and informed consent. Requirements for protection of privacy and confidentiality remain unclear. How can child participants be monitored by legal guardians? Should incidental findings be disclosed and how? |
| Tauginienė, 2019 | The article aims to address ethical aspects of CS projects with focus on research integrity. | No consensus on CS authorship or attributions exists. To increase transparency, informed consent should address the relationship between scientist and citizen and the citizen’s role in the research. The scientist must act socially responsibly by informing society of methods, tools, data and knowledge. |
| Ward-Fear et al, 2020 | The article discusses if and how citizen scientists should be included as co-authors. | Current scientific authorship criteria exclude citizens from being attributed co-authorship. The authors propose implementation of group co-authorship for cohorts of non-professional scientists. |
| Williams et al, 2018 – Refer to Table 1 for more data from this reference. | The chapter addresses which factors should be considered to maximize the use and impact of CS data. | Primary IPR considerations for CS: (1) “background IPR” – how will knowledge and data be used and under what restrictions; and (2) “foreground IPR” – how will the project allow access to the knowledge and data. Personal privacy must be protected, i.e. personal information and location details. Protection of security for objects collected must be considered, e.g. endangered species or unintentional photo capture of persons or secondary objects. Handling of IPR and privacy should be described in Terms of participation. |
a Abbreviations: CS, citizen science; CC, Creative Commons; IPR, intellectual property rights; IRB, institutional review board; ICMJE, the International Committee of Medical Journal Editors.
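
Bowser et al (2014) name "fuzzing locations" and "anonymizing identities" as supporting technologies without prescribing a method. The sketch below shows one common way to do each, under stated assumptions: coordinates are snapped to the centre of a grid cell (the 0.1 degree cell size is an arbitrary illustrative choice), and volunteer identifiers are replaced by keyed hashes. The function names are hypothetical, and a keyed hash gives pseudonymisation rather than true anonymisation, since whoever holds the key can re-link records.

```python
import hashlib
import hmac
import math
from typing import Tuple


def fuzz_location(lat: float, lon: float, cell_deg: float = 0.1) -> Tuple[float, float]:
    """Snap a coordinate to the centre of its grid cell so the exact spot is not published."""
    def snap(x: float) -> float:
        return round((math.floor(x / cell_deg) + 0.5) * cell_deg, 6)
    return snap(lat), snap(lon)


def pseudonymise(volunteer_id: str, secret: bytes) -> str:
    """Replace a volunteer identifier with a keyed hash (assumption: the project keeps `secret` private)."""
    return hmac.new(secret, volunteer_id.encode(), hashlib.sha256).hexdigest()[:16]


# Two nearby sightings fall in the same cell and become indistinguishable in public data.
public_point = fuzz_location(51.5074, -0.1278)
public_author = pseudonymise("volunteer-42", b"project-secret")
```

A coarser `cell_deg` gives stronger protection for sensitive sites, such as locations of endangered species or private property, at the cost of spatial precision. That is the trade-off the first recommendation asks projects to decide on explicitly.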
Table 3
Identified tools, roadmaps and guidelines for research data management of citizen science.a
| REFERENCE | AIM | RDM CONTENT IN REFERENCE |
|---|---|---|
| Bonn et al, 2016 | A Green Paper that presents the understanding, requirements and potential of CS in Germany and serves as a roadmap towards 2020. Guiding principles are also presented. Two chapters discuss data management of and the legal and ethical framework for CS. The recommendations for action are listed here: | General RDM: Ethical and legal: |
| Disney et al, 2017 | Presentation of the CS project tool, anecdata.org – an online platform for CS projects to collect, manage and share environmental data. | Works as a repository to share and download data openly. May be connected to SciStarter.com in the future. Apparently does not support RDM functions other than data storage and sharing. |
| Forest Service, 2018 | A guide from US Forest Service for CS projects in order to make data of good quality available to the agency. Chapter 4 briefly mentions DM. | Data should be made available to Forest Service staff. |
| Greshake Tzovaras et al, 2019 | A new platform, Open Humans, is presented. The platform is open for personalised data collection (e.g. health data), but allows participants to control sharing. The platform can be used for CS and academic research. | The article presents challenges for participatory science within humanities, sociology and medicine: – Accessing data in commercial environments (e.g. apps) – Health data are stored in “silos”, e.g. managed by national institutions – Ethical concerns over use of personal data. Participants can upload data collected elsewhere and manage which projects on Open Humans can access the data. Data can be re-used as far as possible under the control of the participant. Members share notebooks (code for data analyses) that allow analysing the individual’s own data, i.e. notebooks are interoperable and reusable. The open source code of the platform has allowed communities to write their own expansions and data importers. |
| Heigl et al, 2018 | The CS Network Austria has defined a set of quality criteria for projects wishing to be listed on the Austrian CS platform, Österreich forscht. The criteria are also formulated as questions, which project leaders must answer. Platform coordinators and a WG read the answers and provide feedback and support if deemed necessary. Criteria relevant for RDM are listed here. | FAIR: – All data and metadata are made publicly available, provided there are no legal or ethical arguments against doing so. – The results are published in an open-access format, provided there are no legal or ethical arguments against doing so. – The results are findable, reusable, comprehensible and transparent. RDM: – Prior to data collection, all projects must have established a data management plan which conforms to the European General Data Protection Regulation. Ethical and legal issues: – The project must follow transparent ethical principles in compliance with ethical standards, such as obtaining informed consent from participants or the parents of participating children, among others. – Clear information on data policy and governance (regarding personal and research data) must be published within the project, and participants must consent to this information prior to participation. |
| Parthenos | An online course/resource for CS in (digital) arts and humanities. One module focuses on DM planning of CS or crowd-sourcing projects. Additional modules deal with research infrastructures and ethics. | Recommendations: – Know what your data will be, and how you will use it, to ensure you are compliant with GDPR and ethical standards – Use appropriate standards to model your data – Use a data management plan to help structure your thinking |
| Pettibone et al, 2016 | A guide for practitioners on citizen science as practised in Germany. One chapter is on data and legal considerations. | Data should be secured for long-term use in permanent infrastructure. Data rights must be determined. Reusability must be ensured through clarity of data and use of appropriate metadata. DM must be transparent and comply with legal requirements. Ethical and legal issues: The legal framework must be in place, considering copyright, data rights, privacy, personal data and relevant legislation (e.g. laws for protection of the environment). |
| Sturm et al, 2018 | Recommendations from workshops on principles for mobile apps and platforms in CS projects. It is acknowledged that the recommendations can be used for CS projects in general. | The workshop identified and provided recommendations for RDM challenges related to securing interoperability and data management: Index apps and platforms to facilitate reuse. Data sharing and use of open source for code base is encouraged. Consider data privacy. Use standards for software design and for data and metadata. Use UUID for all observations and data points. For reuse of apps and platforms, include metadata for license, documentation and modifications. Provide technical support for the app/platform. Recommendations on securing sustainability of the project, data protection, participant privacy and IPR (incl. national/regional differences) are also provided. |
| Tweddle et al, 2012 | A guide to CS written on behalf of the UKEOF, i.e. directed at environmental sciences. Some advice on RDM is included. | Store data in well-known repositories. Make data available electronically. Data sharing with relevant organisations is encouraged, since they can often provide data storage. Ethical and legal issues: IPR and data protection requirements must be considered. |
| UKEOF’s Advisory Group, 2013 | A pamphlet that briefly explains seven principles to ensure quality data and good data management of CS projects. | – Consider the data requirements – Manage volunteers to get the best data – Ensure data quality – Harness new technologies – Manage data effectively – Report and share data – Evaluate to maximise data value |
| US EPA, 2019 | Handbook by US EPA that addresses how to ensure quality, documentation and data management of CS projects. | The handbook contains detailed – advice and templates for documentation and data reuse – advice and a template for writing a DMP |
| US GSA | A short toolkit from the U.S. federal government on managing CS data | |
| Wang et al, 2015 | Presentation of the CS project tool, CitSci.org | CitSci.org is a customizable platform that allows users to collect and generate diverse datasets. It contains standardised metadata necessary for data exchange and quality assurance. A web-based DM feature is included in the tool. The tool includes documentation of permissions, privacy and security of information. |
| Wiggins et al, 2013 | DataOne WG report introducing data management of CS projects. The report functions as a tool for RDM. | The document – introduces the data life cycle – provides best practices and recommendations for each step of this life cycle – identifies key opportunities and challenges in DM |
| Wolf et al, 2019 | ONC is university-based and operates ocean observatories and repository services. ONC has developed a DM system, and the article presents how ONC’s best practices and services for DM are applied to a CS project throughout the entire data life cycle, rendering CS data FAIR. | The document describes how ONC implements best data management practices throughout the data life cycle. Can be used as a tool/guideline for RDM. |
[i] a Abbreviations. CS, citizen science; DM, data management; DMP, data management plan; RDM, research data management; IPR, intellectual property rights; ONC, Ocean Networks Canada; UKEOF, the UK Environmental Observation Framework; US EPA, United States Environmental Protection Agency; WG, working group.
Table 4
Information about projects in case study.
| PROJECT TITLE (TRANSLATED) | HOMEPAGE AND START YEAR | PURPOSE | CITIZEN SCIENTIST: PREREQUISITES | CITIZEN SCIENTIST: INVOLVEMENT | CITIZEN SCIENTIST: OUTCOME | RESEARCHERS: BENEFITS FROM USING CITIZEN SCIENCE METHOD | RESEARCHERS: OUTCOME | DISSEMINATION TO THE PUBLIC |
|---|---|---|---|---|---|---|---|---|
| Fyn finder marsvin (Funen finds harbour porpoises) | https://www.sdu.dk/da/forskning/forskningsformidling/citizenscience/fyn+finder+marsvin 2019 | Distribution of harbour porpoises in the inner Danish waters: spatial and seasonal distribution, and occurrence of females with calves. | All persons with a cell phone. | Observations collected via mobile app. | The participant will get an understanding of how many resources population registration requires when using conventional scientific methods, and will learn about harbour porpoise biology. | Large spatial coverage and large data volume. | Publicity in the media. Research data, merit, and a basis for management and conservation. | Observation data on university and partner websites. Radio interviews and articles in popular science magazines. |
| Livet med demens (Life with dementia) | https://www.sdu.dk/da/forskning/forskningsformidling/citizenscience/lidem 2019 | The purpose is to create a centre for dementia, under which research projects can be developed and run in collaboration with citizens, professionals, municipalities and scientists. | Patients with dementia, their relatives, caretakers and other professionals can participate. | The participants’ knowledge on how to live a life with dementia will be actively used. | Larger inclusion of relatives and caretakers. Increased quality of life for relatives and patients. Better treatment of patients. | More knowledge about what works best to increase the quality of life for both patients and relatives. Putting dementia on the political agenda. | New methods will be tested and documented in order to create better treatment and increase the quality of life. | In person through small theatre productions, through material for the website, and directly to participating municipalities. Scholarly publications and conferences. |
| Fangstjournalen (CatchLog) | https://fangstjournalen.dtu.dk/ 2016 | Better knowledge on fish populations in Danish waters. | All persons with a cell phone and/or web access and an interest in fish and the aquatic environment. | Collect information about fish from fishing trips via app or browser. Collect observations, e.g. about large mammals, from the aquatic environment. | Logbook of own fishing trips, possibility to show catches to others. The app gives information about fishing restrictions at the current location. | Data could not be obtained by other methods and provide large spatial coverage and data volume. | Research data, merit, and a basis for management and conservation. | Continuous publication of news and data on website and Facebook. Scholarly publications and conferences. |
| Masseeksperiment 2019 (Mass Experiment 2019) | https://naturvidenskabsfestival.dk/tildinundervisning/masseeksperiment-2019-plastforurening-i-vand 2019 | Distribution of plastic litter in the Danish terrestrial environment. | School and high school children (grades 0-9 and 10-12 in DK). | Collect, classify, and count plastic litter | Can be part of school teaching curriculum: Insight into the problem of plastic pollution in the Danish environment. | Large spatial coverage and large data volume. | Research data and merit. | Report is published and a scholarly paper is submitted. |
Table 5
Solutions and challenges with research data management and infrastructures, FAIR and ethical and legal issues. Data are extracted from interviews with the principal investigators of the projects in the case study.a
| PROJECT TITLE (TRANSLATED) | RESEARCH DATA MANAGEMENT AND INFRASTRUCTURES | FINDABILITY | ACCESSIBILITY | INTEROPERABILITY | REUSABILITY | ETHICAL/LEGAL ISSUES |
|---|---|---|---|---|---|---|
| Fyn finder marsvin (Funen finds harbour porpoises) | There was no initial intention to write a DMP, though the university’s Open Science Policy mandates one. PI not aware of the FAIR principles. | Results can be found through the project homepage and in an open repository.b A DOI and simple administrative metadata are assigned to the data in the repository. | All sightings are available through the website. The full data set is uploaded to Zenodo at intervals. | Data and metadata are not defined by ontologies. Data consist of the porpoise sightings (date, number and location), are of very simple structure and can be downloaded in csv format. | Data are published in Zenodo under the CC BY 1.0 license, but are not accompanied by provenance documentation. | Only locations for porpoise sightings are shared; the data do not contain any personal information. |
| Livet med demens (Life with dementia) | DMP may be written for individual projects. The centre is currently developing activities. PI not aware of the FAIR principles. | Some data could be made available, but of course not patient data. | Patient level data are highly sensitive. Mapping data showing how municipalities are working with patients can be shared. There are also qualitative “data” that could be shared with consent. | |||
| Fangstjournalen (CatchLog) | Writing a formal DMP was not a recommendation at the time of project start. A DMP would have been useful. Data structure not initially designed for a repository. PI not aware of the FAIR principles and the institutional data repository. | Aggregated results can be found through the app and project homepage, but data are not available in an open repository. Currently no PID or administrative metadata are assigned to the data. (A metadata record has been available in an open repository since 2021.c) | Data are stored in a local database. Datasets can be shared as a copy after cleaning for personal data – no direct access to data. | Some standards are used for structural metadata and data formats. Machine-readable identifiers are not assigned to data. PI has suggested a standard for angler projects.d | PI sees great potential in merging data from other aquatic and environmental sources. Data quality is high and documented, but not publicly available yet. Manual work is needed for data cleaning and assigning metadata before any kind of sharing. PI interested in sharing and licensing data through the institutional repository, but with an embargo until results have been published in scientific articles. | GDPR is a major issue, as the ‘fear’ of breaking GDPR rules hinders the willingness/courage to share data. Processes for anonymising data before publication/sharing need to be defined and cleared. |
| Masseeksperiment 2019 (Mass Experiment 2019) | Writing a DMP was not a recommendation at project start, but would have been useful. Data structure not initially designed for a repository. Raw data stored at Astra (the national Centre for Learning in Science, Technology and Health in Denmark). PI not aware of the FAIR principles. | When an article presenting the results was submitted, data were uploaded to Zenodo and DOI and metadata were added.e | Data published in Zenodo,c however with personal data removed (GPS coordinates, school names etc.). | Currently no known standards for this type of data (format, metadata), except that plastics were classified according to Annex 1 in Hanke et al, 2020.f | When data are published in an open repository, the datasets will be kept as close to the original as possible, but anonymised. The data are published as an Excel file with no provenance information under the CC BY 4.0 license. | No personal data involved. School class data and spatial data (GPS coordinates) are removed. |
[i] a Abbreviations: DMP, data management plan; DOI, digital object identifier; PI, principal investigator; PID, persistent identifier. b (Wahlberg, 2020). c (Skov, 2021). d (Venturelli, Hyder and Skov, 2017). e (Syberg, 2020). f Annex 1 in (Hanke et al, 2020).
