QFlowCrate: A QGIS Plugin for Workflow Documentation and Provenance Capture to Enhance Geoscientific Reproducibility

Open Access | Jun 2026

(1) Overview

Introduction

Reproducibility is the ability to obtain consistent results across studies using the same data and code [1, 2]. It is a cornerstone of scientific research because it enables the verification of findings and the reuse of study materials. However, conducting and publishing reproducible research is neither easy nor common. One of the primary obstacles to successfully reproducing research results is the presence of issues related to data and documentation. Examples of such issues include unavailable data files, inadequate documentation of processing operations, ambiguous execution instructions, hard-coded file paths, and reliance on proprietary software [3, 4, 5, 6]. The domain of geosciences is no exception to these challenges [7, 8, 9]. Provenance—the documented history of data items—offers a promising solution to these obstacles [10]. By capturing details about the data origins, the processes used to create research outputs, and the individuals or organizations involved in their production, provenance enables the evaluation of both credibility and quality of data workflows [11]. Furthermore, provenance effectively addresses reproducibility problems by creating a transparent, traceable record of data sources, processing steps, and computational environment parameters, thereby filling the documentation gaps that often cause reproduction failures.

Geoscientific workflows are usually complex, combining many software platforms for data management, statistical analysis, geoprocessing, and visualization [12]. While researchers are adopting more reproducible practices such as literate programming (Jupyter notebooks, R Markdown) [13] and containerization [14], these approaches require additional effort and expertise. Additionally, many researchers rely entirely on desktop Geographical Information Systems (GIS) for geospatial data analysis, where they interact with the user interface primarily through point-and-click actions. Maintaining a detailed record of workflow steps can be challenging when using desktop GIS because the history of data transformation steps is not explicitly documented, as opposed to scripted workflows. Despite the growing trend towards programming-based methodologies, desktop GIS remain a popular choice for data processing in the geosciences, particularly for visualization tasks [8], thanks to their intuitive interface and ease of use.

QGIS [15, 16] is one of the most widely used GIS platforms and has been extensively adopted across geoscientific subdisciplines. It is open-source, free-to-use, and its plugin ecosystem provides opportunities for workflow enhancements. Given the lack of built-in capabilities for comprehensive workflow documentation and provenance tracking in QGIS, we developed a plugin to capture data provenance and the sequence of geospatial operations performed with QGIS. The plugin can capture all operations listed in the Processing History of QGIS, including those executed using the Processing Toolbox and the Model Designer. With this plugin, we aim to help QGIS users improve the quality of their workflow documentation, as it automatically records all data sources, processing operations, and computational parameters, thereby minimizing the need for manual documentation and providing a thorough, machine-readable record that can be used for reproduction.

QFlowCrate exports the captured provenance information using the format of Research Object Crate (RO-Crate) [17]. RO-Crate is a simple, community-driven standard for packaging research data, which is gaining popularity across disciplines and technological ecosystems, and has been extended to capture the provenance of the execution of computational workflows [18]. The RO-Crate model prioritizes practical adoption over complexity, using simple JSON files rather than complex Semantic Web structures. Moreover, its primary focus is provenance description, letting existing technologies such as ZIP files handle data packaging [19, 17]. RO-Crate uses standardized identifiers, such as HTTP Internationalized Resource Identifiers (IRIs), Digital Object Identifiers (DOIs), and Open Researcher and Contributor IDs (ORCIDs), to reference data items, individuals, and other relevant resources. It also adopts terms from schema.org, mapped to the World Wide Web Consortium’s provenance standard, to describe object types and their relation [11]. In doing so, RO-Crate aligns with the FAIR (Findable, Accessible, Interoperable, Reusable) data principles [20], as it enables the creation of detailed metadata, supports standardized identifiers, and facilitates the sharing and reuse of research objects [21]. By building on these principles, QFlowCrate uses RO-Crate to export geospatial workflows developed in QGIS, aiming to enhance provenance tracking and, consequently, the reproducibility of these workflows.

Implementation and architecture

The software is implemented as a QGIS plugin that enables semi-automatic documentation of geospatial workflows by leveraging the extensibility of the open-source QGIS platform. It is written in Python 3.12 and uses the PyQGIS API to integrate directly with native QGIS functionality, including access to map layers, Processing History, and symbology handling. Dependency management is handled using uv (v0.8), enabling reproducible builds. The graphical user interface (GUI) is implemented using PyQt5 (5.15), ensuring consistent behavior across Windows, Linux, and macOS. Compatibility testing confirmed correct operation with the latest long-term QGIS releases (3.40, 3.44).

User workflow

The user can interact with the plugin as shown in Figure 1. After the user has created a geospatial workflow, they can use this plugin to document it. When opening the plugin, the user initially sees the instruction page. If using the plugin for the first time, they can read these instructions. If already familiar with the plugin, they can continue directly to the Graph tab.

Figure 1

Activity diagram showing the user workflow for exporting a RO-Crate through the plugin’s interface.

In the Graph tab, users can add layers, processes, and connections. When adding a layer, a pop-up opens, which lists the layers that have been loaded into the QGIS project. After the user selects one of the listed layers, the pop-up requests that they enter the metadata. The pop-up asks for a description and whether the layer originates from an external dataset. If so, the user must enter the source details, including the title, URL, date of acquisition or download, and any additional comments. If this information is available in the layer properties, the relevant fields will be filled in automatically. After the required fields are complete, the layer is added as a node to the graph. Similarly, when adding a process, another pop-up opens listing the geospatial operations that have been executed throughout the project’s history. The pop-up automatically fills in the input and output files, units of measurement and projection, and requests a title and description. Once entered, the process is added as a node. Users may then add connections between layer and process nodes, which automatically sets the process’s input and result parameters in the background.

When no more elements need to be added to the graph, the user continues to the Export tab, where project metadata must be entered, including author information, license, and project title and description. After selecting a file path for the RO-Crate archive, the user clicks the export button. The user receives an informational pop-up indicating whether the export was successful or an error was encountered. Once the export is successful, the RO-Crate is created as a ZIP file.

The Import tab offers the option to import existing RO-Crates created with QFlowCrate. Users can browse their files, select a RO-Crate ZIP file, and visualize it as a graph. There is also an option to inspect the details of the graph elements. Currently, re-execution of RO-Crates is not supported.

Example use case for exporting RO-Crates

Consider the following example. We have a QGIS project containing a locally stored shapefile with point data on school locations in Münster, North Rhine-Westphalia, Germany, as well as a Web Feature Service (WFS) connection to a dataset on the city of Münster’s administrative districts. We have also added an OpenStreetMap layer as the base map. We want to find which schools are located in the central district of Münster using the first two data layers. To do this, we used the Processing Toolbox to extract the central district polygon from the WFS layer as a temporary layer. We then extracted the school points that fall within this district from the school shapefile as a second temporary layer.

We want to document this simple workflow using QFlowCrate. The process is shown in Figure 2. First, we open the plugin and navigate to the Graph tab, where we click on the ‘Add Layer’ button. From the pop-up displaying all the project’s data layers, we select the four relevant layers, excluding the OpenStreetMap base map, as it is not involved in any geoprocessing operations. For each data layer, we need to fill in a title and a description. The layer properties and technical information are pre-filled and cannot be edited. There is an option to indicate whether the layer comes from an external source. The added data layers are then displayed as blue rectangles.

Figure 2

Storyboard illustrating an example use case for creating and exporting RO-Crates using QFlowCrate.

Then, we can add the processing steps that have been applied to the data layers. Clicking the ‘Add Processing Step’ button opens a pop-up listing all the processing steps recorded in the QGIS Processing History. We select a processing step and fill in the metadata, noting that some information has already been filled in. The processing steps are now visualized in the graph as green ellipses.

Once we have compiled all the data layers and processing steps, we can activate connection mode and use arrows to link them. We can also rearrange them if necessary. When we have the entire workflow in the graph, we can navigate to the Export tab, enter the author’s details and the project metadata, and export the RO-Crate. This will create a ZIP file containing the RO-Crate metadata JSON file, as well as a folder for each of the data layers involved, in the specified location. The local shapefile is copied as it is, the temporary files are copied as GeoJSON, and the WFS is skipped because it is loaded externally. For the layers that are set as visible in the layer pane, the symbology is copied as a QML file. If raster layers are used, they will be exported in their initial format (TIF, PNG, JPEG, etc.) in the RO-Crate.
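The per-layer export rules described above can be summarized as a small decision function. This is an illustrative sketch, not the plugin's code. It assumes that layers are distinguished by QGIS provider keys such as "ogr", "memory", and "WFS". The function name and the file name style.qml are hypothetical.

```python
def plan_layer_export(provider, source_name, visible):
    """List the files one layer would contribute to its crate folder."""
    files = []
    if provider == "WFS":
        pass  # loaded externally: no data file is copied
    elif provider == "memory":
        files.append(source_name + ".geojson")  # temporary layer -> GeoJSON
    else:
        files.append(source_name)  # file-based layer copied in its own format
    if visible:
        files.append("style.qml")  # symbology of visible layers
    return files


print(plan_layer_export("ogr", "schools.shp", True))
print(plan_layer_export("memory", "schools_central", True))
print(plan_layer_export("WFS", "districts", False))
```

A real shapefile consists of several sidecar files (for example .dbf and .shx), so copying it "as it is" involves more than the single file name shown here.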

Architectural overview and design rationale

The plugin follows a modular, layered architecture with clear separation between user interaction, internal provenance representation, and metadata export. The user interface is implemented as a single QGIS dialog window divided into four tabs: an instruction tab, a graph-based documentation tab, an export tab, and an import tab. This structure minimizes cognitive load while reflecting the iterative nature of workflow documentation, where descriptive metadata (e.g., author, license, project title) is typically finalized only during export.

Workflow documentation is supported through a graph-based visualization inspired by established GIS paradigms, such as the QGIS Model Designer [22] or the ArcGIS ModelBuilder [23]. In contrast to prospective workflow construction approaches, the plugin captures workflows retrospectively, after standard QGIS operations have been performed. Layers and processing steps are represented as nodes connected by directed edges, forming a tree-like structure that makes data dependencies and processing relationships explicit. Users can interactively arrange nodes via drag-and-drop, while metadata enrichment is supported through pop-up dialogs when adding layers or processes.
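The graph model described above, in which adding a connection also updates the inputs and results of a process, can be sketched with plain dataclasses. The class names reuse LayerNode and ProcessNode from the plugin for readability, but these are simplified, hypothetical stand-ins without any GUI behavior. The connect function is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LayerNode:
    """A data layer in the workflow graph (simplified stand-in)."""
    name: str


@dataclass
class ProcessNode:
    """A geoprocessing step with its input and result layers."""
    name: str
    inputs: list = field(default_factory=list)
    results: list = field(default_factory=list)


def connect(source, target):
    """Add a directed edge and update the process parameters accordingly."""
    if isinstance(source, LayerNode) and isinstance(target, ProcessNode):
        target.inputs.append(source.name)  # layer -> process: input
    elif isinstance(source, ProcessNode) and isinstance(target, LayerNode):
        source.results.append(target.name)  # process -> layer: result
    else:
        raise ValueError("an edge must link a layer and a process")


districts = LayerNode("districts_wfs")
central = LayerNode("central_district")
extract = ProcessNode("Extract central district")
connect(districts, extract)
connect(extract, central)
print(extract.inputs, extract.results)
```

Rejecting layer-to-layer and process-to-process edges keeps the graph bipartite, which matches the description of layers and processing steps alternating along each path.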

Core components

The software is organized into six primary modules:

  • Instruction module: Provides the InstructionTab class, which displays usage guidance and documentation requirements.

  • Graph module: Serves as the coordination layer for workflow documentation. It contains the GraphTab, GraphView, LayerNode, ProcessNode, and ConnectionArrow classes, which manage the visual representation of workflows and their relationships.

  • Layer module: Encapsulates layer-related provenance information in a dedicated Layer class. A LayerFactory implements the Factory pattern to abstract the creation of heterogeneous layer types (e.g., file-based layers, WMS layers, memory layers).

  • Process module: Represents geoprocessing steps using the Process class, which stores tool identifiers, parameters, execution logs, and input/output layer references. Processing instruments are modeled explicitly using the Instrument class.

  • Export module: Provides the ExportTab class and implements the central exportROCrate function. This module acts as a facade, orchestrating the export workflow while delegating layer- and process-specific serialization tasks to the respective modules.

  • Import module: Contains ImportTab, ImportedProcess, and ImportedLayer classes, which enable the import and visualization of previously exported or shared RO-Crates.

This separation of concerns supports independent testing, simplifies maintenance, and enables future extensions, such as additional provenance formats or data source types. A class diagram of the software architecture is shown in Figure 3.

Figure 3

Class diagram of the QGIS plugin architecture. Class names, attributes, and methods are simplified for clarity and readability. PyQt inheritance relationships are indicated through stereotypes to maintain diagram clarity. «mocks» indicates that ImportedLayer and ImportedProcess do not inherit from Layer and Process, but mimic their expected properties so that LayerNode and ProcessNode can interact with them seamlessly.
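The Factory pattern mentioned for the Layer module can be illustrated as follows. This is a minimal sketch under stated assumptions, not the plugin's implementation. The subclasses FileLayer, WebServiceLayer, and MemoryLayer, the storage attribute, and the mapping from QGIS provider keys are all hypothetical. Only the names Layer and LayerFactory come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Base record for layer provenance (simplified)."""
    name: str
    source: str

    storage = "unknown"  # plain class attribute, not a dataclass field


class FileLayer(Layer):
    storage = "file"


class WebServiceLayer(Layer):
    storage = "web service"


class MemoryLayer(Layer):
    storage = "memory"


class LayerFactory:
    """Map a provider key to the matching Layer subclass."""

    _registry = {
        "ogr": FileLayer,
        "gdal": FileLayer,
        "wms": WebServiceLayer,
        "WFS": WebServiceLayer,
        "memory": MemoryLayer,
    }

    @classmethod
    def create(cls, provider, name, source):
        try:
            layer_class = cls._registry[provider]
        except KeyError:
            raise ValueError(f"unsupported provider: {provider}") from None
        return layer_class(name=name, source=source)


schools = LayerFactory.create("ogr", "schools", "schools.shp")
print(type(schools).__name__, schools.storage)
```

Callers only deal with the factory and the common Layer interface, so supporting a new data source type would mean adding one subclass and one registry entry.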

RO-Crate export and implementation variants

Workflow documentation is exported using the RO-Crate standard, selected for its alignment with FAIR principles and wide adoption in computational research. The plugin follows the Process Run Crate profile [24] to represent computational workflows, capturing both data lineage and processing activities.

The exportROCrate function uses the ro-crate-py library (v0.14) [25] to generate the crate structure. Project-level metadata (author, license, title, description) is recorded first. Documented layers are then added as Dataset entities, each grouping geometry and symbology files. Symbology is exported as QGIS QML files, while geometry files are copied directly into the crate where available.

QGIS memory layers, which do not exist as files on disk, require special handling. These layers are temporarily serialized to file-based formats before inclusion in the RO-Crate archive.
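The idea of temporarily serializing an in-memory layer can be shown without QGIS. The sketch below writes point features to a GeoJSON file in a temporary directory using only the standard library. Inside QGIS, this step would normally go through the PyQGIS file writers instead. The function name, the file name, and the sample feature are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_temp_geojson(features, directory):
    """Write (x, y, properties) point features to GeoJSON and return the path."""
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [x, y]},
                "properties": props,
            }
            for x, y, props in features
        ],
    }
    path = Path(directory) / "memory_layer.geojson"
    path.write_text(json.dumps(collection), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    # The file only needs to live until it has been added to the archive.
    out = write_temp_geojson([(7.63, 51.96, {"name": "School A"})], tmp)
    loaded = json.loads(out.read_text(encoding="utf-8"))

print(loaded["features"][0]["properties"]["name"])
```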

Processing steps are serialized as CreateAction entities enriched with execution parameters, logs, and both QGIS processing commands and equivalent Python commands. Unique processing tools are represented as SoftwareApplication entities to avoid duplication.
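The serialization of processing steps can be sketched as plain JSON-LD dictionaries. The example below is illustrative and does not use ro-crate-py. It assumes the instrument, object, and result properties that the Process Run Crate profile uses to link an action to its tool, inputs, and outputs. The identifier scheme (#tool-…, #action-…), the input dictionary keys, and the sample steps are hypothetical.

```python
import json


def build_process_entities(steps):
    """Turn processing steps into SoftwareApplication and CreateAction entities."""
    tools = {}
    actions = []
    for index, step in enumerate(steps, start=1):
        tool_id = "#tool-" + step["algorithm"].replace(":", "-")
        # setdefault keeps a single SoftwareApplication per unique tool.
        tools.setdefault(tool_id, {
            "@id": tool_id,
            "@type": "SoftwareApplication",
            "name": step["algorithm"],
        })
        actions.append({
            "@id": f"#action-{index}",
            "@type": "CreateAction",
            "name": step["title"],
            "instrument": {"@id": tool_id},
            "object": [{"@id": item} for item in step["inputs"]],
            "result": [{"@id": item} for item in step["outputs"]],
        })
    return list(tools.values()) + actions


steps = [
    {"algorithm": "native:extractbylocation", "title": "Schools in centre",
     "inputs": ["schools/", "central_district/"], "outputs": ["schools_central/"]},
    {"algorithm": "native:extractbylocation", "title": "Schools in north",
     "inputs": ["schools/", "north_district/"], "outputs": ["schools_north/"]},
]
entities = build_process_entities(steps)
print(json.dumps(entities[0], indent=2))
```

Both actions point to the same SoftwareApplication entity, which is how repeated use of one tool avoids duplicate entries in the metadata.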

At the time of development, the ro-crate-py library did not fully support the RO-Crate 1.2 specification. Consequently, the exported archive is modified post-generation to incorporate the Workflow Run Crate context and custom vocabulary mappings within the ro-crate-metadata.json file.
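A post-generation modification of this kind could look like the sketch below, which rewrites the @context of ro-crate-metadata.json inside a ZIP archive using only the standard library. It is not the plugin's code. The context URL is the one published for the Workflow Run Crate terms, and the custom term mapping (qgisCommand pointing to an example.org IRI) is purely hypothetical.

```python
import io
import json
import zipfile

EXTRA_CONTEXT = "https://w3id.org/ro/terms/workflow-run/context"


def patch_context(zip_bytes, extra_context=EXTRA_CONTEXT, extra_terms=None):
    """Return new ZIP bytes whose metadata @context includes the extra entries."""
    source = zipfile.ZipFile(io.BytesIO(zip_bytes))
    target_buffer = io.BytesIO()
    with source, zipfile.ZipFile(target_buffer, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "ro-crate-metadata.json":
                metadata = json.loads(data)
                context = metadata.get("@context", [])
                if not isinstance(context, list):
                    context = [context]
                if extra_context not in context:
                    context.append(extra_context)
                if extra_terms:
                    context.append(extra_terms)  # custom vocabulary mappings
                metadata["@context"] = context
                data = json.dumps(metadata, indent=2).encode("utf-8")
            target.writestr(item.filename, data)  # other files copied unchanged
    return target_buffer.getvalue()


# Demo on a tiny in-memory archive.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as archive:
    archive.writestr(
        "ro-crate-metadata.json",
        json.dumps({"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": []}),
    )
    archive.writestr("schools/schools.geojson", "{}")

patched = patch_context(
    original.getvalue(),
    extra_terms={"qgisCommand": "https://example.org/terms#qgisCommand"},
)
with zipfile.ZipFile(io.BytesIO(patched)) as archive:
    patched_metadata = json.loads(archive.read("ro-crate-metadata.json"))
    patched_names = archive.namelist()
print(patched_metadata["@context"])
```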

Performance characteristics

The export procedure exhibits linear time complexity,

O(L+S+P+T),

where L is the number of layers, S the total size of files copied, P the number of processes, and T the number of unique instruments. The dominant cost typically arises from file I/O associated with copying large geometry and symbology files.
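One way to read this bound is as a sum of independent per-unit costs. The constants c_L, c_S, c_P, and c_T and the symbol t_export below are assumptions introduced for illustration; they are not measured or reported in this paper.

```latex
t_{\text{export}}(L, S, P, T) \;=\; c_L\,L + c_S\,S + c_P\,P + c_T\,T \;\in\; O(L + S + P + T)
```

Under this reading, the term c_S S dominates whenever the copied files are large, since the other three terms only involve writing small metadata entries.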

Quality control

Our quality control strategy follows a threefold approach consisting of a functionality testing suite for individual modules, user testing, and validation of exported RO-Crates. This modular testing strategy allows us to identify problems early in the development process and trace bugs to their originating modules.

To ensure that the plugin functions as intended, we developed a suite of unit and component tests using the pytest framework. These tests were designed to run within a Python environment with access to the QGIS installation, ensuring compatibility with the actual QGIS runtime behavior, and to use QGIS’s built-in testing infrastructure (qgis.testing library). The suite tests the functionality of the different tabs (Instruction, Graph, Export, Import) in isolation, verifying the layout initialization, user interface (UI) components, and core behavior of each tab. This testing approach provides a first layer of quality assurance for each module individually and helps catch logical errors early in the development process.

For the user testing, we recruited 10 students who study geoinformatics and have experience with QGIS. Each participant received a package containing the developed QGIS plugin as a ZIP archive, detailed instructions for plugin installation and task assignment, and an HTML form containing the System Usability Scale (SUS) questionnaire [26]. The participants were instructed to create a map according to their own preferences and ideas in QGIS, document the map creation workflow using QFlowCrate, export the documentation as an RO-Crate, and then complete the SUS questionnaire. Upon completion, participants shared with us their responses and the generated RO-Crate ZIP archives. For reference, the calculated SUS scores ranged from 70 to 95 points with a mean of 81.75 (SD = 7.46), which indicates that QFlowCrate has a high likelihood of user acceptance [27, 28].
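For readers unfamiliar with SUS, the score of one participant is derived from ten items answered on a 1 to 5 scale: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5. The sketch below implements this standard scoring rule. The sample responses are hypothetical and are not taken from the study.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten responses between 1 and 5")
    total = 0
    for position, response in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if position % 2 else (5 - response)
    return total * 2.5


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```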

To ensure the technical correctness and practical viability of the plugin’s RO-Crate export functionality, we assessed compliance with RO-Crate specifications and export performance across the different workflow complexities of the users’ RO-Crates. The validation of the exported RO-Crates was performed using the rocrate-validator tool [29] against the base RO-Crate profile and the Process Run Crate profile. The validation results showed that all of the RO-Crates were valid with no structural or formatting issues. Export durations ranged from 0.14 to 15.60 seconds, corresponding to RO-Crate file sizes between 42 KB and 55.9 MB. The data show a trend of increasing export duration with larger file sizes, consistent with the theoretical linear time complexity O(L+S+P+T). Finally, we ensured that the export and import functionalities of QFlowCrate are aligned by importing the user-generated RO-Crates and inspecting their graphs.

(2) Availability

Operating system

Windows 11, macOS 26, and Linux Ubuntu 22.04 or later. Minimum compatibility: QGIS 3.24 or later. The plugin was systematically tested with QGIS 3.40 and 3.44 on all three operating systems.

Programming language

Python (version 3.12 or greater)

Additional system requirements

The software runs as a QGIS desktop plugin and therefore inherits the system requirements of QGIS. No additional hardware requirements beyond those of QGIS are imposed. Disk space requirements depend on the size of exported RO-Crate archives, which may include large spatial datasets.

Dependencies

  • QGIS (minimum version 3.24)

  • PyQGIS API (bundled with QGIS)

  • PyQt5 (minimum version 5.15)

  • ro-crate-py (v0.14.0)

  • rocrate-validator (v0.7.3)

  • Standard Python libraries (os, pathlib, datetime, tempfile, re)

List of contributors

  • Andreas Rademaker – Main contributor (code, writing)

  • Eftychia Koukouraki – Contributor (concept, writing, code)

  • Brian Pondi – Contributor (validation, writing)

Software location

Archive

Code repository

Emulation environment

The software runs natively as a QGIS plugin and does not require an emulation or container-based execution environment.

Language

English.

(3) Reuse Potential

QFlowCrate addresses a significant gap in reproducible research tools available to desktop GIS users. It enables geospatial researchers and geoscientists who are not yet confident in programming to document the provenance of their data for publications and research outputs, and does so using an entirely free, open-source solution. The results of our user study suggest that such a semi-automatic approach of capturing data provenance may succeed where manual documentation has failed, not by requiring greater researcher discipline, but by reducing the perceived burden of documentation. Nevertheless, we would like to encourage researchers to view this approach as a means to an end, eventually aiming to build their geoprocessing workflows with scripts or computational notebooks to achieve the gold standard in terms of reproducibility.

The current functionality of the plugin provides a suitable foundation for future expansion. The documentation scope could be extended beyond its current dependency on the Processing History to include operations performed through the GUI that are not automatically captured. Support for multi-software workflows, where data passes between QGIS and external tools like R, Python scripts, or web services, would address the reality that many geoscientific workflows span multiple platforms. Future development could also leverage RO-Crate’s full potential through bidirectional workflow management. The import functionality could be extended to enable automatic re-execution of workflows from RO-Crates, serving both general GIS users seeking workflow automation and researchers requiring standardized documentation.

Integration with external data repositories and persistent identifier systems, including automated DOI registration and institutional repository linking, would enhance long-term data management capabilities. We welcome community contributions and provide support through the GitHub repository of QFlowCrate at https://github.com/nicevibesplus/QFlowCrate. Users can report bugs and ask questions by opening an issue. Developers interested in contributing code can fork the repository and submit pull requests.

Notes

Acknowledgements

The authors would like to thank Prof. Christian Kray and Prof. Edzer Pebesma for their constructive comments on the manuscript.

Author Contributions

Andreas Rademaker: Software, formal analysis, visualization, methodology, data curation, writing – original draft.

Eftychia Koukouraki: Conceptualization, validation, supervision, writing – original draft.

Brian Pondi: Validation, supervision, writing – original draft.

DOI: https://doi.org/10.5334/jors.704 | Journal eISSN: 2049-9647
Language: English
Page range: 44 - 44
Submitted on: Feb 18, 2026
Accepted on: May 21, 2026
Published on: Jun 5, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Andreas Rademaker, Eftychia Koukouraki, Brian Pondi, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.