Introduction
Over recent years, there has been a drive to 'FAIRify' data both in the public sector (EC, 2019) and in industry (van Vlijmen et al., 2020). This 'FAIRification' is based on an open interpretation of the FAIR guiding principles (Wilkinson et al., 2016), which goes beyond the original principles; the original principles only require the metadata to be open, and thereby allow the data itself to be withheld (behind a paywall or because of sensitivity). In the rest of this paper, FAIR will be interpreted in this open sense, meaning that it should result in the publication of open metadata as well as open data. Open data has been implemented by a large number of data centres and organisations, for instance the Global Biodiversity Information Facility (GBIF) (GBIF, 2020) and the Ocean Biodiversity Information System (OBIS) (UNESCO-IOC, 2020). These have focused on how to gather metadata and data into a FAIR framework and share them, combined with training sessions for contributing scientists. In addition, there are initiatives like GO FAIR (GO FAIR, 2020) and the GO FAIR Foundation (GO FAIR Foundation, 2025), which provide a coordinated effort towards adoption, training, and the development of standards.
Within publishing, the move to open access was accelerated by the adoption of Plan S by a group of national science funders, known as Coalition S, in 2018 (Schiltz, 2018). These organisations represented countries from all over the world, though with a majority located in Europe. Plan S was to be implemented from 2021 and required all publications to be openly accessible. The adoption of this policy was a major undertaking and a shift towards viewing scientific results as something everyone should have access to. This shift has not been entirely without issues, especially as incentives change in journals that are paid to publish rather than by readership (Wold, 2024).
Both Plan S and the goal of open metadata (FAIR) and data stem from an expectation that publicly funded science and data should be freely available to other scientists and the public. Their implementation within education and research institutions has uncovered institutional weaknesses and a resistance to change. In this paper, I will argue that there needs to be a broader change, especially within universities, in order to change the way new students and future researchers think and work. The focus will be on the implicit premises universities accept when deciding which software and systems to provide. This will be based on experiences from the Norwegian university sector but, due to the prevalence of the platforms in question, it will be applicable to most universities.
Background
Universities are responsible for educating the next generation of highly skilled workers and researchers. Through university education, students learn everything from ethics to the subject-specific knowledge needed in future work, be it industrial, governmental or research. Some of this knowledge involves the use of different software and hardware tools, as well as how to record data and metadata. On completion of their studies, students have established preferred work methodologies and (software) tools, and they bring these preferences into their new workplace, be it industry, university, government, etc. The software industry is well aware of the value of this preference, as is evident from the favourable licensing terms given both to universities for education and to individual students. Often these tools are proprietary, producing files and data in proprietary formats with little or no documentation.
In order to ensure that academics follow Plan S for their publications and the FAIR principles when publishing open (meta)data, universities and funding organisations impose certain requirements on scientists. These include which publishing channels are acceptable, the use of data management plans, and the publication of FAIR-compliant metadata and open data where possible. When this was implemented in Norwegian universities, it was in most cases the university library that was given the responsibility. I would argue that this was a missed opportunity to FAIRify the whole university, rather than assigning the task to a support function with little experience in handling scientific data. One of the largest missed opportunities concerns which software tools are offered to students, as open source and FAIR requirements are not priorities in Norway when acquiring software for universities.
In the next subsection, the implications of the choices made with respect to software will be investigated. The Norwegian university sector will be used as an example, but the argument is applicable outside of Norway.
Software and Open Source
Within a university, there is a large amount of software in use. Some of the specialised software is for management (HR, procurement, etc.), while the rest is used in production (education and research). The focus here will be on software choices made by the university that affect the teaching environment, its students, and research. As examples, I will use software that is in use at my university, UiT The Arctic University of Norway. The examples given will be applicable to the other large universities in Norway, as the large universities have a common software supplier, and also internationally, as the software products are supplied by large international companies.
Currently, a lot of proprietary software is in use at the universities, examples being Microsoft Windows, macOS, Microsoft Office 365, Zoom, Panopto, AutoCAD, SolidWorks, MATLAB, LabVIEW, Adobe CC and WISEflow. The lack of source code for these pieces of software is not in itself a major problem, but they are often accompanied by file formats without proper documentation, making it hard to reuse the files in other applications. As reuse and interoperability of data are key points in the open interpretation of FAIR, these formats cannot be used to store open FAIR data.
The problem of producing files with poor interoperability and reusability is especially apparent when looking at the document file format Office Open XML (OOXML). Microsoft chose to implement a variant containing proprietary parts (transitional OOXML, also called 'MS-OOXML') (FSFE, 2021) and to use it as the default instead of the open variant, known as strict OOXML. Over time, the transitional variant has therefore become the de facto standard, leading to interoperability problems, while the strict format is not widely used (LOC, 2020; LOC, 2021a; LOC, 2021b). The problem is acknowledged by the Norwegian Digitalisation Agency, amongst others, in its implementation of the requirements for publishing documents in the public sector (Digdir, 2021). As a consequence of these requirements, OOXML formats are discouraged for use in university communication, both via web pages and in teaching. A better choice would be to use the Open Document Format (ODF).
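In practice, the variant used by a given .docx file can be identified from the XML namespace of its main document part. The following minimal sketch, using only the Python standard library, illustrates such a check; the namespace URIs are those commonly associated with the transitional and strict variants, and a real conformance check would need to inspect more than the root element.

import sys
import zipfile
import xml.etree.ElementTree as ET

# The main document part uses different namespaces in the two variants:
# transitional OOXML uses the 2006 schemas.openxmlformats.org namespace,
# strict OOXML uses the purl.oclc.org/ooxml namespace.
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def ooxml_variant(path):
    """Return 'strict', 'transitional' or 'unknown' for a .docx file."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    namespace = root.tag.split("}")[0].lstrip("{")
    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"


if __name__ == "__main__":
    print(ooxml_variant(sys.argv[1]))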
The confusion between the different variants within the OOXML standard leads to interoperability challenges (FSFE, 2021; LOC, 2021b). Interoperability is one of the four pillars of FAIR and one of the harder parts to implement. By supplying this software to students and staff while not supporting open alternatives, the academic sector is ensuring that additional work and training will be needed to publish open, FAIR-compliant data. If universities instead chose FAIR-compliant formats for day-to-day work, students and researchers would already be familiar with them when the time comes to publish research and data. To further improve the understanding of FAIR, it would be advantageous to highlight why these formats are FAIR and open when introducing them in the sector.
In the more technical fields (CAD, modelling, data processing, etc.), there might be reasons related to software capabilities why certain tools have been chosen, though tradition and corporate pressure from software vendors are also factors. Many open alternatives exist that could be adopted without losing the desired capability, for instance FreeCAD and KiCad, which are used in industry. An example of a successful shift has come with data science, which has in large part moved from proprietary software (MATLAB, IDL, etc.) to Python and R workflows using open libraries and Jupyter notebooks.
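To illustrate what such an open workflow can look like, the short sketch below reads tabular data with pandas, performs a simple analysis step and writes both the result and a small metadata record to open, well-documented formats; the file names and metadata fields are hypothetical and chosen only for illustration.

import json

import pandas as pd

# Hypothetical input file; any open tabular format (CSV, TSV) would work.
df = pd.read_csv("measurements.csv")

# A simple, reproducible analysis step: summary statistics per column.
summary = df.describe()

# Store the result in an open format (CSV) ...
summary.to_csv("measurements_summary.csv")

# ... together with a small, machine-readable metadata record (JSON).
metadata = {
    "title": "Summary statistics for measurements.csv",  # hypothetical
    "creator": "Example Researcher",                      # hypothetical
    "licence": "CC-BY-4.0",
    "source": "measurements.csv",
    "software": f"pandas {pd.__version__}",
}
with open("measurements_summary.json", "w", encoding="utf-8") as fh:
    json.dump(metadata, fh, indent=2)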
Within learning management systems (LMS), there are noteworthy exceptions like Canvas and Moodle. Canvas is in use at most universities in Norway through Sikt, meaning that students and employees have access to the source code on GitHub under an AGPLv3 license (some plugins are proprietary). This means that if students want to understand how the LMS works, its code is available, and they can try to run it themselves. After graduation, they can also install and use it in any future work without necessarily having to sign a licensing agreement. The availability of the code means that even if the company behind it (Instructure) ceases to exist, it would still be possible to update and run the software. However, due to the proprietary parts, data from these systems cannot necessarily be exported in an open way.
Policy Recommendations
The academic sector has a great opportunity to inspire new generations of scientists and graduates to produce open science. Some steps have already been taken with Plan S and open FAIR data, though the push has come mostly from funding and government agencies and not from the sector itself. It is therefore about time that the academic sector recognised that the choices it makes have wide-reaching consequences. If the true intention behind the push for open science is to be realised, the effort needs to encompass the whole sector and not only selected parts. This means looking at which choices are made for IT systems, software and educational training. If non-FAIR and proprietary solutions are procured and used, that is what most academics and students will use, as the effort needed to set up and maintain their own, non-IT-supported systems is prohibitive for most. When open systems are not supported by central IT, it becomes even harder for academics who want to embrace open science in every aspect of their research and teaching. Most tradesmen are defined by the tools they have available and use, and the same goes for academics and students. One way to address this could be to ensure that the sector:
Strongly prefers the procurement of software that produces FAIR-compliant metadata and open data
Prefers giving training and support for open source tools to students
Pushes for an open science approach that uses software freely available to the public
These choices would then have the possibility of propagating into the rest of society, as future students will understand the importance of FAIR metadata, open data and open tools. By using freely available software in open science, openness and innovation are improved, as the software is transparent. It is hard to verify published results if the software used in the analysis is expensive, proprietary and a black box.
How to Implement
How should the academic sector, organisations and the government ensure that the open science mindset propagates into the university and from there into society as a whole?
The first decision, to FAIRify the universities in the open data sense, is one that needs to be taken by university boards and/or at the government level. The shift needed is significant, but so was the move to FAIR data and Plan S, and the upside of completing the transition is very large. With FAIR metadata and open data produced by open source software, it will be much easier to share data, innovate and validate scientific results. In addition, some cost savings would be available, as vendor lock-in would be less of a concern. Vendor lock-in is a long-standing and well-known issue within, for instance, cloud computing (Opara-Martins, Sahandi and Tian, 2014; Opara-Martins, Sahandi and Tian, 2016), with the latest example being the turmoil after Broadcom acquired VMware in 2023 (Bhatia and Gabhane, 2025). Lastly, if universities use their knowledge to fill gaps in open software instead of paying for proprietary software, that software can find uses far outside the universities, including commercially and in poorer countries, improving the lives of many.
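To make the notion of FAIR metadata above concrete, the sketch below writes a minimal machine-readable record loosely modelled on the mandatory fields of the DataCite metadata schema; all values, including the identifier, are hypothetical placeholders, and a real deposit would normally be created and validated by the repository that mints the DOI.

import json

# Hypothetical record loosely following DataCite's mandatory fields.
record = {
    "identifier": {"identifier": "10.1234/example-dataset", "identifierType": "DOI"},
    "creators": [{"name": "Example Researcher"}],
    "titles": [{"title": "Example open dataset"}],
    "publisher": "Example University Research Data Archive",
    "publicationYear": "2024",
    "types": {"resourceTypeGeneral": "Dataset"},
}

with open("dataset_metadata.json", "w", encoding="utf-8") as fh:
    json.dump(record, fh, indent=2)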
Acknowledgements
I would like to acknowledge Dr. Guy Jones for valuable reflections on an earlier version of the paper, and I would also like to thank the reviewers for their input.
Competing Interests
The author has no competing interests to declare.
Author Contributions
Pål Gunnar Ellingsen wrote the whole paper and carried out the research.
