Introduction
The Oxford University Research Archive (ORA)1 serves as an institutional repository (IR) to collect, preserve and disseminate the research output of University of Oxford (Oxford) members. To build a complete picture of the publishing landscape at Oxford, researchers must deposit their work to ORA or provide the information necessary for linkage and reporting within the University’s research information system – Symplectic Elements (Elements).2 As part of the process, researchers and repository staff must ensure that research outputs adhere to institutional policy,3 funder mandates and other requirements such as publisher embargo policies and the Research Excellence Framework (REF).4
Addressing policy gaps and researcher engagement
In alignment with REF 20215 and funder mandates, Oxford initiated an ‘Act on Acceptance’ policy, urging researchers to deposit accepted manuscripts of journal articles and conference papers to ORA at publication acceptance. This practice, while ensuring compliance with the REF Open Access (OA) Policy and visibility of the research via the repository, does not fully account for articles being published gold OA.6 The evolving policy landscape, including consultations for the REF 2029 Open Access Policy7 and the increase in transformative agreements (also known as read and publish deals), additionally necessitates a change in practice and messaging.
Oxford aims to streamline processes to reduce administrative load while maintaining compliance and maximizing research dissemination. To address these challenges, ORA has implemented automated content and metadata collection systems by integrating with Jisc Publications Router8 and CORE.9 This article explores these automated connections, their success and challenges, and outlines future targets to enhance content acquisition from additional sources, ensuring efficient and comprehensive research output management at Oxford.
Current state of open access at Oxford
‘Many academics at Oxford are now meeting OA compliance through gold OA rather than through green OA.’
Gold versus green open access
A substantial portion of OA compliance at Oxford is currently achieved through gold OA, where publications are made openly accessible immediately by the publisher, often involving article processing charges (APCs).10 In contrast, green OA,11 which involves self-archiving accepted manuscripts in ORA, has been declining. Recent internal reports indicate that only 31% of the articles currently considered in scope meet Oxford’s Act on Acceptance policy.12
The preference for gold OA at Oxford is influenced by funder policies from organizations such as the Wellcome Trust13 and UK Research and Innovation (UKRI),14 which provide Oxford with block grants of funding to support gold OA. Additionally, the proliferation of read and publish deals has expanded OA publishing options, reducing the need for individual APCs while extending gold OA publishing to more Oxford researchers (i.e. those not funded and supported by block grants for individual APC payments).
Despite the continued push for depositing to ORA with Act on Acceptance, whether publishing gold OA or self-archiving in the repository, the deposit rate of publications to ORA remains lower than desired, creating gaps in compliance tracking and eligibility for REF submissions. Many researchers consider the administrative burden of depositing an article already published gold OA to be unnecessary. Reporting on OA outputs highlights this issue and the need for efficient, automated solutions for capturing the relevant metadata and full text.
Automated solutions for OA compliance
Jisc Publications Router
‘… it is estimated that the majority of Oxford’s research output, published gold OA, is covered by the service.’
To address the gaps in metadata and full-text collection, Oxford has integrated the Jisc Publications Router with ORA. The connection between the Jisc Publications Router and ORA has been developed in partnership with Cottage Labs.15 Jisc Publications Router provides automated notifications of research articles determined to have an affiliation with Oxford and allows ORA to harvest metadata and full-text files for this content.
Jisc Publications Router currently sources information from 25 different content providers,16 including publishers, Crossref,17 PubMed18 and Europe PubMed Central.19 The content provided is almost exclusively for gold OA publications. Comparing the publishers in the providers list with those with which Oxford researchers are seen to be publishing, early indications show that a high percentage of Oxford’s open access research output is covered by the service.
Data from within Elements for 2021 and 2022 estimate that Oxford articles with publishers in Jisc Publications Router accounted for 73% of all articles (where a publisher was recorded in Elements), and of the ‘top 20’ publishers with which Oxford researchers are seen to publish, 13 are currently providing content to Jisc Publications Router.
This opens the potential for significant automatic harvest of open access content to ORA without the need to create a manual deposit – though it does rely on the content being made available by the content providers.
Integration with ORA and Symplectic Elements
Setting up a connection with Jisc Publications Router was not simply a case of ‘plug and play’. The ORA system has been developed using Fedora20 (an open source repository system for managing digital content, provided by the Lyrasis member organization) and Hyrax21 (user interface software that forms part of the Samvera open source repository solution). A connection between this software and Jisc Publications Router had not previously been established and needed to be developed, using the Jisc Publications Router application programming interface (API) and SWORD22 protocol endpoints. This type of connection is familiar to ORA because of its existing connection with Elements via Repository Tools 2 (RT2).
It was necessary to connect Jisc Publications Router to ORA rather than to Elements, as there is no direct feed from Jisc Publications Router to Elements. However, due to the existing RT2 connection between ORA and Elements, records created or updated with new content in ORA are available to harvest by Elements, allowing for seamless integration. This automated workflow ensures that publications appear in Elements for users to claim alongside publication records found from other sources. Where a researcher within Elements has enabled automatic claiming via an identifier, such as ORCID23 or email address, and this identifier is captured in the metadata received for an object from Jisc Publications Router, objects can be automatically claimed to a researcher’s account without the need for deposit or other action, streamlining the process of compliance and reporting.
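The identifier matching that underpins automatic claiming can be illustrated with a short sketch. It is not the Elements implementation: the `Researcher` record, the `orcid` and `email` field names and the `find_claimants` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Researcher:
    """Hypothetical profile holding the identifiers enabled for auto-claiming."""
    name: str
    orcid: Optional[str] = None
    emails: Set[str] = field(default_factory=set)


def normalize_orcid(value: str) -> str:
    """Drop any URL prefix and upper-case the identifier (the check character may be X)."""
    return value.strip().rsplit("/", 1)[-1].upper()


def find_claimants(authors: List[Dict[str, str]], researchers: List[Researcher]) -> List[str]:
    """Return the names of researchers whose ORCID or email appears in the author metadata."""
    orcids = {normalize_orcid(a["orcid"]) for a in authors if a.get("orcid")}
    emails = {a["email"].strip().lower() for a in authors if a.get("email")}
    matched = []
    for person in researchers:
        orcid_hit = person.orcid is not None and normalize_orcid(person.orcid) in orcids
        email_hit = any(e.strip().lower() in emails for e in person.emails)
        if orcid_hit or email_hit:
            matched.append(person.name)
    return matched
```

In this sketch a publication is only matched when the harvested author metadata actually carries one of the identifiers, which mirrors the dependency described above: without an ORCID or email address in the notification, the record would still need to be claimed manually.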
Enhancing metadata and full-text collection
As part of the development work to connect ORA with Jisc Publications Router, significant time needed to be spent massaging the data provided by Jisc Publications Router into the ORA Data Model24 – consolidating the many values that Jisc Publications Router provides for content to make the values meaningful for presentation in the repository. For example, based on an analysis of 60,500 notifications, it was found that Jisc Publications Router provides approximately 230 different values for ‘item type’ that required mapping to the ORA item type and sub-type values.
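The consolidation of ‘item type’ values can be pictured as a lookup against a normalized key. The sketch below is illustrative only: the source values, the ORA type and sub-type labels and the `map_item_type` helper are invented for the example and are not taken from the ORA Data Model.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from normalized source values to (ORA type, ORA sub-type).
# The real mapping covers roughly 230 source values; these entries are invented.
ITEM_TYPE_MAP: Dict[str, Tuple[str, Optional[str]]] = {
    "research article": ("Journal article", None),
    "journal article": ("Journal article", None),
    "review article": ("Journal article", "Review"),
    "conference paper": ("Conference item", "Conference paper"),
}


def normalize(value: str) -> str:
    """Lower-case, turn hyphens and underscores into spaces, and collapse whitespace."""
    return " ".join(value.replace("-", " ").replace("_", " ").lower().split())


def map_item_type(source_value: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return the mapped (type, sub-type) pair, or None if the value is unmapped."""
    return ITEM_TYPE_MAP.get(normalize(source_value))
```

Normalizing before the lookup means that spelling variants such as ‘Research-Article’ and ‘research article’ share a single mapping entry, and an unmapped value returns `None` so that the record could be routed to a reviewer rather than guessed at.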
Some assumptions also needed to be applied to account for information not included in the data provided via Jisc Publications Router, or where the data was unclear. For example, where multiple files are received for a single object, the binary file metadata does not indicate whether a file is the version of record or a supplementary file. The presence of certain words in the file name itself (e.g. ‘VoR’, ‘Supp’, ‘Article’) is therefore used to infer the file version and add the corresponding metadata values, using the RIOXX25 metadata profile.
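A file-name heuristic of this kind might look like the following sketch. The keywords are those given as examples in the text; the labels assigned to them, the order in which they are checked and the `guess_file_version` helper are assumptions for illustration, not the rules ORA applies.

```python
from typing import Optional

# Keywords come from the examples in the text. The labels and the order of
# checking are assumptions made for this sketch.
KEYWORD_RULES = [
    ("supp", "Supplementary file"),
    ("vor", "VoR"),      # RIOXX term for the version of record
    ("article", "VoR"),  # assumed here to indicate the main article file
]


def guess_file_version(file_name: str) -> Optional[str]:
    """Return a label inferred from the file name, or None if no keyword matches."""
    lowered = file_name.lower()
    for keyword, label in KEYWORD_RULES:
        if keyword in lowered:
            return label
    return None
```

Checking ‘supp’ first means a file named `Article_Supp_1.docx` is treated as supplementary rather than as the article itself. Plain substring matching can misfire on unrelated names that happen to contain a keyword, which is one reason a fallback to human review remains useful.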
As part of the ORA ingest process for Jisc Publications Router objects, the system was updated to check for existing records to avoid duplication and to determine the OA status and embargo26 restrictions of the content already held in ORA or currently being ingested.
In addition to creating new objects within ORA and avoiding the creation of duplicates, extending the capabilities of the Jisc Publications Router connection to update existing ORA records with publication dates and final versions of articles was a key focus – thus enhancing the completeness and accuracy of existing ORA records and removing the need for manual updates by ORA Review staff.
On ingest of an object sourced via Jisc Publications Router, if an object already exists in ORA it is assessed to see whether the ORA record is already ‘complete’, whether an update can be applied to the record (e.g. a new file version can be made available, metadata can be updated, etc.) or whether the object needs to be checked by an ORA reviewer. If the new object, or the update to an existing object, meets a defined set of criteria, the item is published to the ORA public website automatically once ingested.
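The assessment described above can be sketched as a small decision function. The criteria themselves are not specified in this article, so the field names (`complete`, `has_vor_file`, `publication_date`) and the checks below are placeholders standing in for ORA's actual rules.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CREATE = "create new record"
    NO_CHANGE = "existing record already complete"
    UPDATE = "update existing record"
    REVIEW = "send to reviewer"


def assess(existing: Optional[dict], incoming: dict) -> Outcome:
    """Decide what to do with an incoming notification (all field names are hypothetical)."""
    if existing is None:
        return Outcome.CREATE
    if existing.get("complete"):
        return Outcome.NO_CHANGE
    # Stand-in for the unspecified checks: accept an update only when the
    # incoming notification fills a gap in the existing record.
    adds_file = bool(incoming.get("has_vor_file")) and not existing.get("has_vor_file")
    adds_date = bool(incoming.get("publication_date")) and not existing.get("publication_date")
    return Outcome.UPDATE if (adds_file or adds_date) else Outcome.REVIEW


def publish_automatically(outcome: Outcome, meets_criteria: bool) -> bool:
    """New or updated records go public without staff input only if the criteria are met."""
    return outcome in (Outcome.CREATE, Outcome.UPDATE) and meets_criteria
```

Keeping the publication decision separate from the assessment reflects the two stages in the text: a record can be ingested or updated and still be held back for staff checks if it does not meet the criteria.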
The story so far
Overall, the metadata being provided via Jisc Publications Router is rich in detail, allowing complete records for research publications to be created in ORA. ORA went live with the developed connection to Jisc Publications Router in the middle of June 2024 and, at the time of writing, 2,710 publications have since been added to ORA using harvested content. Of these objects, 1,852 have been automatically ingested and automatically published to the ORA public website without staff having to engage with the object (an example record can be found in reference27). The remaining items have been ingested but not yet ‘published’, pending repository staff checks.
As confidence grows with the integration, and records are able to be supplemented further with metadata, the intention is that more records will pass the automatic publication threshold.
CORE and ORA
The integration of content from CORE to ORA further provides an opportunity for the repository to gather information on research articles automatically and to streamline staff workflows. CORE, a not-for-profit service, aggregates research articles from numerous repositories and journals worldwide, providing a vast collection of content. Using the CORE API,28 ORA is able to benefit from the information on Oxford research collected by CORE to enhance its workflow surrounding reviewing and curating research articles.
Information about this work can be read in the article found in reference29 and is discussed in the context of this article in the following paragraphs.
CORE API integration – enhancing review processes
The CORE API serves as a pivotal tool in this integration, offering open-source access to millions of documents from various aggregated data sources such as Crossref, DataCite,30 BASE,31 PubMed Central, and arXiv.32 The API facilitates efficient data retrieval and provides a unified view of multiple information sources. Repositories like ORA can use this API to access a wide array of metadata fields, such as title, authors, publication date, Digital Object Identifier (DOI)33 and download URL. ORA uses a Rake task,34 a task-automation mechanism within the Ruby programming language, to identify records for automatic update from the CORE API, focusing on objects in specific review states. These updates include adding publication information and metadata to existing records, which are then flagged for ORA review staff for processing to ensure accuracy and completeness.
The integration of the CORE API into ORA’s workflows addresses the evolving policies impacting institutional repositories, such as the Plan S initiative and funder mandates from organizations like the Wellcome Trust and UKRI. These policies necessitate repositories to be aware of publication updates to ensure compliance with OA requirements.
Oxford ran a pilot from February 2023 to allow staff to ‘opt-in’ their publications to be supported by a rights retention policy. From 14 October 2024, rights retention was incorporated within the University of Oxford Open Access Publications Policy35 and requires authors to ‘opt-out’ should they wish the policy not to affect a specific publication. Oxford’s adoption of a rights retention policy further supports self-archiving and OA, emphasizing the need for accurate and timely release of research outputs on publication.
By automating the update of metadata fields and flagging records for earlier review, the CORE API reduces the manual effort required to track publication updates or for staff to manually search for an update to a record awaiting supplementary information or release from embargo – streamlining ORA’s review processes and ensuring compliance with funder requirements.
Addressing metadata quality and API limitations
CORE aggregates data from various providers at different points in the publication life cycle (e.g. at submission, acceptance, publication). A critical aspect of the integration is therefore ensuring the accuracy of harvested metadata and that the information is as ‘complete’ as possible. Some providers may supply default or incorrect date information, necessitating careful validation by ORA staff. For example, dates defaulting to the first day of a given month or year are common issues. ORA has established criteria to ensure the most accurate updates to the publication date field, prioritizing more granular date information. Despite these challenges, the integration of the CORE API supports ORA’s compliance with funder requirements and the rights retention policy, ensuring the timely and accurate release of research outputs.
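The preference for more granular dates can be sketched as a simple scoring rule. ORA's actual criteria are not given here, so this is an assumption-laden illustration: dates are taken to be ISO-style strings, and a day of ‘01’ is treated as a possible provider default and scored down accordingly.

```python
from typing import Iterable, Optional


def granularity(date: str) -> int:
    """Score an ISO-style date: 1 = year, 2 = year-month, 3 = full date.

    Assumption for this sketch: a day of '01' may be a provider default, so such
    dates are scored as month-level, and 1 January as year-level.
    """
    parts = date.split("-")
    if len(parts) == 3:
        if parts[1] == "01" and parts[2] == "01":
            return 1
        if parts[2] == "01":
            return 2
    return len(parts)


def best_publication_date(candidates: Iterable[str]) -> Optional[str]:
    """Pick the most granular candidate; earlier candidates win ties."""
    best = None
    for date in candidates:
        if best is None or granularity(date) > granularity(best):
            best = date
    return best
```

Under this rule a date such as 2024-03-15 would replace 2024-03 or 2024-01-01, while a genuine publication date that happens to fall on the first of a month would be scored conservatively, which is where validation by review staff comes in.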
Setting up the query to the API also exposed technical and content limitations. The technical limitations relate to the HTTP headers used for the API and the number of search tokens allowed (150 per five minutes); these were primarily overcome by implementing a ‘sleep’ between API calls. For bibliographic content, the limitations depend on the number of data providers and on how frequently these are updated or harvested. In testing it was found that some DOIs within ORA that were being used to match within the CORE API did not yet exist in CORE.
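A pause between calls of the kind described can be derived directly from the stated limit: 150 tokens per 300 seconds allows one call every two seconds. The sketch below assumes each call consumes one token, and `fetch` is a hypothetical stand-in for the real API request rather than a CORE client function.

```python
import time
from typing import Callable, Dict, Iterable, Optional

RATE_LIMIT = 150          # search tokens per window, as stated in the text
WINDOW_SECONDS = 5 * 60   # five minutes
MIN_INTERVAL = WINDOW_SECONDS / RATE_LIMIT  # 2.0 seconds between calls


def lookup_all(
    dois: Iterable[str],
    fetch: Callable[[str], Optional[dict]],
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[dict]]:
    """Look up each DOI in turn, pausing between calls to stay under the limit.

    `fetch` is a hypothetical callable wrapping the actual API request; it
    should return None when the DOI is not (yet) known to the aggregator.
    """
    results: Dict[str, Optional[dict]] = {}
    for index, doi in enumerate(dois):
        if index > 0:
            sleep(MIN_INTERVAL)
        results[doi] = fetch(doi)
    return results
```

Recording `None` for a DOI that is not found, rather than raising an error, keeps the run going and leaves a list of records to retry once the aggregator has caught up with the DOI.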
CORE notes that they try their ‘… best to have full coverage of DOIs by keeping synchronized with Crossref and exposing and comparing DOIs from the repositories, however, we still don’t have full coverage…’. None of these limitations or challenges proved significant or insurmountable in establishing the API connection to CORE.
In practice, a two-step process, by which the API call identifies potential new content, and an ORA staff member then checks to make sure the proposed addition has been correctly identified and has sufficient quality metadata, provides the best balance of human and machine input – reducing the manual labour previously required in identifying updates to deposited outputs.
Future directions and enhancements with CORE
Looking forward, ORA is exploring further developments to leverage CORE’s capabilities fully. One potential enhancement is the automatic identification and acquisition of full-text content from repositories and other metadata providers to CORE, i.e. automatically collecting full text from source or providing a link for review staff to collect a version that can be made available via ORA (such as an author accepted manuscript (AAM)). This would augment ORA’s repository content without additional effort from Oxford researchers, increasing the repository’s comprehensiveness and utility.
Using an aggregator service such as CORE to update information within a repository highlights the importance of the availability of high-quality metadata and underscores the need for continuous improvement in metadata sharing and harvesting practices across the research community, ultimately supporting a more efficient and accessible OA infrastructure. The University of Oxford, as a Supporting Member36 of CORE, makes its repository content available to CORE using the RIOXX metadata profile, facilitating broader access to Oxford’s research outputs and advancing the OA mission.
Expanding automated harvesting
Feedback from Oxford Divisions and governance groups has highlighted the need for further automation – working towards an 80/20 split of automation of processes over manual effort in depositing to ORA and meeting reporting requirements.
While developments with Jisc Publications Router and CORE are beginning to provide an avenue for automation, they are not without their limitations. For example, many of the publishers providing content to Jisc Publications Router do not explicitly include the open access content that forms part of a read and publish deal, instead providing pure gold open access only or gold OA within hybrid journals.
As a result, further development is planned to expand automated harvesting capabilities, including integrating additional metadata sources and exploring mechanisms to obtain full-text content from repositories and aggregators. A scoping exercise is being undertaken to explore the feasibility of integrating additional sources into the repository, such as OpenAlex,37 which includes information from sources such as Internet Archive Scholar38 and Unpaywall,39 and of extending the use of identifiers for automated claiming within Elements, such as ORCID, email address and Scopus Author ID. This would further enhance the completeness and accuracy of ORA records, ensuring comprehensive coverage of Oxford’s research outputs and further reducing the need for manual deposits.
Comprehensive communications review and conclusion
The current OA policy at Oxford, centred around Act on Acceptance, does not fully align with recent funder requirements for immediate OA. This misalignment, coupled with declining researcher engagement, necessitates a review and update of Oxford’s OA policy. The uncertainty in the use of funder block grants post-2024 to support transitional agreements40 further complicates the funding landscape for OA, emphasizing the need for a sustainable approach that balances green and gold OA routes.
A thorough review of Oxford’s OA policy and messaging is planned, expected to be led by an external agency commissioned to undertake an assessment. This review will incorporate feedback from the collegiate University, the academic Divisions and Oxford’s governing bodies involved with OA. The goal is to ensure that the updated policy reflects the realities of OA publishing while supporting sustainable practices and compliance.
Feedback indicates that reducing the administrative burden associated with OA compliance will improve researcher receptivity to updated policies. Automated deposit mechanisms are crucial to this effort, as is the development of policy around identifier integration (such as mandating the use of ORCIDs), which could facilitate automatic claiming in Elements and further streamline processes.
The integration of automated systems like the Jisc Publications Router and CORE API represents a significant advancement in OA compliance at Oxford. These efforts, combined with a comprehensive policy review and service development, will support sustainable OA practices, enhance the visibility and impact of Oxford’s research and ensure compliance with evolving funder and institutional mandates.
Abbreviations and Acronyms
A list of the abbreviations and acronyms used in this and other Insights articles can be accessed here – click on the URL below and then select the ‘full list of industry A&As’ link: http://www.uksg.org/publications#aa.
Competing interests
The author has declared no competing interests.
