
Making the Complete OpenAIRE Citation Graph Easily Accessible Through Compact Data Representation
Abstract
The OpenAIRE graph contains a large citation graph dataset, with over 200 million publications and over two billion citations. The current graph is available as a dump with metadata which, when uncompressed, totals ∼2.5 TB, making it hard to process on conventional computers. To make this network more accessible to the community, we provide a processed OpenAIRE graph that is downscaled to fit in 16 GB of RAM while preserving the full graph structure. In addition, we offer the processed data in a very simple format that allows straightforward further manipulation. We also provide (1) a Python pipeline that can be used to process future releases of the OpenAIRE graph, and (2) a larger version of the dataset that includes more publication fields, such as the title and list of authors.
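To illustrate what a compact representation of a citation graph can look like, the following is a minimal sketch, not the authors' pipeline. It assumes citations arrive as pairs of string identifiers; the function name `compact_edges` and the identifiers `pub:A`, `pub:B`, `pub:C` are hypothetical. Each identifier is mapped to a dense integer index, and the edges are stored as two flat integer arrays.

```python
from array import array


def compact_edges(citations):
    """Map string identifiers to dense integer indices and store
    the edges as two flat integer arrays (source and target)."""
    index = {}        # identifier -> dense integer index
    src = array("I")  # unsigned int: typically 4 bytes per item
    dst = array("I")
    for citing, cited in citations:
        for pid in (citing, cited):
            if pid not in index:
                index[pid] = len(index)
        src.append(index[citing])
        dst.append(index[cited])
    return index, src, dst


# Hypothetical identifiers, for illustration only.
citations = [("pub:A", "pub:B"), ("pub:A", "pub:C"), ("pub:C", "pub:B")]
index, src, dst = compact_edges(citations)

# Memory cost per edge is platform dependent (usually 4 + 4 bytes).
bytes_per_edge = src.itemsize + dst.itemsize
```

At 4 bytes per index, each edge costs 8 bytes, so two billion edges would need on the order of 16 GB in this layout. Whether the published dataset uses this exact layout is not stated in the abstract.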
© 2026 Joakim Skarding, Pavel Sanda, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.