Making the Complete OpenAIRE Citation Graph Easily Accessible Through Compact Data Representation

Journal of Open Humanities Data

Volume 12 (2026): Issue 1

By: Joakim Skarding and Pavel Sanda

Open Access

Apr 2026

Abstract

The OpenAIRE graph contains a large citation graph dataset, with over 200 million publications and over two billion citations. The current graph is distributed as a dump with metadata that totals ∼2.5 TB when uncompressed, which makes it hard to process on conventional computers. To make this network more accessible to the community, we provide a processed OpenAIRE graph that is downscaled to fit in 16 GB of RAM while preserving the full graph structure. In addition, we offer the processed data in a very simple format that allows for straightforward further manipulation. We also provide (1) a Python pipeline, which can be used to process future releases of the OpenAIRE graph, and (2) a larger version of the dataset that includes more publication fields, such as the title and the list of authors.
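
As a rough illustration of what a compact representation of a citation graph can look like, the sketch below maps string identifiers to dense integers and stores each citation as a pair of fixed-width integers. This is a minimal sketch under our own assumptions, not the pipeline described in the article: the function name `compact_edges`, the example identifiers, and the two-array edge layout are all hypothetical.

```python
from array import array


def compact_edges(citations):
    """Map string identifiers to dense integers and pack edges into flat arrays.

    `citations` is an iterable of (citing_id, cited_id) string pairs.
    Hypothetical helper for illustration; not the article's actual pipeline.
    """
    index = {}        # string identifier -> dense integer id
    src = array("I")  # citing publication, one unsigned int per edge
    dst = array("I")  # cited publication, one unsigned int per edge
    for citing, cited in citations:
        # setdefault assigns the next free integer the first time an id is seen
        src.append(index.setdefault(citing, len(index)))
        dst.append(index.setdefault(cited, len(index)))
    return index, src, dst


# Made-up identifiers, used only to show the shape of the data.
citations = [("pub:a", "pub:b"), ("pub:c", "pub:b"), ("pub:a", "pub:c")]
index, src, dst = compact_edges(citations)
```

After this step the graph structure lives entirely in `src` and `dst`, while the identifier strings are kept only once in `index` and can be stored separately from the edges.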

DOI: https://doi.org/10.5334/johd.520 | Journal eISSN: 2059-481X

Language: English

Page range: 63 - 63

Submitted on: Feb 13, 2026

Accepted on: Apr 10, 2026

Published on: Apr 30, 2026

Published by: Ubiquity Press

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: citation network, dynamic network, large scale network

© 2026 Joakim Skarding, Pavel Sanda, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.
