WikiTextGraph: A Python Tool for Parsing Multilingual Wikipedia Text and Graph Extraction

Open Access | Sep 2025

Abstract

WikiTextGraph is an open-source Python package designed to extract and process text from Wikipedia dumps and construct internal link networks across multiple language editions. It uses efficient parsing, redirect resolution, and multilingual graph-building techniques to tackle the challenges of Wikipedia’s scale, structure, and inherent noise. With a modular architecture and a simple graphical user interface (GUI), it is suitable for both technical and non-technical users. Built for scalability and reproducibility, WikiTextGraph supports interdisciplinary research in network science, computational linguistics, and digital humanities. Its flexible design enables easy adaptation for tasks involving low-resource or cross-lingual language studies.

DOI: https://doi.org/10.5334/jors.572 | Journal eISSN: 2049-9647
Language: English
Submitted on: Apr 15, 2025
Accepted on: Jul 28, 2025
Published on: Sep 12, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Paschalis Agapitos, Juan-Luis Suárez, Gustavo Ariel Schwartz, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.