Outside the Discipline, Inside the Data: A Retrospective Account of an Undocumented Tunisian Language Corpus in an Extractivist Research Context

Khaoula Stiti

doi:10.5334/johd.525


Journal of Open Humanities Data

Volume 12 (2026): Issue 1


Open Access

May 2026

Abstract

This discussion paper is a retrospective reflection on a qualitative interview corpus produced during my doctoral fieldwork in Tunis, Tunisia, in June 2022. As a researcher trained in architecture rather than linguistics or data science, I did not set out to build a language dataset, yet that is precisely what I produced. Working with 18 master’s students from two Tunisian universities, I coordinated the collection, transcription, and translation of 152 structured interviews with residents of the buffer zone of the UNESCO World Heritage Site of the Medina of Tunis. The interviews were conducted and transcribed in Tunisian and translated into French within a single five-day fieldwork week. The resulting corpus (transcriptions, French translations, and a small number of audio recordings and photos) subsequently underpinned a published research paper but was not formally documented, deposited, or cited as a dataset. This paper is also an account of remediation: the corpus has since been deposited on Zenodo, with public metadata and restricted-access files, as a first step toward making it reusable. Looking back at this work now, I ask: what went wrong, why, and what should researchers outside the field of linguistics know before they find themselves in the same position? This paper is addressed above all to researchers in the humanities and built-environment disciplines who collect language data as part of their work without recognising it as such, and who risk, as I did, allowing irreplaceable fieldwork material to disappear into personal storage. The paper also situates this experience within a broader reflection on extractive research practices in asymmetric, cross-cultural settings.
