
A Corpus of Portuguese Historical and Metalinguistic Grammars (1536–1864)
Abstract
This dataset presents the Corpus of Portuguese Historical and Metalinguistic Grammars (1536–1864), a machine-readable collection of six works documenting the standardization of the Portuguese language from the 16th to the 19th century. The texts were transcribed from original facsimiles using Transkribus and a custom-trained Automatic Text Recognition (ATR) model, Early Portuguese Printing (Character Error Rate, CER: 2.58%), and then manually curated according to strict philological criteria. The corpus is structured as a relational SQLite database, accompanied by raw text exports and PDFs with embedded text layers. By capturing three centuries of intense typographic and linguistic variation, the dataset provides a foundation for tracking the evolution of prescriptive norms in Portuguese, as well as long-term linguistic change and sociopolitical and ideological shifts across Portugal and Brazil.
© 2026 Saulo Rogério Pacheco Rocha, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.
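For readers unfamiliar with the metric quoted in the abstract, Character Error Rate is conventionally defined as the character-level edit distance between a reference transcription and the model output, divided by the length of the reference. The sketch below illustrates that general definition only. It is not the evaluation code used for the Early Portuguese Printing model, the function names `levenshtein` and `cer` are hypothetical, and the exact normalization applied by Transkribus when reporting CER may differ.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the first i-1 characters
    # of ref and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character of ref missing from hyp
                curr[j - 1] + 1,     # extra character in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance over reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Toy strings, not taken from the corpus: one substitution in four characters.
print(cer("abcd", "abxd"))  # 0.25
```

Under this definition, a CER of 2.58% corresponds to roughly 26 character-level edits per 1,000 reference characters, which is the residual error the manual curation step is meant to address.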