A Corpus of Portuguese Historical and Metalinguistic Grammars (1536–1864)

Open Access
May 2026

Abstract

This dataset presents the Corpus of Portuguese Historical and Metalinguistic Grammars (1536–1864), a machine-readable collection of six works documenting the standardization of the Portuguese language during the 16th to 19th centuries. The data was transcribed from original facsimiles using Transkribus and a custom-trained Automatic Text Recognition (ATR) model, Early Portuguese Printing (2.58% Character Error Rate, CER), supplemented by strict manual philological curation. This corpus is structured as a relational SQLite database, accompanied by raw text exports and PDFs with embedded text layers. By capturing three centuries of intense typographic and linguistic variation, this dataset provides an essential foundation for tracking the evolution of the prescriptive norms of the Portuguese language, as well as long-term linguistic change and sociopolitical/ideological shifts across Portugal and Brazil.
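For readers unfamiliar with the metric quoted above, CER is conventionally defined as the character-level edit distance between a recognized text and its reference transcription, divided by the length of the reference. The following Python sketch illustrates that conventional definition only. The function names `levenshtein` and `cer` and the sample strings are illustrative assumptions, and the way Transkribus computed the reported 2.58% (for example, its handling of whitespace or normalization) may differ.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (standard dynamic-programming form)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character of ref missing from hyp
                curr[j - 1] + 1,     # extra character in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Made-up strings for illustration, not taken from the corpus:
# one substituted character in a nine-character reference.
reference = "lingoagem"
recognized = "linguagem"
print(f"CER = {cer(reference, recognized):.2%}")
```

Under this definition, a CER of 2.58% corresponds to roughly 26 character-level errors per 1,000 reference characters before manual curation.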

DOI: https://doi.org/10.5334/johd.542 | Journal eISSN: 2059-481X
Language: English
Page range: 65 - 65
Submitted on: Mar 26, 2026
Accepted on: May 7, 2026
Published on: May 20, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Saulo Rogério Pacheco Rocha, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.