
Neural Language Models for Nineteenth-Century English

Open Access | Sep 2021

Abstract

We present four types of neural language models trained on a large historical dataset of books in English, published between 1760 and 1900 and comprising ≈5.1 billion tokens. The language model architectures include word type embeddings (word2vec and fastText) and contextualized models (BERT and Flair). For each architecture, we trained a model instance using the whole dataset. Additionally, for the type embeddings we trained separate instances on text published before 1850, and for BERT we trained four instances on different time slices. Our models have already been used in various downstream tasks, where they consistently improved performance. In this paper, we describe how the models were created and outline their reuse potential.

DOI: https://doi.org/10.5334/johd.48 | Journal eISSN: 2059-481X
Language: English
Published on: Sep 27, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Kasra Hosseini, Kaspar Beelen, Giovanni Colavizza, Mariona Coll Ardanuy, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.