Modular Bibliographical Profiling of Historic Book Reviews

Open Access | Mar 2024

Abstract

This paper examines different methods of predicting bibliographical details (e.g. author, title, and publisher) of books under review in a corpus of approximately 1,100 historical book reviews. The dataset consists of book reviews from ProQuest’s American Periodicals Series (APS). This kind of bibliographical profiling is often characterized as a Natural Language Processing (NLP) or Named Entity Recognition (NER) task, but it can be described more specifically as a two-part Named Entity Linking (NEL) task: a feature extraction stage, followed by one of several available matching or classification methods. The paper attempts to formalize constraints for modular bibliographical profiling (MBP) and to shed light on some important choices that digital humanities practitioners often gloss over or obscure. Applying these constraints, the paper evaluates combinations of feature selection (naive bag-of-words [BOW], rule-based feature extraction, and NER using a pre-trained model) with a standardized similarity-based matching strategy (cosine similarity). All tasks are performed on derived text data (term frequency tables), so that the data can be shared and all methods can be used on materials available only in non-consumptive formats. These comparisons suggest that naive BOW can perform quite robustly, and that using even a basic pre-trained NER model in conjunction with a BOW approach may reduce false positives.
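
As a rough illustration of the similarity-based matching stage described above, the following minimal sketch scores a review's term frequency table against candidate bibliographical records using cosine similarity. It is not the paper's implementation: the function names (`cosine_similarity`, `best_match`), the record keys, and the toy term counts are all hypothetical, and the sketch assumes that both reviews and candidate records are already available as term frequency tables.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term frequency tables (term -> count)."""
    # Only terms present in both tables contribute to the dot product.
    dot = sum(a[term] * b[term] for term in set(a) & set(b))
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(review_tf, candidates):
    """Return (record_key, score) for the highest-scoring candidate record."""
    scored = [(key, cosine_similarity(review_tf, tf)) for key, tf in candidates.items()]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: one review and two candidate records,
# each reduced to a bag-of-words term frequency table.
review_tf = Counter({"moby": 2, "dick": 2, "melville": 1, "harper": 1, "whale": 3})
candidates = {
    "moby_dick": Counter({"moby": 1, "dick": 1, "melville": 1, "harper": 1}),
    "walden": Counter({"walden": 1, "thoreau": 1, "ticknor": 1}),
}

match_key, match_score = best_match(review_tf, candidates)
```

Because the sketch operates only on term counts, it is compatible in spirit with the non-consumptive, derived-data setting the abstract describes. Swapping the naive BOW tables for rule-based or NER-derived features would change only how `review_tf` is built, not the matching step.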

DOI: https://doi.org/10.5334/johd.183 | Journal eISSN: 2059-481X
Language: English
Submitted on: Nov 13, 2023
Accepted on: Feb 8, 2024
Published on: Mar 18, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Matthew J. Lavin, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.