Abstract
The English Catalogue of Books (ECB) is a yearly record of books issued in England and Ireland, compiled by the trade publication Publishers’ Circular, which ran from the mid-19th to the mid-20th century. During much of this period, London was home to the largest English-language publishing industry in the world, rivalled only by New York. The ECB is thus a key resource for researchers interested in modern British and Irish print culture and in Anglophone publishing more broadly. Until recently, though, the ECB was available only in printed copies or digital facsimiles, making it difficult to draw broad conclusions from the information in the catalogues. We discuss unlocking the ECB for computational analysis by 1) extracting and parsing bibliographic information from a decade of issues, and 2) establishing an approach that can be customized and extended to additional years of the catalogue.
