Supporting the corpus-based study of Shakespeare’s language: Enhancing a corpus of the First Folio

Jonathan Culpeper; Andrew Hardie; Jane Demmen; Jennifer Hughes; Matt Timperley

doi:10.2478/icame-2021-0002

ICAME Journal

Volume 45 (2021): Issue 1 (May 2021)

Open Access

Jun 2021

Abstract

This article explores challenges in the corpus linguistic analysis of Shakespeare’s language, and of Early Modern English more generally, with particular focus on possible solutions and the benefits they bring. It gives an account of work carried out within the Encyclopedia of Shakespeare’s Language Project (2016–2019) to develop the project’s data resources, specifically the Enhanced Shakespearean Corpus. Topics covered include the composition of the corpus and its subcomponents; the structure of the XML markup; the design of the extensive character metadata; and the word-level corpus annotation, including spelling regularisation, part-of-speech tagging, lemmatisation and semantic tagging. The challenges that arise from each of these undertakings are not exclusive to a corpus-based treatment of Shakespeare’s plays, but it is in the context of Shakespeare’s language that they are so severe as to seem almost insurmountable. The solutions developed for the Enhanced Shakespearean Corpus, always principled and often combining automated manipulation with manual interventions, offer a way through.


DOI: https://doi.org/10.2478/icame-2021-0002 | Journal eISSN: 1502-5462 | Journal ISSN: 0801-5775

Language: English

Page range: 37–86

Published on: Jun 12, 2021

Published by: The International Computer Archive of Modern and Medieval English

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Related subjects: Linguistics and semiotics; Applied linguistics; Quantitative, computational, and corpus linguistics; Theoretical frameworks and disciplines; Linguistics, other; Germanic languages; English; Social sciences; Communication science; Communication science, other

© 2021 Jonathan Culpeper, Andrew Hardie, Jane Demmen, Jennifer Hughes, Matt Timperley, published by The International Computer Archive of Modern and Medieval English
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
