Enriching the 1758 Portuguese Parish Memories (Alentejo) with Named Entities

Renata Vieira; Fernanda Olival; Helena Freire Cameron; Joaquim Santos; Ofélia Sequeira; Ivo Santos

doi:10.5334/johd.43

References

1. Akbik, A., Blythe, D., & Vollgraf, R. (2018). Contextual string embeddings for sequence labeling. Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, USA.
2. Baron, A., & Rayson, P. (2008). VARD2: A tool for dealing with spelling variation in historical corpora. Postgraduate Conference in Corpus Linguistics, Birmingham, UK.
3. Birch, C., Oom, S., & Beecham, J. (2007). Rectangular and hexagonal grids used for observation, experiment and simulation in ecology. Ecological Modelling, 206(3–4), 347–359. DOI: 10.1016/j.ecolmodel.2007.03.041
4. Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146. DOI: 10.1162/tacl_a_00051
5. Cameron, H. F., Gonçalves, M. F., & Quaresma, P. (2020). Linguistic and orthographical classic Portuguese variants challenges for NLP. Proceedings of the 14th International Conference on the Computational Processing of Portuguese, Évora, Portugal.
6. Chiu, J. P., & Nichols, E. (2016). Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4, 357–370. DOI: 10.1162/tacl_a_00104
7. Chorão, M. J. M. B. (1987). Inquéritos promovidos pela Coroa no século XVIII. Revista de História Económica e Social, 1ª série, 21, 93–130.
8. Christaller, W., & Baskin, C. W. (1966). Central places in southern Germany. Englewood Cliffs, NJ: Prentice-Hall.
9. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug), 2493–2537.
10. Consoli, B. S., Santos, J., Gomes, D., Cordeiro, F., Vieira, R., & Moreira, V. (2020). Embeddings for named entity recognition in geoscience Portuguese literature. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France.
11. Costa, A. C. (1706–1712). Corografia Portugueza e descripçam topografica do famoso reyno de Portugal: com as noticias das fundações das cidades, villas, & lugares, que contem, varões illustres, genealogias das familias nobres, fundações de conventos, catalogos dos bispos, antiguidades, maravilhas de natureza, edificios & outras curiosas observaçoens, Tomo primeyro[-terceyro], vols. 1–3. Lisboa: na officina de Valentim da Costa Deslandes.
12. Dereza, O. (2018). Lemmatization for ancient languages: Rules or neural networks? In D. Ustalov, A. Filchenkov, L. Pivovarova & J. Žižka (Eds.), Artificial Intelligence and Natural Language. AINL 2018. Communications in Computer and Information Science, vol. 930. Springer. DOI: 10.1007/978-3-030-01204-5_4
13. dos Santos, C. N., & Guimarães, V. (2015). Boosting named entity recognition with neural character embeddings. Proceedings of the Fifth Named Entity Workshop, Beijing, China. DOI: 10.18653/v1/W15-3904
14. Gonçalves, M. F. (2003). As ideias ortográficas em Portugal: de Madureira Feijó a Gonçalves Viana (1734–1911). Lisboa: Fundação Calouste Gulbenkian.
15. Hubková, H., Král, P., & Pettersson, E. (2020). Czech historical named entity corpus v 1.0. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France.
16. Jurafsky, D., & Martin, J. H. (2000). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (3rd ed. draft). Stanford University.
17. Kemmler, R. (2001). Para uma história da ortografia portuguesa: o texto metaortográfico e a sua periodização do século XVI até a reforma ortográfica de 1911. Lusorama. Zeitschrift für Lusitanistik. Revista de Estudos sobre os Países de Língua Portuguesa, 47–48, 128–319.
18. Klie, J.-C., Bugert, M., Boullosa, B., de Castilho, R. E., & Gurevych, I. (2018). The INCEpTION platform: Machine-assisted and knowledge-oriented interactive annotation. Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, Santa Fe, USA.
19. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.
20. Mateus, M. H. M., & Cardeira, E. (2007). Norma e variação. Alfragide: Editorial Caminho.
21. McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. DOI: 10.11613/BM.2012.031
22. Nothman, J., Ringland, N., Radford, W., Murphy, T., & Curran, J. R. (2013). Learning multilingual named entity recognition from Wikipedia. Artificial Intelligence, 194, 151–175. DOI: 10.1016/j.artint.2012.03.006
23. Ortiz Suárez, P. J., Romary, L., & Sagot, B. (2020). A monolingual approach to contextualised word embeddings for mid-resource languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. DOI: 10.18653/v1/2020.acl-main.156
24. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.
25. Romão, R. M. A. (2019). Os lugares centrais em Portugal: a área de influência de Coimbra. Master's thesis, Instituto Superior de Economia e Gestão.
26. Romein, C. A., Kemman, M., Birkholz, J. M., Baker, J., DeGruijter, M., Meroño-Peñuela, A., Ries, T., Ros, R., & Scagliola, S. (2020). State of the field: Digital history. History, 105(365), 291–312. DOI: 10.1111/1468-229X.12969
27. Tjong Kim Sang, E. F. (2002). Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. Proceedings of CoNLL-2002, Conference on Natural Language Learning, Taipei, Taiwan.
28. Santos, J. A. (1995). As freguesias: História e actualidade. Oeiras: Celta.
29. Santos, D., & Cardoso, N. (2007). Reconhecimento de entidades mencionadas em português: Documentação e actas do HAREM, a primeira avaliação conjunta na área. Retrieved from https://www.linguateca.pt/aval_conjunta/LivroHAREM/Livro-SantosCardoso2007.pdf
30. Santos, J., Consoli, B., dos Santos, C., Terra, J., Collovini, S., & Vieira, R. (2019a). Assessing the impact of contextual embeddings for Portuguese named entity recognition. Proceedings of the 8th Brazilian Conference on Intelligent Systems, Salvador, Brazil.
31. Santos, J., dos Santos, H. D. P., & Vieira, R. (2020b). Fall detection in clinical notes using language models and token classifier. Proceedings of the 33rd International Symposium on Computer-Based Medical Systems, CBMS 2020, Rochester, USA.
32. Santos, I., Olival, F., & Sequeira, O. (2020a). Excavating the data pit: The Portuguese parish memories (1758) as a gold standard. DHandNLP@PROPOR, Workshop on Digital Humanities and Natural Language Processing, Évora, Portugal.
33. Santos, J., Terra, J., Consoli, B. S., & Vieira, R. (2019b). Multidomain contextual embeddings for named entity recognition. Proceedings of the 35th Conference of the Spanish Society for Natural Language Processing, Bilbao, Spain.
34. Schmitt, X., Kubler, S., Robert, J., Papadakis, M., & Le-Traon, Y. (2019). A replicable comparison study of NER software: Stanford NLP, NLTK, Open NLP, Spacy, Gate. Sixth International Conference on Social Networks Analysis, Management and Security, Granada, Spain.
35. Souza, F., Nogueira, R., & Lotufo, R. (2019). Portuguese named entity recognition using BERT-CRF. arXiv preprint arXiv:1909.10649.
36. Verdelho, T. (1987). Latinização na história da Língua Portuguesa – o testemunho dos dicionários. Arquivos do Centro Cultural Português (volume de homenagem a Paul Teyssier), XXIII, 157–187.
