Coreference resolution is the task of finding all expressions in a text that refer to the same entity. It is one of the higher-level tasks in natural language processing (NLP). It makes it possible, for example, to extract more information about medical products from longer texts. A product such as ‘ambidextrous gloves’ may appear in a text in many different forms; for example, the gloves could be referred to by the pronoun ‘they’, as in this very sentence. The algorithm presented in this paper finds pronouns and, for each of them (except the pleonastic ‘it’), creates coreference candidates by pairing the pronoun with entities that appeared earlier in the same sentence or in the previous sentence. Each candidate (a pair of mentions) is described by 48 binary features that represent the grammatical and positional properties of the two mentions. In the training set, each pair is labeled as coreferent or not, and a decision tree classifier is trained on these labels. On the training set, the resulting classifier achieved a high precision of 0.94 and a moderate recall of 0.61; its out-of-sample precision was lower but still good, at 0.64.
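The candidate-generation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Mention` class, the `candidate_pairs` function, and the `is_pleonastic` flag are hypothetical names introduced here, the pleonastic ‘it’ is assumed to be detected by an earlier step, and restricting antecedents to non-pronoun mentions is an assumption about what counts as an "entity".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Mention:
    """A mention in the text (hypothetical structure, for illustration only)."""
    text: str
    sentence: int                # index of the sentence containing the mention
    position: int                # token offset of the mention within its sentence
    is_pronoun: bool = False
    is_pleonastic: bool = False  # assumed to be set by an upstream detector


def candidate_pairs(mentions: List[Mention]) -> List[Tuple[Mention, Mention]]:
    """Pair each non-pleonastic pronoun with entities that appear earlier in
    the same sentence or anywhere in the previous sentence."""
    pairs = []
    for pronoun in mentions:
        if not pronoun.is_pronoun or pronoun.is_pleonastic:
            continue
        for antecedent in mentions:
            # Assumption: only non-pronoun mentions are treated as entities.
            if antecedent.is_pronoun:
                continue
            same_sentence_earlier = (
                antecedent.sentence == pronoun.sentence
                and antecedent.position < pronoun.position
            )
            previous_sentence = antecedent.sentence == pronoun.sentence - 1
            if same_sentence_earlier or previous_sentence:
                pairs.append((pronoun, antecedent))
    return pairs


# Toy text: "Ambidextrous gloves fit either hand. They are sold in boxes.
#            It is important to store them dry."
mentions = [
    Mention("ambidextrous gloves", sentence=0, position=0),
    Mention("either hand", sentence=0, position=3),
    Mention("They", sentence=1, position=0, is_pronoun=True),
    Mention("boxes", sentence=1, position=4),
    Mention("It", sentence=2, position=0, is_pronoun=True, is_pleonastic=True),
    Mention("them", sentence=2, position=5, is_pronoun=True),
]

pair_texts = [(p.text, a.text) for p, a in candidate_pairs(mentions)]
for pronoun_text, antecedent_text in pair_texts:
    print(pronoun_text, "->", antecedent_text)
```

In this toy example the two-sentence window means that ‘them’ is paired only with ‘boxes’, not with ‘ambidextrous gloves’ two sentences back. Each resulting pair would then be encoded as a vector of binary features and passed to the decision tree classifier; the 48 features themselves are not reproduced in this sketch.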
© 2019 Jerzy Krawczuk, Mariusz Ferenc, published by University of Białystok, Department of Pedagogy and Psychology
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.