Microsyntactic Annotation of Corpora and its Use in Computational Linguistics Tasks

Leonid Iomdin

doi:10.1515/jazcas-2017-0027

Journal of Linguistics/Jazykovedný časopis

Volume 68 (2017): Issue 2 (December 2017)

Open Access

Jan 2018

Abstract

Microsyntax is a linguistic discipline dealing with idiomatic elements whose important properties are strongly related to syntax. In a way, these elements may be viewed as transitional entities between the lexicon and the grammar, which explains why they are often underrepresented in both of these resource types: the lexicographer fails to see such elements as full-fledged lexical units, while the grammarian finds them too specific to justify the creation of individual well-developed rules. As a result, such elements are poorly covered by linguistic models used in advanced modern computational linguistic tasks like high-quality machine translation or deep semantic analysis. A possible way to mend the situation and improve the coverage and adequate treatment of microsyntactic units in linguistic resources is to develop corpora with microsyntactic annotation, closely linked to specially designed lexicons. The paper shows how this task is solved in the deeply annotated corpus of Russian, SynTagRus.

DOI: https://doi.org/10.1515/jazcas-2017-0027 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597

Language: English

Page range: 169 - 178

Published on: Jan 24, 2018

Published by: Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics

In partnership with: Paradigm Publishing Services

Publication frequency: 3 issues per year

Keywords: Text corpora, Russian syntactically tagged corpus SynTagRus, syntactic idioms, microsyntactic annotation, microsyntactic dictionary

Related subjects: Linguistics and semiotics; Theoretical frameworks and disciplines; Linguistics, other

© 2018 Leonid Iomdin, published by Slovak Academy of Sciences, Ľudovít Štúr Institute of Linguistics
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
