Text Vectorization Techniques Based on Wordnet
Open Access | Dec 2023

Abstract

Text vectorization has become essential for many classification tasks in natural language processing. Commonly used word embedding methods, such as Word2Vec and GloVe, are based on the semantic similarity of words. WordNet, a lexical database of words, provides a rich source of semantic information. In this article, we propose a text vectorization technique that uses text data extended by data augmentation, specifically by replacing words with their synonyms obtained from WordNet. Results from text classification tasks with multiple classifiers show that expanding the corpus in this way leads to improved vector representations of words.
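The synonym-replacement idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how words are selected or how many replacements are made, so the function names (`augment`, `expand_corpus`, `toy_lookup`, `wordnet_lookup`), the whitespace tokenization, the one-replacement-per-variant policy, and the toy synonym table are all assumptions made for illustration. The `wordnet_lookup` helper shows how the same lookup might be backed by WordNet through NLTK; it needs the `nltk` package and its `wordnet` corpus, so the default lookup uses a small offline table instead.

```python
from typing import Callable, Dict, List

# Hypothetical toy synonym table standing in for WordNet so the sketch runs offline.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "film": ["movie"],
    "good": ["great"],
}


def toy_lookup(word: str) -> List[str]:
    """Return synonyms from the toy table (assumption: case-insensitive match)."""
    return TOY_SYNONYMS.get(word.lower(), [])


def wordnet_lookup(word: str) -> List[str]:
    """Return synonyms from WordNet via NLTK.

    Requires `pip install nltk` and `nltk.download("wordnet")`.
    The import is local so the rest of the sketch runs without NLTK.
    """
    from nltk.corpus import wordnet

    names: List[str] = []
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ")
            if name.lower() != word.lower() and name not in names:
                names.append(name)
    return names


def augment(sentence: str,
            lookup: Callable[[str], List[str]] = toy_lookup) -> List[str]:
    """Return the original sentence plus one variant per single-word replacement."""
    tokens = sentence.split()  # assumption: simple whitespace tokenization
    variants = [sentence]
    for i, token in enumerate(tokens):
        for synonym in lookup(token):
            variants.append(" ".join(tokens[:i] + [synonym] + tokens[i + 1:]))
    return variants


def expand_corpus(corpus: List[str],
                  lookup: Callable[[str], List[str]] = toy_lookup) -> List[str]:
    """Expand every sentence in the corpus with its synonym-replaced variants."""
    expanded: List[str] = []
    for sentence in corpus:
        expanded.extend(augment(sentence, lookup))
    return expanded


if __name__ == "__main__":
    for line in expand_corpus(["a good film"]):
        print(line)
```

The expanded corpus would then be passed to an embedding trainer such as Word2Vec or GloVe in place of the original corpus. Passing `wordnet_lookup` as the `lookup` argument swaps the toy table for WordNet synonyms without changing the rest of the code.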

DOI: https://doi.org/10.2478/jazcas-2023-0048 | Journal eISSN: 1338-4287 | Journal ISSN: 0021-5597
Language: English
Page range: 310 - 322
Published on: Dec 25, 2023
Published by: Slovak Academy of Sciences, Mathematical Institute
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2023 Dávid Držík, Kirsten Šteflovič, published by Slovak Academy of Sciences, Mathematical Institute
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.