Comparison of Language Models for English-Latvian Semantic Search

Open Access | Feb 2025

Abstract

This study compares ten language models in an English-Latvian semantic information retrieval setting, in which the indexed document collection is written in English and the queries are documents written in Latvian. No comparable study has previously been published for the Latvian language. A dataset of 77,736 article pairs from Latvian and English Wikipedia was created, transformed into embedding vectors, and used for retrieval experiments with three search methods: brute-force search, the Hierarchical Navigable Small World (HNSW) method, and the Inverted File Indexing (IVF) method. The LaBSE language model performed best on short texts, while a version of Sentence-BERT and E5-large performed best on long texts.
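The retrieval setting described in the abstract can be illustrated with a minimal brute-force search sketch. This is not the authors' code: the three-dimensional vectors are hand-made stand-ins for real model embeddings, cosine similarity is assumed as the ranking measure, and the names `english_index`, `latvian_query`, and `brute_force_search` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def brute_force_search(query, index, k=1):
    """Score the query against every indexed vector and return the k best ids."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors standing in for embeddings of English articles (assumption:
# a multilingual model maps both languages into one shared vector space).
english_index = {
    "en:Riga": [0.9, 0.1, 0.0],
    "en:Baltic_Sea": [0.1, 0.9, 0.1],
    "en:Latvian_language": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the Latvian article that pairs with "en:Riga".
latvian_query = [0.8, 0.2, 0.1]

top = brute_force_search(latvian_query, english_index, k=2)
print(top)
```

Brute-force search compares the query with every indexed vector, so it returns exact nearest neighbours at a cost that grows linearly with collection size. HNSW and inverted file indexing are approximate methods that examine only part of the collection, trading some accuracy for speed.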

DOI: https://doi.org/10.2478/acss-2025-0004 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 34 - 39
Submitted on: Sep 30, 2024
Accepted on: Jan 2, 2025
Published on: Feb 7, 2025
Published by: Riga Technical University
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Artem Kucheravy, Gints Jēkabsons, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.