Analysis of Dataset Limitations in Semantic Knowledge-Driven Multi-Variant Machine Translation

Open Access | Sep 2024

Abstract

In this study, we explore the implications of dataset limitations in semantic knowledge-driven machine translation (MT) for intelligent virtual assistants (IVA). Our approach diverges from traditional single-best translation techniques: it uses a multi-variant MT method that generates multiple valid translations per input sentence through a constrained beam search. The method is not restricted to a specific verb ontology and is instead embedded within a broader semantic knowledge framework.

We evaluate the performance of multi-variant MT models in translating training sets for Natural Language Understanding (NLU) models. These models are applied to semantically diverse datasets, including a detailed evaluation using the standard MultiATIS++ dataset. The results of this evaluation indicate that while the multi-variant MT method is promising, its impact on intent classification (IC) accuracy is limited when applied to conventional datasets such as MultiATIS++. However, our findings underscore that the effectiveness of multi-variant translation is closely associated with the diversity and suitability of the datasets used.

Finally, we provide an in-depth analysis focused on generating variant-aware NLU datasets. This analysis offers guidance on enhancing NLU models through semantically rich and variant-sensitive datasets, so that the advantages of multi-variant MT can be fully exploited.

DOI: https://doi.org/10.14313/jamris/3-2024/20 | Journal eISSN: 2080-2145 | Journal ISSN: 1897-8649
Language: English
Page range: 39 - 48
Submitted on: Dec 27, 2023
Accepted on: Mar 10, 2024
Published on: Sep 12, 2024
Published by: Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2024 Marcin Sowański, Jakub Hościłowicz, Artur Janicki, published by Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.