
Analysis of Dataset Limitations in Semantic Knowledge-Driven Multi-Variant Machine Translation

Open Access | Sep 2024

DOI: https://doi.org/10.14313/jamris/3-2024/20 | Journal eISSN: 2080-2145 | Journal ISSN: 1897-8649
Language: English
Page range: 39–48
Submitted on: Dec 27, 2023
Accepted on: Mar 10, 2024
Published on: Sep 12, 2024
Published by: Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2024 Marcin Sowański, Jakub Hościłowicz, Artur Janicki, published by Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.