A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, “Unsupervised cross-lingual representation learning at scale,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020, pp. 8440–8451. https://doi.org/10.18653/v1/2020.acl-main.747
A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, et al., “PaLM: Scaling language modeling with pathways,” Journal of Machine Learning Research, vol. 24, pp. 1–113, 2023. https://www.jmlr.org/papers/volume24/22-1144/22-1144.pdf
Y. Yang, D. Cer, A. Ahmad, M. Guo, J. Law, N. Constant, G.H. Abrego, S. Yuan, C. Tar, Y.-H. Sung, B. Strope, and R. Kurzweil, “Multilingual universal sentence encoder for semantic retrieval,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Jul. 2020, pp. 87–94. https://doi.org/10.18653/v1/2020.acl-demos.12
B. Bijin, “A local and intelligent web information retrieval system,” M.S. thesis, University of Alberta, Edmonton, AB, Canada, 2021. https://doi.org/10.7939/r3-eb5m-q238
R. Litschko, I. Vulić, S.P. Ponzetto, and G. Glavaš, “Evaluating multilingual text encoders for unsupervised cross-lingual retrieval,” in Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Lecture Notes in Computer Science, vol. 12656, Springer Verlag, Berlin, Heidelberg, Mar. 2021, pp. 342–358. https://doi.org/10.1007/978-3-030-72113-8_23
R. Litschko, I. Vulić, S.P. Ponzetto, and G. Glavaš, “On cross-lingual retrieval with multilingual text encoders,” Information Retrieval Journal, vol. 25, pp. 149–183, Mar. 2022. https://doi.org/10.1007/s10791-022-09406-x
M. Artetxe and H. Schwenk, “Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond,” Transactions of the Association for Computational Linguistics, vol. 7, pp. 597–610, Sep. 2019. https://doi.org/10.1162/tacl_a_00288
K. Heffernan, O. Çelebi, and H. Schwenk, “Bitext mining using distilled sentence representations for low-resource languages,” in Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, Dec. 2022, pp. 2101–2112. https://doi.org/10.18653/v1/2022.findings-emnlp.154
P.-A. Duquenne, H. Schwenk, and B. Sagot, “SONAR: Sentence-level multimodal and language-agnostic representations,” arXiv preprint 2308.11466, Aug. 2023. https://doi.org/10.48550/arXiv.2308.11466
F. Feng, Y. Yang, D. Cer, N. Arivazhagan, and W. Wang, “Language-agnostic BERT sentence embedding,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, May 2022, pp. 878–891. https://doi.org/10.18653/v1/2022.acl-long.62
N. Muennighoff, N. Tazi, L. Magne, and N. Reimers, “MTEB: Massive text embedding benchmark,” in Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, May 2023, pp. 2014–2037. https://doi.org/10.18653/v1/2023.eacl-main.148
N. Reimers and I. Gurevych, “Making monolingual sentence embeddings multilingual using knowledge distillation,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov. 2020, pp. 4512–4525. https://doi.org/10.18653/v1/2020.emnlp-main.365
L. Wang, N. Yang, X. Huang, L. Yang, R. Majumder, and F. Wei, “Multilingual E5 text embeddings: A technical report,” arXiv preprint 2402.05672, 2024. https://arxiv.org/pdf/2402.05672
M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, and H. Jégou, “The Faiss library,” arXiv preprint 2401.08281, 2024. https://arxiv.org/pdf/2401.08281
J. Johnson, M. Douze, and H. Jégou, “Billion-scale similarity search with GPUs,” IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, Jun. 2019. https://doi.org/10.1109/TBDATA.2019.2921572
Y.A. Malkov and D.A. Yashunin, “Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 4, pp. 824–836, Apr. 2020. https://doi.org/10.1109/TPAMI.2018.2889473