References
- [1] Akbik, A., Blythe, D., & Vollgraf, R. (2018). Contextual string embeddings for sequence labeling. In Proceedings of the 27th international conference on computational linguistics (COLING) (pp. 1638–1649). New Mexico: Paparazzi Press. Retrieved from https://aclanthology.org/C18-1139
- [2] Altammami, S., Atwell, E., & Alsalka, A. (2020). The Arabic-English parallel corpus of authentic hadith. International Journal on Islamic Applications in Computer Science And Technology, 8(2). Retrieved from http://www.sign-ific-ance.co.uk/index.php/IJASAT/article/view/2199
- [3] Arcan, M., Thomas, S. M., de Brandt, D., & Buitelaar, P. (2013). Translating the FINREP taxonomy using a domain-specific corpus. In Proceedings of Machine Translation Summit XIV. Nice, France. Retrieved from https://aclanthology.org/2013.mtsummit-posters.1.pdf
- [4] Beikian, A., & Borzoufard, M. (2016). Mizan: A large Persian-English parallel corpus. Retrieved from https://cdn.ketabchi.com/products/175402/pdfs/ketab-general-book-sample-wybml.pdf
- [5] Bick, E., & Barreiro, A. (2015). Automatic anonymisation of a new Portuguese-English parallel corpus in the legal-financial domain. Oslo Studies in Language, 7(1), 101–124. Retrieved from https://journals.uio.no/index.php/osla/article/view/1460/1357. DOI: 10.5617/osla.1460
- [6] Boldrini, E., & Ferrández, S. (2009, March 1–7). A parallel corpus labeled using open and restricted domain ontologies. In Proceedings of the 10th international conference CICLing. Mexico City, Mexico. DOI: 10.1007/978-3-642-00382-0_28
- [7] Bureros, L. L., Tabaranza, Z. L. B., & Roxas, R. R. (2015). Building an English-Cebuano tourism parallel corpus and a named-entity list from the Web. In Proceedings of workshop on computation: Theory and practice (pp. 158–169). DOI: 10.1142/9789813202818_0012
- [8] Chang, B. (2004). Chinese-English parallel corpus construction and its application. In Proceedings of the PACLIC (pp. 201–204). Tokyo: Waseda University, Dec. 8–10. Retrieved from https://aclanthology.org/Y04-1030.pdf
- [9] Chiu, J. P. C., & Nichols, E. (2016). Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4, 357–370. DOI: 10.1162/tacl_a_00104
- [10] Christodoulopoulos, C., & Steedman, M. (2014). The Bible in 100 Languages. Retrieved from https://github.com/christos-c/bible-corpus
- [11] Dipper, S., & Schultz-Balluff, S. (2013). The Anselm Corpus: Methods and perspectives of a parallel aligned corpus. In Proceedings of the workshop on computational historical linguistics at NODALIDA. NEALT (pp. 27–42). Retrieved from https://ep.liu.se/ecp/087/ecp13087.pdf#page=35
- [12] Dua, D., & Graff, C. (2017). UCI machine learning repository. Retrieved from http://archive.ics.uci.edu/ml
- [13] Esplà-Gomis, M., Klubička, F., Ljubešić, N., Ortiz-Rojas, S., Papavassiliou, V., & Prokopidis, P. (2014). Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites. In Proceedings of the ninth international conference on language resources and evaluation (pp. 1252–1258). European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2014/pdf/529Paper.pdf
- [14] Fraisse, A., Tran, Q.-T., Jenn, R., Paroubek, P., & Fishkin, S. (2018, May). TransLiTex: A parallel corpus of translated literary texts. In Proceedings of the eleventh international conference on language resources and evaluation (pp. 201–204). Miyazaki, Japan: European Language Resources Association (ELRA). Retrieved from https://hal.archives-ouvertes.fr/hal-01827884/file/11W34.pdf
- [15] Frankenberg-Garcia, A. (2009). Compiling and using a parallel corpus for research in translation. Babel: International Journal of Translation, 21(1), 57–71. Retrieved from https://openresearch.surrey.ac.uk/esploro/outputs/journalArticle/Compiling-and-using-a-parallel-corpus-for-research-in-translation/99516816302346#file-0
- [16] Ghaddar, A., & Langlais, P. (2020). SEDAR: A large scale French-English financial domain parallel corpus. In Proceedings of the language resources and evaluation conference (pp. 3595–3602). Marseille, France: European Language Resources Association. Retrieved from https://aclanthology.org/2020.lrec-1.442
- [17] Giouli, V., Glaros, N., Simov, K., & Osenova, P. (2009). A web-enabled and speech-enhanced parallel corpus of Greek-Bulgarian cultural texts. In Proceedings of the EACL workshop on language technology and resources for cultural heritage, social sciences, humanities, and education (pp. 35–42). Athens, Greece: Association for Computational Linguistics. Retrieved from https://aclanthology.org/W09-0305.pdf. DOI: 10.3115/1642049.1642054
- [18] Guo, X. (2016, November 17–18). Drawing a route map of making a small domain-specific parallel corpus for translators and beyond. In Proceedings of Translating and the Computer (pp. 88–99). London, UK. Retrieved from https://aclanthology.org/2016.tc-1.9.pdf
- [19] Guzman, J. R. (2013). El corpus COVALT i l’eina d’alineament de frases Alfra-COVALT. In L. Bracho Lapiedra (Ed.), El corpus COVALT: un observatori de fraseologia traduïda (pp. 49–60). Aachen: Shaker.
- [20] Hamoud, B., & Atwell, E. (2017). Evaluation corpus for restricted-domain question-answering systems for the Holy Quran. International Journal of Science and Research, 6(8), 1133–1138. Retrieved from https://eprints.whiterose.ac.uk/125920/
- [21] Kashefi, O. (2020). MIZAN: A large Persian-English parallel corpus. Retrieved from https://arxiv.org/pdf/1801.02107v3.pdf
- [22] Kenny, D. (1999). The German-English parallel corpus of literary texts (GEPCOLT): A resource for translation scholars. Teanga, 1, 25–42.
- [23] Koehn, P. (2005). Europarl. Retrieved from http://www.statmt.org/europarl/
- [24] Kolchinsky, A., Lourenco, A., Wu, H.-Y., & Rocha, L. M. (2015). Extraction of pharmacokinetic evidence of drug-drug interactions from the literature. PLOS ONE. DOI: 10.1371/journal.pone.0122199
- [25] Labaka, G., Alegria, I., & Sarasola, K. (2016). Domain adaptation in MT using Wikipedia as a parallel corpus: Resources and evaluation. In Proceedings of the tenth international conference on language resources and evaluation (pp. 2209–2213). Portorož, Slovenia: European Language Resources Association (ELRA).
- [26] Lan, H., & Huang, J. (2017, February). Chinese-English cross-language text clustering algorithm based on latent semantic analysis. In Proceedings of information science and cloud computing (pp. 1–7). Retrieved from https://pos.sissa.it/300/007/pdf
- [27] Lee, C.-H., & Yang, H.-C. (2000). Towards multilingual information discovery through a SOM based text mining approach. In PRICAI workshop on text and web mining (pp. 80–87). Melbourne, Australia. Retrieved from https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.8800&rep=rep1&type=pdf
- [28] Lee, D.-Y. (2011). A corpus-based translation of Korean financial reports into English. Journal of Universal Language, 12(1), 75–94. Retrieved from https://www.sejongjul.org/download/downloadpdf?pid=jul-12-1-75. DOI: 10.22425/jul.2011.12.1.75
- [29] Lefever, E., Macken, L., & Hoste, V. (2009, 30 March – 3 April). Language-independent bilingual terminology extraction from a multilingual parallel corpus. In Proceedings of the 12th conference of the European Chapter of the ACL (pp. 1746–1751). Athens, Greece. Retrieved from https://aclanthology.org/E09-1057.pdf. DOI: 10.3115/1609067.1609122
- [30] Li, L., Wang, P., Huang, D., & Zhao, L. (2011). Mining English-Chinese named entity pairs from comparable corpora. ACM Transactions on Asian Language Information Processing, 10, 1–19. DOI: 10.1145/2025384.2025387
- [31] Lu, B., Tsou, B. K., Jiang, T., Kwong, O. Y., & Zhu, J. (2010). Mining large-scale parallel corpora from multilingual patents: An English-Chinese example and its application to SMT. In Proceedings of the 1st CIPS-SIGHAN joint conference on Chinese language processing (pp. 79–86). Beijing. Retrieved from https://aclanthology.org/W10-4110.pdf
- [32] McEnery, T., & Xiao, Z. (2007). Parallel and comparable corpora – the state of play. In Y. Kawaguchi, T. Takagaki, N. Tomimori & Y. Tsuruga (Eds.), Corpus-based perspectives in linguistics (pp. 131–146). Amsterdam: John Benjamins. DOI: 10.1075/ubli.6.11mce
- [33] Miletic, A., Stosic, D., & Marjanović, D. (2017). ParCoLab: A parallel corpus for Serbian, French and English. In K. Ekštein & V. Matoušek (Eds.), Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, 10415, 201–204. Berlin: Springer-Verlag. DOI: 10.1007/978-3-319-64206-2
- [34] Neves, M., Yepes, A. J., & Névéol, A. (2016). The Scielo Corpus: A parallel corpus of scientific publications for biomedicine. In Proceedings of the tenth international conference on language resources and evaluation. European Language Resources Association. Retrieved from https://aclanthology.org/L16-1470
- [35] Ponay, C. S., & Cheng, C. K. (2015). Building an English-Filipino tourism corpus and lexicon for an ASEAN language translation system. In Proceedings of the international conference ASIALEX (pp. 201–204). Hong Kong: Polytechnic University. Retrieved from https://www.researchgate.net/profile/Charmaine-Ponay-2/publication/27994689223BuildinganEnglish-FilipinoTourismCorpusandLexiconforanASEANLanguageTranslationSystem/links/559f2fee08ae97223ddc602f/23-Building-an-English-Filipino-Tourism-Corpus-and-Lexicon-for-an-ASEAN-Language-Translation-System.pdf
- [36] Rosemeyer, M., & Enrique-Arias, A. (2016). A match made in heaven: Using parallel corpora and multinomial logistic regression to analyze the expression of possession in Old Spanish. Language Variation and Change, 28(3), 307–334. DOI: 10.1017/S0954394516000120
- [37] Rovenchak, A. (2021). Bamana tales recorded by Umaru Nanankr Jara: A comparative study based on a Bamana-French parallel corpus. Mandenkan, 64, 81–104. DOI: 10.4000/mandenkan.2471
- [38] Schwenk, H., Chaudhary, V., Sun, S., Gong, H., & Guzmán, F. (2021, April). WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. In Proceedings of the 16th conference of the European Chapter of the Association for Computational Linguistics: Main volume (pp. 1351–1361). Online: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/2021.eacl-main.115. DOI: 10.18653/v1/2021.eacl-main.115
- [39] Smirnova, O., & Rackevičienė, S. (2020). English-French-Lithuanian parallel corpus of EU financial documents. Retrieved from http://hdl.handle.net/20.500.11821/35
- [40] Srivastava, J., & Sanyal, S. (2015). POS-based word alignment for small corpus. In Proceedings of the international conference on Asian language processing (pp. 37–40). DOI: 10.1109/IALP.2015.7451526
- [41] Steinberger, R., Pouliquen, B., Widiger, A., Ignat, C., Erjavec, T., Tufis, D., & Varga, D. (2006, 24–26 May). The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In Proceedings of the 5th international conference on language resources and evaluation (pp. 2142–2147). Genoa, Italy. Retrieved from https://arxiv.org/abs/cs/0609058
- [42] Sturgeon, D. (Ed.). (2021). Ancient Chinese Books Datasets (Chinese Text Project). Retrieved from https://ctext.org/daoism
- [43] Tian, L., Wong, D. F., Chao, L. S., Quaresma, P., Oliveira, F., & Yi, L. (2014). UM-Corpus: A large English-Chinese parallel corpus for statistical machine translation. In Proceedings of the ninth international conference on language resources and evaluation (LREC 2014). Reykjavik, Iceland: European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2014/pdf/774Paper.pdf
- [44] Tiedemann, J. (2012, May). Parallel data, tools and interfaces in OPUS. In N. Calzolari et al. (Eds.), Proceedings of the eighth international conference on language resources and evaluation (pp. 2214–2218). Istanbul, Turkey: European Language Resources Association (ELRA). Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.673.2874&rep=rep1&type=pdf
- [45] Turenne, N. (2018, January). The rumour spectrum. PLOS ONE, 13(1), 1–27. DOI: 10.1371/journal.pone.0189080
- [46] Turenne, N., Xu, B., Li, X., Xu, X., Liu, H., & Zhu, X. (2020). Exploration of a balanced reference corpus with a wide variety of text mining tools. In Proceedings of ACAI 2020: 2020 3rd international conference on algorithms, computing and artificial intelligence (pp. 1–9). New Mexico, USA: ACM Digital Library. DOI: 10.1145/3446132.3446192
- [47] Volk, M., Amrhein, C., Aepli, N., Müller, M., & Ströbel, P. (2016). Building a parallel corpus on the world’s oldest banking magazine. In Proceedings of the 13th conference on natural language processing (KONVENS) (pp. 288–296). DOI: 10.5167/uzh-125746
- [48] Woldeyohannis, M. M., Besacier, L., & Meshesha, M. (2018). A corpus for Amharic-English speech translation: The case of tourism domain. In F. Mekuria, E. Nigussie, W. Dargie, M. Edward & T. Tegegne (Eds.), Proceedings of information and communication technology for development for Africa. ICT4DA 2017. Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (Vol. 244). DOI: 10.1007/978-3-319-95153-9
- [49] Wu, E., & Xia, X. (1994). Learning an English-Chinese lexicon from a parallel corpus. In Proceedings of the first conference of the association for machine translation in the Americas (pp. 206–213). Retrieved from https://aclanthology.org/1994.amta-1.26.pdf
- [50] Xiong, W. (2013). The development of the Malaysian Hansard corpus: A corpus of parliamentary debates 1959–2020. New Technology of Library and Information Service, (6), 36–41. DOI: 10.11925/infotech.1003-3513.2013.06.06
- [51] Yang, C. C., & Li, K. W. (2003). Automatic construction of English/Chinese parallel corpora. Journal of the American Society for Information Science and Technology, 54, 730–742. Retrieved from https://aclanthology.org/A00-1004.pdf. DOI: 10.1002/asi.10261
- [52] Zhai, Y., Liu, L., Zhong, X., Illouz, G., & Vilnat, A. (2020, May). Building an English-Chinese parallel corpus annotated with sub-sentential translation techniques. In Proceedings of the 12th language resources and evaluation conference (pp. 4024–4033). Marseille, France: European Language Resources Association. Retrieved from https://www.aclweb.org/anthology/2020.lrec-1.496
- [53] Zhao, B., & Vogel, S. (2002). Adaptive parallel sentences mining from web bilingual news collection. In Proceedings of the IEEE international conference on data mining (pp. 745–748). Beijing. DOI: 10.1109/ICDM.2002.1184044
