Mining an English-Chinese parallel Dataset of Financial News

Open Access | Mar 2022

References

  1. Akbik, A., Blythe, D., & Vollgraf, R. (2018). Contextual string embeddings for sequence labeling. In Proceedings of the 27th international conference on computational linguistics (COLING) (pp. 1638–1649). New Mexico: Paparazzi Press. Retrieved from https://aclanthology.org/C18-1139
  2. Altammami, S., Atwell, E., & Alsalka, A. (2020). The Arabic-English parallel corpus of authentic hadith. International Journal on Islamic Applications in Computer Science And Technology, 8(2). Retrieved from http://www.sign-ific-ance.co.uk/index.php/IJASAT/article/view/2199
  3. Arcan, M., Thomas, S. M., de Brandt, D., & Buitelaar, P. (2013). Translating the FINREP taxonomy using a domain-specific corpus. In Proceedings of machine translation summit XIV. Nice, France. Retrieved from https://aclanthology.org/2013.mtsummit-posters.1.pdf
  4. Beikian, A., & Borzoufard, M. (2016). Mizan: A large Persian-English parallel corpus. Retrieved from https://cdn.ketabchi.com/products/175402/pdfs/ketab-general-book-sample-wybml.pdf
  5. Bick, E., & Barreiro, A. (2015). Automatic anonymisation of a new Portuguese-English parallel corpus in the legal-financial domain. Oslo Studies in Language, 7(1), 101–124. Retrieved from https://journals.uio.no/index.php/osla/article/view/1460/1357. DOI: 10.5617/osla.1460
  6. Boldrini, E., & Ferrández, S. (2009, March 17). A parallel corpus labeled using open and restricted domain ontologies. In Proceedings of 10th international conference CICLing. Mexico City, Mexico. DOI: 10.1007/978-3-642-00382-0_28
  7. Bureros, L. L., Tabaranza, Z. L. B., & Roxas, R. R. (2015). Building an English-Cebuano tourism parallel corpus and a named-entity list from the Web. In Proceedings of workshop on computation: Theory and practice (pp. 158–169). DOI: 10.1142/9789813202818_0012
  8. Chang, B. (2004). Chinese-English parallel corpus construction and its application. In Proceedings of the PACLIC (pp. 201–204). Tokyo: Waseda University, Dec. 8–10. Retrieved from https://aclanthology.org/Y04-1030.pdf
  9. Chiu, J. P. C., & Nichols, E. (2016). Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics (pp. 357–370). DOI: 10.1162/tacl_a_00104
  10. Christodoulopoulos, C., & Steedman, M. (2014). The Bible in 100 Languages. Retrieved from https://github.com/christos-c/bible-corpus
  11. Dipper, S., & Schultz-Balluff, S. (2013). The Anselm Corpus: Methods and perspectives of a parallel aligned corpus. In Proceedings of the workshop on computational historical linguistics at NODALIDA. NEALT (pp. 27–42). Retrieved from https://ep.liu.se/ecp/087/ecp13087.pdf#page=35
  12. Dua, D., & Graff, C. (2017). UCI machine learning repository. Retrieved from http://archive.ics.uci.edu/ml
  13. Espla-Gomis, M., Klubička, F., Ljubešić, N., Ortiz-Rojas, S., Papavassiliou, V., & Prokopidis, P. (2014). Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites. In Proceedings of the ninth international conference on language resources and evaluation (pp. 1252–1258). European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2014/pdf/529Paper.pdf
  14. Fraisse, A., Tran, Q.-T., Jenn, R., Paroubek, P., & Fishkin, S. (2018, May). TransLiTex: A parallel corpus of translated literary texts. In Proceedings of the eleventh international conference on language resources and evaluation (pp. 201–204). Miyazaki, Japan: European Language Resources Association (ELRA). Retrieved from https://hal.archives-ouvertes.fr/hal-01827884/file/11W34.pdf
  15. Frankenberg-Garcia, A. (2009). Compiling and using a parallel corpus for research in translation. Babel: International Journal of Translation, 21(1), 57–71. Retrieved from https://openresearch.surrey.ac.uk/esploro/outputs/journalArticle/Compiling-and-using-a-parallel-corpus-for-research-in-translation/99516816302346#file-0
  16. Ghaddar, A., & Langlais, P. (2020). Sedar: A large scale French-English financial domain parallel corpus. In Proceedings of the language resources and evaluation conference (pp. 3595–3602). Marseille, France: European Language Resources Association. Retrieved from https://aclanthology.org/2020.lrec-1.442
  17. Giouli, V., Glaros, N., Simov, K., & Osenova, P. (2009). A web-enabled and speech-enhanced parallel corpus of Greek-Bulgarian cultural texts. In Proceedings of the EACL workshop on language technology and resources for cultural heritage, social sciences, humanities, and education (pp. 35–42). Athens, Greece: Association for Computational Linguistics. Retrieved from https://aclanthology.org/W09-0305.pdf. DOI: 10.3115/1642049.1642054
  18. Guo, X. (2016, November 17–18). Drawing a route map of making a small domain-specific parallel corpus for translators and beyond. In Proceedings of translating and the computer (pp. 88–99). London, UK. Retrieved from https://aclanthology.org/2016.tc-1.9.pdf
  19. Guzman, J. R. (2013). El corpus COVALT i l’eina d’alineament de frases Alfra-COVALT. In L. Bracho Lapiedra (Ed.), El corpus COVALT: un observatori de fraseologia traduïda (pp. 49–60). Aachen: Shaker.
  20. Hamoud, B., & Atwell, E. (2017). Evaluation corpus for restricted-domain question-answering systems for the holy Quran. International Journal of Science and Research, 6(8), 1133–1138. Retrieved from https://eprints.whiterose.ac.uk/125920/
  21. Kashefi, O. (2020). MIZAN: A large Persian-English parallel corpus. Retrieved from https://arxiv.org/pdf/1801.02107v3.pdf
  22. Kenny, D. (1999). The German-English parallel corpus of literary texts (GEPCOLT): A resource for translation scholars. Teanga, 1, 25–42.
  23. Koehn, P. (2005). Europarl. Retrieved from http://www.statmt.org/europarl/
  24. Kolchinsky, A., Lourenco, A., Wu, H.-Y., & Rocha, L. M. (2015). Extraction of pharmacokinetic evidence of drug-drug interactions from the literature. PLOS ONE. DOI: 10.1371/journal.pone.0122199
  25. Labaka, G., Alegria, I., & Sarasola, K. (2016). Domain adaptation in MT using Wikipedia as a parallel corpus: Resources and evaluation. In Proceedings of the tenth international conference on language resources and evaluation (pp. 2209–2213). Portoroz, Slovenia: European Language Resources Association (ELRA).
  26. Lan, H., & Huang, J. (2017, February). Chinese-English cross-language text clustering algorithm based on latent semantic analysis. In Proceedings of information science and cloud computing (pp. 1–7). Retrieved from https://pos.sissa.it/300/007/pdf
  27. Lee, C.-H., & Yang, H.-C. (2000). Towards multilingual information discovery through a SOM based text mining approach. In PRICAI workshop on text and web mining (pp. 80–87). Melbourne, Australia. Retrieved from https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.8800&rep=rep1&type=pdf
  28. Lee, D.-Y. (2011). A corpus-based translation of Korean financial reports into English. Journal of Universal Language, 12(1), 75–94. Retrieved from https://www.sejongjul.org/download/downloadpdf?pid=jul-12-1-75. DOI: 10.22425/jul.2011.12.1.75
  29. Lefever, E., Macken, L., & Hoste, V. (2009, 30 March – 3 April). Language-independent bilingual terminology extraction from a multilingual parallel corpus. In Proceedings of the 12th conference of the European Chapter of the ACL (pp. 1746–1751). Athens, Greece. Retrieved from https://aclanthology.org/E09-1057.pdf. DOI: 10.3115/1609067.1609122
  30. Li, L., Wang, P., Huang, D., & Zhao, L. (2011). Mining English-Chinese named entity pairs from comparable corpora. ACM Transactions on Asian Language Information Processing, 10, 1–19. DOI: 10.1145/2025384.2025387
  31. Lu, B., Tsou, B. K., Jiang, T., Kwong, O. Y., & Zhu, J. (2010). Mining large-scale parallel corpora from multilingual patents: An English-Chinese example and its application to SMT. In Proceedings of the 1st CIPS-SIGHAN joint conference on Chinese language processing (pp. 79–86). Beijing. Retrieved from https://aclanthology.org/W10-4110.pdf
  32. McEnery, T., & Xiao, Z. (2007). Parallel and comparable corpora – the state of play. In N. T. Y. Kawaguchi, T. Takagaki & Y. Tsuruga (Eds.), Proceedings of the international conference on Asian language processing (pp. 131–146). Amsterdam: Benjamin. DOI: 10.1075/ubli.6.11mce
  33. Miletic, A., Stosic, D., & Marjanović, D. (2017). ParCoLab: A Parallel Corpus for Serbian, French and English. In K. Ekštein & V. Matoušek (Eds.), Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, 10415, 201–204. Berlin: Springer-Verlag. DOI: 10.1007/978-3-319-64206-2
  34. Neves, M., Yepes, A. J., & Névéol, A. (2016). The Scielo Corpus: A parallel corpus of scientific publications for biomedicine. In Proceedings of the 15th international conference on language resources and evaluation. European Language Resources Association. Retrieved from https://aclanthology.org/L16-1470
  35. Ponay, C. S., & Cheng, C. K. (2015). Building an English-Filipino tourism corpus and lexicon for an ASEAN language translation system. In Proceedings of the international conference ASIALEX (pp. 201–204). Hong Kong: Polytechnic University. Retrieved from https://www.researchgate.net/profile/Charmaine-Ponay-2/publication/27994689223BuildinganEnglish-FilipinoTourismCorpusandLexiconforanASEANLanguageTranslationSystem/links/559f2fee08ae97223ddc602f/23-Building-an-English-Filipino-Tourism-Corpus-and-Lexicon-for-an-ASEAN-Language-Translation-System.pdf
  36. Rosemeyer, M., & Enrique-Arias, A. (2016). A match made in heaven: Using parallel corpora and multinomial logistic regression to analyze the expression of possession in Old Spanish. Language Variation and Change, 28(03), 307–334. DOI: 10.1017/S0954394516000120
  37. Rovenchak, A. (2021). Bamana tales recorded by Umaru Nanankr Jara: A comparative study based on a Bamana-French parallel corpus. Mandenkan, 64, 81–104. DOI: 10.4000/mandenkan.2471
  38. Schwenk, H., Chaudhary, V., Sun, S., Gong, H., & Guzmán, F. (2021, April). WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. In Proceedings of the 16th conference of the European Chapter of the Association for Computational Linguistics: Main volume (pp. 1351–1361). Online: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/2021.eacl-main.115. DOI: 10.18653/v1/2021.eacl-main.115
  39. Smirnova, O., & Rackevičienė, S. (2020). English-French-Lithuanian parallel corpus of EU financial documents. Retrieved from http://hdl.handle.net/20.500.11821/35
  40. Srivastava, J., & Sanyal, S. (2015). POS-based word alignment for small corpus. In Proceedings of international conference on Asian language processing (pp. 37–40). DOI: 10.1109/IALP.2015.7451526
  41. Steinberger, R., Pouliquen, B., Widiger, A., Ignat, C., Erjavec, T., Tufis, D., & Varga, D. (2006, 24–26 May). The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In Proceedings of the 5th international conference on language resources and evaluation (pp. 2142–2147). Genoa, Italy. Retrieved from https://arxiv.org/abs/cs/0609058
  42. Sturgeon, D. (Ed.). (2021). Ancient Chinese Books Datasets (Chinese Text Project). Retrieved from https://ctext.org/daoism
  43. Tian, L., Wong, D. F., Chao, L. S., Quaresma, P., Oliveira, F., & Yi, L. (2014). UM-Corpus: A Large English-Chinese parallel corpus for statistical machine translation. In LREC. Reykjavik, Iceland: European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2014/pdf/774Paper.pdf
  44. Tiedemann, J. (2012, May). Parallel data, tools and interfaces in OPUS. In N. Calzolari et al. (Eds.), Proceedings of the eighth international conference on language resources and evaluation (pp. 2214–2218). Istanbul, Turkey: European Language Resources Association (ELRA). Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.673.2874&rep=rep1&type=pdf
  45. Turenne, N. (2018, January). The rumour spectrum. PLOS ONE, 13(1), 1–27. DOI: 10.1371/journal.pone.0189080
  46. Turenne, N., Xu, B., Li, X., Xu, X., Liu, H., & Zhu, X. (2020). Exploration of a balanced reference corpus with a wide variety of text mining tools. In Proceedings of ACAI 2020: 2020 3rd international conference on algorithms, computing and artificial intelligence (pp. 1–9). New Mexico, USA: ACM Digital Library. DOI: 10.1145/3446132.3446192
  47. Volk, M., Amrhein, C., Aepli, N., Müller, M., & Ströbel, P. (2016). Building a parallel corpus on the world’s oldest banking magazine. In Proceedings of the 13th conference on natural language processing (KONVENS) (pp. 288–296). DOI: 10.5167/uzh-125746
  48. Woldeyohannis, M. M., Besacier, L., & Meshesha, M. (2018). A corpus for Amharic-English speech translation: The case of tourism domain. In F. Mekuria, E. Nigussie, W. Dargie, M. Edward & T. Tegegne (Eds.), Proceedings of information and communication technology for development for Africa. ICT4DA 2017. Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (Vol. 244). DOI: 10.1007/978-3-319-95153-9
  49. Wu, E., & Xia, X. (1994). Learning an English-Chinese lexicon from a parallel corpus. In Proceedings of the first conference of the association for machine translation in the Americas (pp. 206–213). Retrieved from https://aclanthology.org/1994.amta-1.26.pdf
  50. Xiong, W. (2013). The development of the Malaysian Hansard corpus: A corpus of parliamentary debates 1959–2020. New Technology of Library and Information Service, (6), 36–41. DOI: 10.11925/infotech.1003-3513.2013.06.06
  51. Yang, C. C., & Li, K. W. (2003). Automatic construction of English/Chinese parallel corpora. J. Am. Soc. Inf. Sci. Technol., 54, 730–742. Retrieved from https://aclanthology.org/A00-1004.pdf. DOI: 10.1002/asi.10261
  52. Zhai, Y., Liu, L., Zhong, X., Illouz, G., & Vilnat, A. (2020, May). Building an English-Chinese parallel corpus annotated with sub-sentential translation techniques. In Proceedings of the 12th language resources and evaluation conference (pp. 4024–4033). Marseille, France: European Language Resources Association. Retrieved from https://www.aclweb.org/anthology/2020.lrec-1.496
  53. Zhao, B., & Vogel, S. (2002). Adaptive parallel sentences mining from web bilingual news collection. In Proceedings of the IEEE international conference on data mining (pp. 745–748). Beijing. DOI: 10.1109/ICDM.2002.1184044
DOI: https://doi.org/10.5334/johd.62 | Journal eISSN: 2059-481X
Language: English
Published on: Mar 18, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Nicolas Turenne, Ziwei Chen, Guitao Fan, Jianlong Li, Yiwen Li, Siyuan Wang, Jiaqi Zhou, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.