Have a personal or library account? Click to login
Mining an English-Chinese parallel Dataset of Financial News Cover

Mining an English-Chinese parallel Dataset of Financial News

Open Access
|Mar 2022

Abstract

Parallel text datasets are a valuable for educational purposes, machine translation, and cross-language information retrieval, but few are domain-oriented. We have created a Chinese–English parallel dataset in the domain of finance technology, using the Financial Times website, from which we grabbed 60,473 news items from between 2007 and 2021. This dataset is a bilingual Chinese–English parallel dataset of news in the domain of finance. It is open access in its original state without transformation, and has been made not for machine translation as has been used, but for intelligent mining, in which we conducted many experiments using up-to-date text mining techniques: clustering (topic modeling, community detection, k-means), topic prediction (naive Bayes, SVM, LSTM, Bert), and pattern discovery (dictionary based, time series). We present the usage of these techniques as a framework for other studies, not only as an application but with an interpretation.

DOI: https://doi.org/10.5334/johd.62 | Journal eISSN: 2059-481X
Language: English
Published on: Mar 18, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Nicolas Turenne, Ziwei Chen, Guitao Fan, Jianlong Li, Yiwen Li, Siyuan Wang, Jiaqi Zhou, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.