A Lexicon-Corpus-based Unsupervised Chinese Word Segmentation Approach

Lu Pengyu; Pu Jingchuan; Du Mingming; Lou Xiaojuan; Jin Lijun

doi:10.21307/ijssis-2017-655

International Journal on Smart Sensing and Intelligent Systems

Volume 7 (2014): Issue 1 (January 2014)

Open Access


DOI: https://doi.org/10.21307/ijssis-2017-655 | Journal eISSN: 1178-5608

Language: English

Page range: 263 - 282

Published on: Mar 1, 2014

Published by: Macquarie University, Australia

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: Chinese word segmentation, lexicon-based, corpus-based, word frequency, natural language processing

Related subjects: Engineering; Introductions and overviews; Engineering, other

© 2014 Lu Pengyu, Pu Jingchuan, Du Mingming, Lou Xiaojuan, Jin Lijun, published by Macquarie University, Australia
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
