
A Lexicon-Corpus-based Unsupervised Chinese Word Segmentation Approach

Open Access | Mar 2014

Language: English
Page range: 263 - 282
Published on: Mar 1, 2014
Published by: Professor Subhas Chandra Mukhopadhyay
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2014 Lu Pengyu, Pu Jingchuan, Du Mingming, Lou Xiaojuan, Jin Lijun, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.