Research on Dictionary-Based Word Segmentation Algorithms Using Trie Structure

By: Boxing Zhang, Xin Jing and Qinlong Kang
Open Access | Jun 2025

Abstract

This study investigates dictionary-based word segmentation algorithms, which are essential in Natural Language Processing (NLP). Chinese word segmentation poses significant challenges due to the lack of clear word delimiters in the language. This paper explores the advantages and limitations of dictionary-based segmentation algorithms, focusing on how data structures such as Trie and Double-Array Trie (DAT) can enhance segmentation efficiency. An analysis of Trie and DAT structures leads to an optimization achieving constant-time state transitions. This paper evaluates and compares various segmentation algorithms, including full segmentation, forward maximum matching, backward maximum matching, and bidirectional maximum matching. The inherent limitations of dictionary-based segmentation, particularly its dependence on dictionaries and poor disambiguation capability, are also discussed.
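
To make the matching strategies named in the abstract concrete, the following is a minimal Python sketch of forward and backward maximum matching over a plain dictionary-backed trie. It is an illustration under stated assumptions, not the authors' implementation: the names `TrieNode`, `Trie`, `longest_match_len`, `forward_maximum_matching`, `backward_maximum_matching`, and the toy dictionary `DEMO_WORDS` are all hypothetical, and the trie uses hash-map children rather than the Double-Array Trie layout the paper analyzes.

```python
class TrieNode:
    """One trie state; children are keyed by a single character."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    """Hypothetical dict-based trie (not a Double-Array Trie)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match_len(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node = self.root
        best = 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i - start + 1
        return best


def forward_maximum_matching(text, trie):
    """Scan left to right, always taking the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Fall back to a single character when no dictionary word matches.
        n = trie.longest_match_len(text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens


def backward_maximum_matching(text, words):
    """Scan right to left by running forward matching on reversed input.

    Assumption of this sketch: a second trie over reversed words is built.
    """
    reversed_trie = Trie(word[::-1] for word in words)
    reversed_tokens = forward_maximum_matching(text[::-1], reversed_trie)
    return [token[::-1] for token in reversed(reversed_tokens)]


# Toy dictionary chosen for illustration only.
DEMO_WORDS = ["研究", "研究生", "生命", "命", "起源"]
DEMO_TEXT = "研究生命起源"

forward_tokens = forward_maximum_matching(DEMO_TEXT, Trie(DEMO_WORDS))
backward_tokens = backward_maximum_matching(DEMO_TEXT, DEMO_WORDS)
print(forward_tokens)   # ['研究生', '命', '起源']
print(backward_tokens)  # ['研究', '生命', '起源']
```

On this toy input the two scan directions disagree, which is the kind of ambiguity that bidirectional maximum matching tries to arbitrate and that a purely dictionary-based method cannot fully resolve. Each step in `longest_match_len` is one hash lookup per character; a Double-Array Trie would replace that lookup with array indexing.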

Language: English
Page range: 50 - 61
Published on: Jun 13, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Boxing Zhang, Xin Jing, Qinlong Kang, published by Xi’an Technological University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.