Efficient n-gram, Skipgram and Flexgram Modelling with Colibri Core

Open Access | Aug 2016

Abstract

Counting n-grams lies at the core of any frequentist corpus analysis and is often considered a trivial matter. Going beyond consecutive n-grams to patterns such as skipgrams and flexgrams increases the demand for efficient solutions. The need to operate on big corpus data does so even more. Lossless compression and non-trivial algorithms are needed to lower the memory demands, yet retain good speed. Colibri Core is software for the efficient computation and querying of n-grams, skipgrams and flexgrams from corpus data. The resulting pattern models can be analysed and compared in various ways. The software offers a programming library for C++ and Python, as well as command-line tools.
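
To make the pattern types named in the abstract concrete, the following is a minimal sketch in plain Python of counting n-grams and fixed-gap skipgrams over a tokenised text. It does not use Colibri Core and is not how Colibri Core is implemented: it has none of the compression or memory optimisations the abstract refers to. The function names `count_ngrams` and `count_skipgrams` and the gap marker `{*}` are assumptions chosen for illustration. Flexgrams, whose gaps are of variable width, are not covered.

```python
from collections import Counter
from itertools import combinations


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def count_skipgrams(tokens, n, gap="{*}"):
    """Count length-n patterns in which one or more inner tokens are
    replaced by a gap marker. The first and last token are always kept,
    so each gap has a fixed, known width."""
    counts = Counter()
    inner = range(1, n - 1)  # positions that may be skipped
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        for k in range(1, n - 1):  # how many inner positions to skip
            for skipped in combinations(inner, k):
                pattern = tuple(
                    gap if j in skipped else tok
                    for j, tok in enumerate(window)
                )
                counts[pattern] += 1
    return counts


tokens = "to be or not to be".split()
bigrams = count_ngrams(tokens, 2)
skipgrams = count_skipgrams(tokens, 3)

print(bigrams[("to", "be")])            # 2
print(skipgrams[("to", "{*}", "or")])   # 1
```

Even in this toy form, the number of skipgram patterns per window grows quickly with n, which illustrates why the abstract stresses the need for efficient, memory-conscious solutions on large corpora.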
DOI: https://doi.org/10.5334/jors.105 | Journal eISSN: 2049-9647
Language: English
Submitted on: Nov 9, 2015
Accepted on: Jul 1, 2016
Published on: Aug 2, 2016
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2016 Maarten van Gompel, Antal van den Bosch, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.