Google Books Ngrams Recompressed and Searchable
Open Access
Dec 2012

References

  1. [1] Brants T., Popat A. C., Xu P., Och F. J., Dean J., Large language models in machine translation, in: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, ACL 2007, 858-867.
  2. [2] Gao J., Nguyen P., Li X., Thrasher C., Li M., Wang K., A Comparative Study of Bing Web N-gram Language Models for Web Search and Natural Language Processing, in: Workshop of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva 2010.
  3. [3] Grabowski Sz., Swacha J., Compact Representation of URL Collections with Fast Access, Automatyka, 15, 3, 2011, 349-355.
  4. [4] Guthrie D., Hepple M., Liu W., Efficient Minimal Perfect Hash Language Models, in: N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, D. Tapias (eds.), Proceedings of the Seventh International Conference on Language Resources and Evaluation, Valletta, ELRA 2010.
  5. [5] Michel J.-B., Kui Y., Presser A., Veres A., Gray M. K., Google Books Team, Pickett J. P., Hoiberg D., Clancy D., Norvig P., Orwant J., Pinker S., Nowak M. A., Lieberman Aiden E., Quantitative Analysis of Culture Using Millions of Digitized Books, Science, 331, 6014, 2011, 176-182. DOI: 10.1126/science.1199644, PMCID: PMC3279742, PMID: 21163965
  6. [6] Microsoft Research, Spelling Alteration for Web Search Workshop, City Center - Bellevue, WA, July 19, 2011. Materials available at http://webngram.research.microsoft.com/Spellerchallenge/Docs/Spelling_Alteration_Workshop.pdf (last checked: June 2012).
  7. [7] Pauls A., Klein D., Faster and Smaller N-Gram Language Models, in: Y. Matsumoto, R. Mihalcea (eds.), Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Stroudsburg, ACL 2011, 258-267.
  8. [8] Procházka V., Pollák P., Analysis of Czech Web 1T 5-Gram Corpus and Its Comparison with Czech National Corpus Data, in: P. Sojka, A. Horák, I. Kopecek, K. Pala (eds.), Proceedings of the 13th International Conference Text, Speech and Dialogue, Brno, Springer 2010, 181-188. DOI: 10.1007/978-3-642-15760-8_24
  9. [9] Skibiński P., Grabowski Sz., Swacha J., Effective asymmetric XML compression, Software-Practice and Experience, 38, 10, 2008, 1027-1047. DOI: 10.1002/spe.859
  10. [10] Talbot D., Brants T., Randomized Language Models via Perfect Hash Functions, in: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, ACL 2008, 505-513.
  11. [11] Witten I. H., Moffat A., Bell T. C., Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann Publishers, Los Altos, 1999.
  12. [12] Ziv J., Lempel A., A Universal Algorithm for Sequential Data Compression, IEEE Transactions on Information Theory, 23, 3, 1977, 337-343. DOI: 10.1109/TIT.1977.1055714
  13. [13] http://books.google.com/ngrams (last checked: June 2012).
  14. [14] http://books.google.com/ngrams/datasets (last checked: June 2012).
  15. [15] http://books.google.com/ngrams/info (last checked: June 2012).
  16. [16] http://iiwz.wneiz.pl/jakubs/progs/ngram_compressor.zip (last checked: June 2012).
  17. [17] http://research.microsoft.com/en-us/collaboration/focus/cs/web-ngram.aspx (last checked: June 2012).
  18. [18] http://www.base2ti.com (last checked: June 2012).
  19. [19] http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13 (last checked: June 2012).
  20. [20] http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2011T07 (last checked: June 2012).
DOI: https://doi.org/10.2478/v10209-011-0015-8 | Journal eISSN: 2300-3405 | Journal ISSN: 0867-6356
Language: English
Page range: 271 - 281
Published on: Dec 22, 2012
Published by: Poznan University of Technology
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2012 Szymon Grabowski, Jakub Swacha, published by Poznan University of Technology
This work is licensed under the Creative Commons License.