The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents

Open Access | Sep 2017

References

  1. ABBYY. ABBYY FineReader [computer software]. 2017. https://www.abbyy.com/en-us/finereader/
  2. Adobe Systems. Adobe Acrobat Reader [computer software]. 2017. https://get.adobe.com/reader/
  3. Apache Software Foundation. Apache Kafka [computer software]. 2017. https://kafka.apache.org/
  4. Apache Software Foundation. Apache Zookeeper [computer software]. 2017. https://zookeeper.apache.org/
  5. Casties, R and Raspe, M. digilib [computer software]. 2017. http://digilib.sourceforge.net/
  6. Glyph & Cog, LLC. pdf to text; distributed as part of Xpdf [computer software]. 2014. http://www.foolabs.com/xpdf/home.html
  7. Hockey, S M. Chapter 2, Creating and Acquiring Electronic Texts. In: Electronic Texts in the Humanities. Oxford University Press; 2000. pp. 11–23. DOI: 10.1093/acprof:oso/9780198711940.003.0002
  8. Huculak, J M and Justice, B. Report for the University of Victoria Libraries on Fedora Commons-Based DAMS: Building Collaborative Scholarship Environments, A Test Case. Available from: http://hdl.handle.net/1828/7212 [Accessed 16th December 2016]
  9. Oracle Corporation. MySQL [computer software]. 2017. https://www.mysql.com/
  10. Peirson, E B. Tutorial: Text Extraction and OCR with Tesseract and ImageMagick. 2015. Available from: https://diging.atlassian.net/wiki/display/DCH/Tutorial%3A+Text+Extraction+and+OCR+with+Tesseract+and+ImageMagick [Accessed 15th December 2016]
  11. Princeton University Library. Plum: A Hydra head to support digitization workflows. Available from: https://github.com/pulibrary/plum [Accessed 16th December 2016]
  12. Schmidt, B. Tutorial: Command-line OCR on a Mac. Available from: http://benschmidt.org/dighist13/?page_id=129 [Accessed 15th December 2016]
  13. Smith, R. Tesseract OCR [computer software]. 2017. https://github.com/tesseract-ocr/tesseract
DOI: https://doi.org/10.5334/jors.164 | Journal eISSN: 2049-9647
Language: English
Submitted on: Feb 13, 2017
Accepted on: Sep 15, 2017
Published on: Sep 28, 2017
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2017 Julia Damerow, B. R. Erick Peirson, Manfred D. Laubichler, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.