
PicAxe: Extracting Figures from Structurally and Syntactically Heterogeneous Corpora of PDF Files
Anna C. Guerrero, Krishna Kamath, Qilin Zhou, Bruno Felalaga, Julia Damerow, Aaron R. Dinner

The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents
Julia Damerow, B. R. Erick Peirson, Manfred D. Laubichler