Detecting Family Resemblance: Automated Genre Classification

By: Yunhyong Kim and Seamus Ross
Open Access
Mar 2007

Abstract

This paper presents results on the automated genre classification of digital documents in PDF format. It presents genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material to support research. The paper compares the roles of visual layout, stylistic features, and language model features in clustering documents. It also reports results for retrieving five selected genres (Scientific Article, Thesis, Periodical, Business Report, and Form) from a pool of documents drawn from the nineteen most common genres in our experimental data set.
DOI: https://doi.org/10.2481/dsj.6.S172 | Journal eISSN: 1683-1470
Language: English
Published on: Mar 28, 2007
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2007 Yunhyong Kim, Seamus Ross, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.