Subpopulation Discovery in Epidemiological Data with Subspace Clustering

Open Access | Dec 2014

Abstract

A prerequisite of personalized medicine is the identification of groups of people who share specific risk factors for an outcome. We investigate the potential of subspace clustering for finding such groups in epidemiological data. We propose a workflow that encompasses clusterability assessment before cluster discovery and quality assessment after the clusters have been learned. Epidemiological data usually have no ground truth against which clusters found in subspaces can be verified. Hence, we introduce quality assessment through juxtaposition of the learned models with “models-of-randomness”, i.e., models that do not reflect a true cluster structure. On the basis of this workflow, we select subspace clustering methods and compare and discuss their performance. We use a dataset with hepatic steatosis as the outcome, but our findings apply to arbitrary epidemiological cohort data that have tens of variables and exhibit class skew.
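The idea of judging a learned model against a "model-of-randomness" can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the random reference is built by shuffling each variable independently (which keeps the marginal distributions but destroys joint structure), plain k-means stands in for a subspace clustering method, and the within-cluster sum of squares serves as the quality score. The function names `kmeans_wss`, `shuffled_copy` and `compare_to_randomness` are hypothetical, and the paper's own construction may differ.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(points, k, iters=50):
    """Run a small Lloyd's k-means and return the within-cluster sum of squares."""
    # Farthest-first initialisation keeps this sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    for _ in range(iters):
        # Assign every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[idx].append(p)
        # Recompute each center as the mean of its group (keep it if the group is empty).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def shuffled_copy(points, rng):
    """Permute each column independently: marginals stay, joint structure is lost."""
    columns = [list(col) for col in zip(*points)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(*columns))


def compare_to_randomness(points, k, n_random=20, seed=0):
    """Return the score on the real data and the scores on n_random shuffled copies."""
    rng = random.Random(seed)
    observed = kmeans_wss(points, k)
    baseline = [
        kmeans_wss(shuffled_copy(points, rng), k) for _ in range(n_random)
    ]
    return observed, baseline


if __name__ == "__main__":
    # Synthetic toy data with two well-separated groups (not epidemiological data).
    gen = random.Random(1)
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(30)]
    data += [(gen.gauss(10, 1), gen.gauss(10, 1)) for _ in range(30)]
    observed, baseline = compare_to_randomness(data, k=2)
    print("observed WSS:", observed)
    print("smallest WSS on shuffled data:", min(baseline))
```

If the score on the real data is clearly better (here, lower) than every score obtained on the shuffled copies, the clustering is unlikely to be an artefact of the marginal distributions alone. A score that falls within the range of the shuffled copies suggests that no real cluster structure was found.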

DOI: https://doi.org/10.2478/fcds-2014-0015 | Journal eISSN: 2300-3405 | Journal ISSN: 0867-6356
Language: English
Page range: 271 - 300
Submitted on: Aug 1, 2014
Published on: Dec 20, 2014
Published by: Poznan University of Technology
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2014 Uli Niemann, Myra Spiliopoulou, Henry Völzke, Jens-Peter Kühn, published by Poznan University of Technology
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.