Finding sequential patterns with TF-IDF metrics in health-care databases

Kardkovács, Zsolt T.; Kovács, Gábor

Acta Universitatis Sapientiae, Informatica

Volume 6 (2014): Issue 2 (December 2014)

By:

Zsolt T. Kardkovács and Gábor Kovács

Open Access

Jan 2015

Abstract

Finding frequent sequential patterns has been defined as finding ordered lists of items that occur in a database more often than a user-defined threshold. For big and dense databases that contain very long sequences and large itemsets, such as medical case histories, algorithms based on this idea of counting occurrences output an enormous number of highly redundant frequent sequences and are therefore simply impractical. There is thus a need for algorithms that perform frequent pattern search and prefiltering simultaneously. In this paper, we propose an algorithm that reinterprets the term support on a text mining basis. Experiments show that our method not only eliminates redundancy among the output sequences but also scales much better with huge input data sizes. We apply our algorithm to mining medical databases to find which diagnoses are likely to lead to a certain future health condition.
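
To make the two notions in the abstract concrete, the following minimal sketch contrasts classical support counting with a textbook TF-IDF weight. It is an illustration under stated assumptions, not the algorithm proposed in the paper: each sequence is simplified to a list of single items rather than itemsets, each sequence is treated as one "document" for the TF-IDF computation, and the function names and the toy diagnosis codes are hypothetical.

```python
import math
from collections import Counter


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # Each membership test consumes the iterator, which enforces ordering.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Classical support: number of sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def tf_idf(item, sequence, database):
    """Textbook TF-IDF of an item, treating each sequence as a document.

    Assumption: this is the standard text mining formula, not necessarily
    the weighting used in the paper.
    """
    tf = Counter(sequence)[item] / len(sequence)
    df = sum(item in seq for seq in database)
    return tf * math.log(len(database) / df) if df else 0.0


# Hypothetical toy case histories; the codes are made up for illustration.
database = [
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "A", "C"],
    ["A", "B"],
]

min_support = 3  # hypothetical user-defined threshold
frequent = support(["A", "C"], database) >= min_support
```

In this toy database the item "A" appears in every sequence, so its inverse document frequency is zero. That shows how a TF-IDF style weight can discount ubiquitous items that plain occurrence counting would report as frequent.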

DOI: https://doi.org/10.1515/ausi-2015-0008 | Journal eISSN: 2066-7760

Language: English

Page range: 287 - 310

Submitted on: Sep 11, 2014

Published on: Jan 27, 2015

Published by: Sapientia Hungarian University of Transylvania

In partnership with: Paradigm Publishing Services

Publication frequency: 2 times per year

Keywords: sequence mining, frequent sequential pattern, TF-IDF, health care database

Related subjects: Computer sciences; Computer sciences, other

© 2015 Zsolt T. Kardkovács, Gábor Kovács, published by Sapientia Hungarian University of Transylvania
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
