Principles of Synthesizing Medical Datasets
Open Access | Jan 2023

Abstract

Data in many application domains are a valuable source for analysis and data-driven decision support. At the same time, legislation restricts their use, especially for personal data and patients’ data in the medical domain. To maximize the use of data for decision-making while complying with legislation, sensitive data need to be properly anonymized or synthesized. This article contributes to the area of medical record synthesis. We first introduce the topic and place it in a broader context, covering the methods used and the metrics for their evaluation. Based on an analysis of related work, we selected the CTGAN neural network model for data synthesis and experimentally validated it on three different medical datasets. The results were evaluated both quantitatively, using selected metrics, and qualitatively, using suitable visualization techniques. The results showed that in most cases the synthesized dataset is a very good approximation of the original one, with similar prediction performance.

DOI: https://doi.org/10.2478/aei-2022-0019 | Journal eISSN: 1338-3957 | Journal ISSN: 1335-8243
Language: English
Page range: 25 - 29
Submitted on: Aug 3, 2022
Accepted on: Oct 21, 2022
Published on: Jan 24, 2023
Published by: Technical University of Košice
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2023 Michal Kolárik, Lucia Gojdičová, Ján Paralič, published by Technical University of Košice
This work is licensed under the Creative Commons Attribution 4.0 License.