Which Data Format To Store Scientific Data Should I Use? A Performance Analysis

Open Access
Oct 2022

Abstract

A lot of scientific work is dedicated to the analysis of data. Most of the analyzed data, such as data from space missions, are structured. The choice of data format can affect several characteristics: the read/write speed of standard files, the read/write speed of small files, and the read/write speed of compressed data formats. In this paper, we analyze binary data formats, propose types of tests and testing methods, and compare the performance of these formats with that of a human-readable text format. We also discuss the compressed and uncompressed modes available for data formats such as HDF5 and netCDF. When disregarding precision, the best data format from the size perspective is lossy HDF5 without compression. Lossless HDF5 without compression shows the best speed performance. Lossy HDF5 without compression is the best balance between size reduction and speed. However, for specific criteria and types of files, there might be better candidates, as detailed in the conclusion.

DOI: https://doi.org/10.2478/aei-2022-0015 | Journal eISSN: 1338-3957 | Journal ISSN: 1335-8243
Language: English
Page range: 32 - 40
Submitted on: May 1, 2022
Accepted on: Jun 13, 2022
Published on: Oct 14, 2022
Published by: Technical University of Košice
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2022 Daniel Gecášek, Michal Solanik, Ján Genči, published by Technical University of Košice
This work is licensed under the Creative Commons Attribution 4.0 License.