The Challenges of Data Quality and Data Quality Assessment in the Big Data Era

By: Li Cai and Yangyong Zhu
Open Access | May 2015

Figures & Tables

Figure 1

Data quality framework.

Figure 2

A universal, two-layer big data quality standard for assessment.

Table 1

The hierarchical big data quality assessment framework (partial content).

Dimensions / Elements / Indicators

1) Availability
   1) Accessibility
      - Whether a data access interface is provided
      - Data can easily be made public or are easy to purchase
   2) Timeliness
      - Whether the data arrive on time within a given period
      - Whether data are regularly updated
      - Whether the time interval from data collection and processing to release meets requirements
2) Usability
   1) Credibility
      - Data come from specialized organizations of a country, field, or industry
      - Experts or specialists regularly audit and check the correctness of the data content
      - Data values lie within the range of known or acceptable values
3) Reliability
   1) Accuracy
      - Data provided are accurate
      - Data representation (or value) accurately reflects the true state of the source information
      - Information (data) representation does not cause ambiguity
   2) Consistency
      - After data have been processed, their concepts, value domains, and formats still match those before processing
      - Over a given period, data remain consistent and verifiable
      - Data are consistent with, or verifiable against, data from other sources
   3) Integrity
      - Data format is clear and meets the criteria
      - Data are consistent with structural integrity
      - Data are consistent with content integrity
   4) Completeness
      - For data with multiple components, whether the lack of a component will affect use of the data
      - Whether the lack of a component will affect data accuracy and integrity
4) Relevance
   1) Fitness
      - The data collected do not completely match the theme, but they expound one aspect of it
      - Most retrieved datasets fall within the retrieval theme users need
      - The information theme provided matches the users' retrieval theme
5) Presentation Quality
   1) Readability
      - Data (content, format, etc.) are clear and understandable
      - It is easy to judge whether the data provided meet needs
      - Data description, classification, and coding content satisfy specifications and are easy to understand
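Table 1 is a three-level hierarchy (dimension, element, indicator), which maps naturally onto a nested data structure. The sketch below encodes the partial framework in Python and rolls up an assessor's yes/no judgements into per-dimension scores. The shortened indicator labels, the names `Element`, `Dimension`, `element_score`, and `assess`, and the scoring rule (fraction of indicators met per element, unweighted mean of elements per dimension, unrecorded indicators treated as unmet) are all illustrative assumptions, not the assessment procedure described in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    indicators: tuple[str, ...]  # hypothetical short labels paraphrasing Table 1


@dataclass(frozen=True)
class Dimension:
    name: str
    elements: tuple[Element, ...]


# Partial framework transcribed from Table 1; indicator labels are abbreviated.
FRAMEWORK = (
    Dimension("Availability", (
        Element("Accessibility", ("access interface provided", "easy to publish or purchase")),
        Element("Timeliness", ("arrives on time", "regularly updated", "release delay acceptable")),
    )),
    Dimension("Usability", (
        Element("Credibility", ("authoritative source", "expert audited", "values in acceptable range")),
    )),
    Dimension("Reliability", (
        Element("Accuracy", ("values accurate", "reflects true state", "unambiguous")),
        Element("Consistency", ("consistent after processing", "consistent over time",
                                "consistent across sources")),
        Element("Integrity", ("format clear", "structural integrity", "content integrity")),
        Element("Completeness", ("missing component does not block use",
                                 "missing component does not harm accuracy")),
    )),
    Dimension("Relevance", (
        Element("Fitness", ("covers an aspect of theme", "most results on theme", "theme matches query")),
    )),
    Dimension("Presentation Quality", (
        Element("Readability", ("clear and understandable", "fitness easy to judge",
                                "description follows spec")),
    )),
)


def element_score(element: Element, results: dict[str, bool]) -> float:
    """Fraction of the element's indicators marked as met; unrecorded ones count as unmet (assumed rule)."""
    met = sum(1 for label in element.indicators if results.get(label, False))
    return met / len(element.indicators)


def assess(framework: tuple[Dimension, ...], results: dict[str, bool]) -> dict[str, float]:
    """Score each dimension as the unweighted mean of its element scores (assumed rule)."""
    scores = {}
    for dim in framework:
        element_scores = [element_score(e, results) for e in dim.elements]
        scores[dim.name] = sum(element_scores) / len(element_scores)
    return scores


if __name__ == "__main__":
    # Hypothetical assessor judgements for a single dataset.
    judgements = {
        "access interface provided": True,
        "arrives on time": True,
        "regularly updated": True,
        "authoritative source": True,
        "values in acceptable range": True,
    }
    for name, score in assess(FRAMEWORK, judgements).items():
        print(f"{name}: {score:.2f}")
```

With the sample judgements, Availability scores (1/2 + 2/3)/2, printed as 0.58, and Usability scores 2/3, printed as 0.67, while the remaining dimensions score 0.00. A weighted mean or a per-element pass threshold could replace the unweighted mean without changing the hierarchy itself.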
Figure 3

Quality assessment process for big data.

Language: English
Published on: May 22, 2015
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2015 Li Cai, Yangyong Zhu, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.