Table 1
Summary of the Sunlight Foundation’s 10 principles for data quality.
| DIMENSION | DESCRIPTION |
|---|---|
| Completeness | Resources published on open data platforms should contain all raw information and metadata defining and explaining their content. |
| Primacy | Resources published on open data platforms should also include the original information released by the government. |
| Timeliness | Resources should be available to the public in a timely manner. |
| Easy access | Resources published on open data platforms should be easy to find and download. |
| Machine-readable format | Resources should be stored in a machine-readable format (i.e., should be processable by a computer). |
| Non-discrimination | Resources published on open data platforms should be accessible without having to identify oneself (e.g., without the need to log in) or having to provide a justificatory reason. |
| Open format | Resources should be usable without proprietary software. |
| Open licensing | Resources published on open data platforms should use an open licensing model. |
| Permanence | Resources published on open data platforms should be accessible by machines and humans over time. |
| Usage cost | Resources should be available for free. |
[i] Source: Marmier and Mettler (2020).
Table 2
Questions and their chaining logic.
| QUALITY PRINCIPLE | QUESTION | CHAINING LOGIC |
|---|---|---|
| Completeness | Q1: Is the metadata complete? | If the raw information and the metadata of this resource exist = 1, else 0 |
| Primacy | Q2: Is there an email address for a contact point/support contact? | If an email address to contact the originator exists = 1, else 0 |
| Timeliness | Q3: Is the resource up to date? | If the dataset was last updated in time to comply with the declared update frequency = 1, else 0 |
| Easy access | Q4: Is the data available in bulk? | If a link to download the data exists, and the license for data reuse is fully open = 1, else 0 |
| Machine-readable format | Q5: Is the resource available in machine-readable format? | If the format used is machine-readable = 1, else 0 |
| Non-discrimination | Q6: Do people have limited access to the resource? | If a link to download the data exists, and the license for data reuse is fully open, and the data is machine-readable = 1, else 0 |
| Commonly owned or open standards | Q7: Is the resource in an open file format? | If the data format is open = 1, else 0 |
| Transparent licensing | Q8: Is the licensing information about the resource transparent? | If licensing information for data reuse is available = 1, else 0 |
| Permanence | Q9: Is the published resource available over time? | If a link to download the data exists, and it is different from the data access link = 1, else 0 |
| Usage cost | Q10: Is the resource freely available? | If the data format is open and the license for data reuse is fully open = 1, else 0 |
[i] Source: Adapted from Marmier and Mettler (2020).
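The chaining logic in Table 2 can be read as ten binary checks on a resource's metadata. The following is a minimal sketch of that reading, not the authors' implementation: the `Resource` record and all of its field names are hypothetical, and the aggregation into a single index as the plain sum of the ten scores is an assumption, since the table itself only defines the individual 0/1 scores.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Resource:
    """Hypothetical metadata record; field names are illustrative only."""
    has_raw_data_and_metadata: bool
    contact_email: Optional[str]
    updated_within_declared_frequency: bool
    download_url: Optional[str]
    access_url: Optional[str]
    license_info_available: bool
    license_fully_open: bool
    machine_readable: bool
    open_format: bool


def score_resource(r: Resource) -> Dict[str, int]:
    """Return the 0/1 score for Q1..Q10 following the chaining logic of Table 2."""
    has_download = bool(r.download_url)
    checks = {
        "Q1": r.has_raw_data_and_metadata,
        "Q2": bool(r.contact_email),
        "Q3": r.updated_within_declared_frequency,
        "Q4": has_download and r.license_fully_open,
        "Q5": r.machine_readable,
        "Q6": has_download and r.license_fully_open and r.machine_readable,
        "Q7": r.open_format,
        "Q8": r.license_info_available,
        "Q9": has_download and r.download_url != r.access_url,
        "Q10": r.open_format and r.license_fully_open,
    }
    return {question: int(passed) for question, passed in checks.items()}


def compliance_index(r: Resource) -> int:
    """Assumed aggregation: the unweighted sum of the ten binary scores."""
    return sum(score_resource(r).values())
```

Because Q4, Q6, and Q10 all depend on the license being fully open, a single restrictive license lowers three scores at once under this reading, which is worth keeping in mind when comparing principles.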

Figure 1
Average compliance index and standard deviation per country and per dataset type (HVD and non-HVD).

Figure 2
Violin plot of the compliance index level per country and per dataset type (HVD and non-HVD).

Figure 3
Composition of the average compliance index score by quality principle per country and per dataset type (HVD and non-HVD).
Table 3
Measures of issuers’ distribution concentration per country and per dataset type (HVD and non-HVD).
| COUNTRY | DATASET TYPE | NUMBER OF ISSUERS | TOP ISSUER SHARE (%) | TOP 3 ISSUERS SHARE (%) | ENTROPY SCORE OF ISSUERS’ DISTRIBUTION |
|---|---|---|---|---|---|
| AT | HVD | 16 | 18.7 | 48.9 | 3.39 |
| AT | Non-HVD | 146 | 78.2 | 90.5 | 1.44 |
| DE | HVD | 188 | 6.5 | 15.2 | 6.04 |
| DE | Non-HVD | 1658 | 12.1 | 29.2 | 6.43 |
| IE | HVD | 16 | 92.5 | 98.3 | 0.55 |
| IE | Non-HVD | 120 | 58.4 | 70.2 | 2.94 |
| IT | HVD | 3 | 72.8 | 100.0 | 1.07 |
| IT | Non-HVD | 230 | 16.6 | 39.5 | 4.49 |
| NL | HVD | 1 | 100.0 | 100.0 | 0.00 |
| NL | Non-HVD | 24 | 37.7 | 74.3 | 2.65 |
| CH | Non-HVD | 128 | 22.6 | 32.7 | 4.94 |
[i] Note: The Entropy Score is calculated using Shannon entropy in bits. It is a measure of dispersion, with higher values designating a more dispersed distribution.
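The concentration measures in Table 3 can be sketched as follows, assuming each dataset carries a single issuer label and that shares are computed over dataset counts. The function name `issuer_concentration` and the returned keys are illustrative, not taken from the source. The entropy is the Shannon entropy in bits, H = −Σ pᵢ log₂ pᵢ, where pᵢ is the share of datasets published by issuer i.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def issuer_concentration(issuers: Iterable[str]) -> Dict[str, float]:
    """Compute issuer concentration measures from one issuer label per dataset."""
    counts = Counter(issuers)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one dataset is required")
    # Shares of each issuer, largest first.
    shares = sorted((c / total for c in counts.values()), reverse=True)
    # Shannon entropy in bits; every share is > 0 because it comes from a count.
    entropy = sum(-p * math.log2(p) for p in shares)
    return {
        "n_issuers": len(counts),
        "top_share_pct": 100 * shares[0],
        "top3_share_pct": 100 * sum(shares[:3]),
        "entropy_bits": entropy,
    }
```

With a single issuer the entropy is 0, which matches the NL HVD row; with n issuers of equal size it reaches its maximum of log₂ n bits, so four equally sized issuers give 2 bits.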
