
Figure 1
Inclusion and exclusion flowchart for training dataset 1.

Figure 2
Inclusion and exclusion flowchart for training dataset 2.

Figure 3
Inclusion and exclusion flowchart for the validation dataset.

Figure 4
Schema of the algorithm workflow and architecture.
Table 1
Different keyword categories used by ODDPub to detect Open Data and Open Code.
| Combined Keyword Categories | Explanation |
|---|---|
| Open Data | |
| Field-specific repositories | Checks if deposition of data in a field-specific database is mentioned together with an accession number |
| General-purpose repositories | Checks if deposition of data in a general-purpose database that uses no accession numbers is mentioned |
| Dataset | Checks if a dataset with a specific number (e.g. “dataset S1”) is mentioned |
| Supplemental table or data | Checks if a numbered supplemental file or table or raw data is mentioned together with specific file formats |
| Supplementary raw/full data with specific file format | Checks if raw/full data is mentioned together with specific file formats |
| Data availability statement | Checks if an accession number or a repository name is mentioned in the data availability section |
| Dataset on GitHub | Checks if data deposition on GitHub is mentioned |
| Data journals | Checks the journal DOI against certain known data journals |
| Open Code | |
| Source-code availability | Checks if the availability of source code is mentioned |
| Supplementary source code | Checks if source code in the supplement is mentioned |
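As an illustration of how such keyword categories translate into pattern matching, the following is a minimal Python sketch. ODDPub itself is implemented in R and uses far more extensive keyword lists and combination rules; the repository names, accession-number formats, and regular expressions below are simplified assumptions, not ODDPub's actual patterns.

```python
import re

# Illustrative patterns only (NOT ODDPub's actual keyword lists):
# each Open Data category requires certain keyword groups to co-occur.
FIELD_SPECIFIC_REPOSITORIES = re.compile(
    r"\b(GEO|Gene Expression Omnibus|ArrayExpress|PRIDE|dbGaP)\b", re.IGNORECASE)
ACCESSION_NUMBERS = re.compile(r"\b(GSE\d+|E-[A-Z]+-\d+|PXD\d+|phs\d+)\b")
GENERAL_PURPOSE_REPOSITORIES = re.compile(
    r"\b(figshare|zenodo|dryad|osf\.io|open science framework)\b", re.IGNORECASE)

def detect_open_data(sentences: list[str]) -> bool:
    """Return True if any sentence matches an Open Data keyword category."""
    for sentence in sentences:
        # Field-specific repository: a repository name AND an accession
        # number must be mentioned together.
        if (FIELD_SPECIFIC_REPOSITORIES.search(sentence)
                and ACCESSION_NUMBERS.search(sentence)):
            return True
        # General-purpose repository: the repository name alone suffices,
        # as these services use no accession numbers.
        if GENERAL_PURPOSE_REPOSITORIES.search(sentence):
            return True
    return False

print(detect_open_data(
    ["The data were deposited in GEO under accession number GSE12345."]))  # True
print(detect_open_data(
    ["Data are available from the corresponding author upon request."]))   # False
```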
Table 2
Predictions of ODDPub for Open Data on the validation dataset in comparison to the manual screening.
| Open Data | ODDPub: Yes | ODDPub: No |
|---|---|---|
| Human rater: Yes | 67 | 24 |
| Human rater: No | 23 | 678 |
Table 3
Predictions of ODDPub for Open Code on the validation dataset in comparison to the manual screening.
| Open Code | ODDPub: Yes | ODDPub: No |
|---|---|---|
| Human rater: Yes | 8 | 3 |
| Human rater: No | 6 | 775 |
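From the confusion matrices in Tables 2 and 3, the usual classification metrics follow directly. A minimal Python sketch using the counts from the two tables (the helper function is illustrative, not part of ODDPub):

```python
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Compute sensitivity, specificity, and precision from a confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # detected among true positives
        "specificity": tn / (tn + fp),  # correctly rejected among true negatives
        "precision": tp / (tp + fp),    # true positives among all detections
    }

# Table 2 (Open Data): TP=67, FN=24, FP=23, TN=678
print(metrics(tp=67, fn=24, fp=23, tn=678))
# -> sensitivity ~0.74, specificity ~0.97, precision ~0.74

# Table 3 (Open Code): TP=8, FN=3, FP=6, TN=775
print(metrics(tp=8, fn=3, fp=6, tn=775))
# -> sensitivity ~0.73, specificity ~0.99, precision ~0.57
```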

Figure 5
Venn diagram of the overlap between the detected Open Data publications for the four different detection methods on the validation dataset: Manual search, ODDPub, PubMed, and Web of Science. The manual search represents the gold standard. All but one of the 23 publications detected by ODDPub but not the manual search are false positive detections.
Table 4
Types of data sharing observed in the manually detected Open Data publications of the validation sample.
| Category | Number of occurrences |
|---|---|
| Supplemental Data | 42 |
| Field-specific repository | 40 |
| General-purpose repository (including GitHub) | 14 |
| Institutional repository | 0 |
| Personal/project-specific website | 1 |
| Data journal | 0 |
Table 5
Reasons for false positive cases detected by ODDPub in the validation sample. In three cases, two conditions applied: part of the shared data was not raw data, and the other part was shared with restrictions.
| Category | Number of occurrences |
|---|---|
| Shared data not raw data | 9 |
| Data sharing with restrictions | 4 |
| Open Data reuse | 4 |
| Only analysis code shared | 3 |
| Detected sentence not related to data sharing | 3 |
| Data available upon request | 1 |
| Linked OSF repository was empty | 1 |
| Case of Open Data missed by manual search | 1 |
