The Integration of the Japan Link Center’s Bibliographic Data into OpenCitations: The production of bibliographic and citation data structured according to the OpenCitations Data Model, originating from an Anglo-Japanese dataset

Arianna Moretti; Marta Soricetti; Ivan Heibi; Arcangelo Massari; Silvio Peroni; Elia Rizzetto

doi:10.5334/johd.178

Figures & Tables

Figure 1

Workflow for the ingestion of citation data and bibliographic metadata into the OpenCitations datasets.

Figure 2

Flowchart describing the preliminary processing of citing bibliographic entities.

Figure 3

Flowchart describing the processing of cited bibliographic entities, their validation, and the production of metadata and citation tables.

Table 1

Sample of Meta input tables produced by oc_ds_converter, storing bibliographic entities’ metadata.

ID	TITLE	AUTHOR	PUB_DATE	VENUE	VOLUME	ISSUE	PAGE	TYPE	PUBLISHER	EDITOR
DOI: 10.14825/kaseki.68.0_14	本邦産白亜紀アンモナイトデータベースおよび種多様性について	利光, 誠一; 平野, 弘道; 松本, 崇; 高橋, 一晴	2000	化石 [issn:0022-9202 issn:2424-2632 jid:kaseki]	68	0	14–16	journal article	日本古生物学会
DOI: 10.1126/science.235.4793.1156	Chronology of fluctuating sea levels since the Triassic		1987	Science	235		1156–1167
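
The VENUE cell in the first sample row combines a venue name with bracketed identifiers of the form `scheme:value`. As a minimal sketch of how such a cell could be split, assuming every cell follows the pattern visible in Table 1, the hypothetical helper `parse_venue` below separates the name from its identifiers. It is not part of oc_ds_converter and is only an illustration.

```python
import re

def parse_venue(cell: str):
    """Split a VENUE cell into (name, {scheme: [values]}).

    Hypothetical helper: assumes the layout 'Name [scheme:value ...]'
    seen in the sample rows, with the bracketed part optional.
    """
    match = re.fullmatch(r"\s*(.*?)\s*(?:\[([^\]]*)\])?\s*", cell)
    name = match.group(1)
    raw_ids = match.group(2) or ""
    ids = {}
    for token in raw_ids.split():
        # Each identifier is written as scheme:value, e.g. issn:0022-9202
        scheme, _, value = token.partition(":")
        ids.setdefault(scheme, []).append(value)
    return name, ids

name, ids = parse_venue("化石 [issn:0022-9202 issn:2424-2632 jid:kaseki]")
print(name, ids)
```

A cell without brackets, such as `Science` in the second row, yields the name and an empty identifier mapping.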

Table 2

Sample of Index input tables, produced by oc_ds_converter, storing citation data.

CITING	CITED
DOI: 10.14825/kaseki.68.0_14	DOI: 10.1126/science.235.4793.1156

Figure 4

Language distribution in Meta bibliographic entities, calculated on Meta dump, version 5 (https://doi.org/10.6084/m9.figshare.21747461.v5). The analysis was performed on bibliographic entities with a declared title.

Figure 5

Bar charts illustrating the analysis of multilingualism within the input dataset, categorized by bibliographic metadata fields.

Table 3

Metadata languages in the original dataset and the linguistic information lost because of OCDM constraints. The total number of values provided for a field is the number of values provided in one language only, plus twice the number of values provided in two languages, plus, for values provided in more than two languages, the number of such values multiplied by the number of languages supplied. The information loss is the number of values supplied in additional languages, given as a count and as a percentage of that total. The publisher name field is not included in the table because its information loss does not necessarily concern language: it may also derive from multi-publisher values.

	1 LANGUAGE	2 LANGUAGES	3+ LANGUAGES	TOTAL VALUES PROVIDED	INFORMATION LOSS WRT. THE ORIGINAL DATASET
title citing	5,701,285	1,641,895	39 (3 languages)	8,985,192	1,641,973; 18.27%
title cited	217,316	12,616	0	242,548	12,616; 5.2%
authors citing	9,892,522	4,556,812	39 (3 languages)	19,006,263	4,556,890; 23.98%
authors cited	308,079	157,556	0	623,191	157,556; 25.28%
journal title citing	1,137,368	2,658,678	21,213 (20,572 3 languages; 641 4 languages)	6,519,004	2,701,745; 41.44%
journal title cited	180,515	0	0	180,515	0
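
The arithmetic behind the last two columns of Table 3 can be reproduced with a short sketch. It assumes that the information loss counts every language version beyond the first for each value (one per two-language value, n − 1 per n-language value). This reading is inferred from the figures in the table rather than stated in the caption, and the function name `total_and_loss` is hypothetical.

```python
def total_and_loss(counts):
    """counts maps a number of languages to the number of values
    provided in exactly that many languages.

    Returns (total values provided, values lost, loss percentage).
    Assumption: one language version per value is retained, so
    (n - 1) versions are lost for a value given in n languages.
    """
    total = sum(n_langs * n_values for n_langs, n_values in counts.items())
    loss = sum((n_langs - 1) * n_values for n_langs, n_values in counts.items())
    return total, loss, round(100 * loss / total, 2)

# "journal title citing" row: 20,572 values in 3 languages, 641 in 4
journal_title_citing = {1: 1_137_368, 2: 2_658_678, 3: 20_572, 4: 641}
print(total_and_loss(journal_title_citing))
```

Under this assumption, the sketch returns the total, loss, and percentage reported in the "journal title citing" row. The other rows can be checked in the same way.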

Figure 6

Language distribution in Meta bibliographic entities, calculated on Meta dump, version 6 (https://doi.org/10.6084/m9.figshare.21747461.v6). The analysis was performed on bibliographic entities with a declared title.

Table 4

Yellow cells represent the individual contribution of each collection to the OpenCitations Index, i.e., the number of citations derived uniquely from a given source. Pink cells represent the number of citations in the intersection of the sources. The table is based on OpenCitations data as of its latest update (29 November 2023).

	INDEX	CROSSREF	DATACITE	PUBMED	OPENAIRE	JALC
INDEX	1,975,552,846	1,563,218,160	169,814,412	695,988,810	14,645,838	396,788
Crossref		1,100,963,346	27,051	458,309,297	3,917,329	1,137
DataCite			169,663,255	9,623	114,483	0
PubMed				237,208,867	9,711,789	125
OpenAire					1,067,712	0
JaLC						395,526
