Identifying Scientific Project-generated Data Citation from Full-text Articles: An Investigation of TCGA Data Citation

By: Jiao Li,  Si Zheng,  Hongyu Kang,  Zhen Hou and  Qing Qian  
Open Access | Sep 2017

Figures & Tables

Figure 1

Computational workflow for identifying TCGA data usage.

Figure 2

Number of TCGA-related publications in PMC.

Figure 3

Geographical distribution of TCGA-related publications.

Figure 4

Distribution of TCGA cancer types.

Figure 5

Distribution of the TCGA high-throughput platform.

Figure 6

Manually linked literature that includes TCGA data.

Examples of TCGA cancer-type concepts

| Concept ID | Name | TCGA defined terms [abbr] – [full name] | Synonyms | DO mapping |
|---|---|---|---|---|
| D0001 | Glioblastoma | GBM – Glioblastoma Multiforme | Glioblastoma, GBM, adult glioblastoma multiforme, primary glioblastoma multiforme, spongioblastoma multiforme | DOID: 3068 |
| D0002 | Breast cancer | BRCA – Breast Invasive Carcinoma | Breast cancer, breast tumor, breast neoplasm, mammary cancer, mammary tumor, mammary neoplasm, malignant tumor of breast | DOID: 1612 |
| D0003 | Ovarian cancer | OV – Ovarian Serous Cystadenocarcinoma | Ovarian cancer, ovarian tumor, ovarian neoplasm, ovary cancer, ovary tumor, ovary neoplasm, malignant tumor of ovary | DOID: 2394 |
| D0004 | Acute myeloid leukemia | LAML – Acute Myeloid Leukemia | Acute myeloid leukemia, AML, acute myeloblastic leukemia, acute myelogenous leukemia | DOID: 9119 |
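
A concept table of this shape lends itself to dictionary-based mention detection. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `Concept` and `find_concepts` are hypothetical, only a subset of the synonyms above is included, and the matching rules (whole-word matching, case-sensitive matching for all-uppercase abbreviations) are illustrative choices that the table itself does not specify.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Concept:
    """One row of the cancer-type concept table (hypothetical structure)."""
    concept_id: str
    name: str
    doid: str
    terms: Tuple[str, ...]  # TCGA abbreviation, full name, and synonyms


# Entries copied from the table above; synonym lists are abbreviated here.
CONCEPTS = [
    Concept("D0001", "Glioblastoma", "DOID:3068",
            ("GBM", "glioblastoma multiforme", "glioblastoma")),
    Concept("D0002", "Breast cancer", "DOID:1612",
            ("BRCA", "breast invasive carcinoma", "breast cancer",
             "breast tumor", "mammary tumor")),
    Concept("D0003", "Ovarian cancer", "DOID:2394",
            ("OV", "ovarian serous cystadenocarcinoma", "ovarian cancer",
             "ovary cancer")),
    Concept("D0004", "Acute myeloid leukemia", "DOID:9119",
            ("LAML", "AML", "acute myeloid leukemia",
             "acute myelogenous leukemia")),
]


def _compile(term: str) -> "re.Pattern[str]":
    # Assumption: all-uppercase abbreviations are matched case-sensitively
    # to limit false positives; longer names are matched case-insensitively.
    flags = 0 if term.isupper() else re.IGNORECASE
    return re.compile(r"\b" + re.escape(term) + r"\b", flags)


PATTERNS = {c.concept_id: [_compile(t) for t in c.terms] for c in CONCEPTS}


def find_concepts(text: str) -> List[str]:
    """Return the IDs of concepts with at least one term found in text."""
    return [c.concept_id for c in CONCEPTS
            if any(p.search(text) for p in PATTERNS[c.concept_id])]


if __name__ == "__main__":
    sentence = "We downloaded TCGA GBM and ovarian cancer expression data."
    print(find_concepts(sentence))
```

Keeping the DO identifier on each concept means a detected mention can be normalized to the same Disease Ontology entry regardless of which synonym appeared in the text.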

Distribution of TCGA key terms in full-text articles

| Feature | | Retrieved PMC article set (%) | Benchmark dataset (%) |
|---|---|---|---|
| TCGA term position | Title | 1 | 4 |
| | Abstract | 11 | 28 |
| | Introduction/Background | 12 | 20 |
| | Method/Material | 31 | 68 |
| | Result | 74 | 96 |
| | Discussion/Conclusion | 20 | 36 |
| TCGA-related concept mention | Cancer type mention | 73 | 100 |
| | Platform mention | 66 | 96 |

Examples of TCGA high-throughput platform concepts

| Concept ID | Name | TCGA-defined terms | Generated data |
|---|---|---|---|
| P0001 | RNASeq | IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq | Nucleotide sequence, gene expression |
| P0002 | miRNASeq | IlluminaGA_miRNASeq | miRNAs, microRNA, microRNA sequence |
| P0003 | SNP | Genome_Wide_SNP | SNPs, single nucleotide polymorphisms, CNV, copy number variation |
| P0004 | Methylation | Human methylation | DNA methylation |

DOI: https://doi.org/10.20309/jdis.201612 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X
Language: English
Page range: 32 - 44
Submitted on: Jan 20, 2016
Accepted on: May 15, 2016
Published on: Sep 1, 2017
Published by: Chinese Academy of Sciences, National Science Library
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2017 Jiao Li, Si Zheng, Hongyu Kang, Zhen Hou, Qing Qian, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.