Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Examples of TCGA cancer-type concepts
| Concept ID | Name | TCGA defined terms [abbr] – [full name] | Synonyms | DO mapping |
|---|---|---|---|---|
| D0001 | Glioblastoma | GBM – Glioblastoma Multiforme | Glioblastoma, GBM, adult glioblastoma multiforme, primary glioblastoma multiforme, spongioblastoma multiforme | DOID:3068 |
| D0002 | Breast cancer | BRCA – Breast Invasive Carcinoma | Breast cancer, breast tumor, breast neoplasm, mammary cancer, mammary tumor, mammary neoplasm, malignant tumor of breast | DOID:1612 |
| D0003 | Ovarian cancer | OV – Ovarian Serous Cystadenocarcinoma | Ovarian cancer, ovarian tumor, ovarian neoplasm, ovary cancer, ovary tumor, ovary neoplasm, malignant tumor of ovary | DOID:2394 |
| D0004 | Acute myeloid leukemia | LAML – Acute Myeloid Leukemia | Acute myeloid leukemia, AML, acute myeloblastic leukemia, acute myelogenous leukemia | DOID:9119 |
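The synonym and abbreviation columns are what let a table like this serve as a lexicon for recognizing cancer-type mentions in free text. The sketch below assumes a simple case-insensitive, whole-word dictionary match over two of the rows; `CANCER_CONCEPTS`, `build_lexicon` and `find_concepts` are hypothetical names, and this is not necessarily how mentions were matched in the original work.

```python
import re

# Hypothetical in-memory lexicon built from two rows of the cancer-type table.
CANCER_CONCEPTS = {
    "D0001": {"tcga": "GBM", "doid": "DOID:3068",
              "synonyms": ["Glioblastoma", "GBM", "adult glioblastoma multiforme",
                           "primary glioblastoma multiforme", "spongioblastoma multiforme"]},
    "D0004": {"tcga": "LAML", "doid": "DOID:9119",
              "synonyms": ["Acute myeloid leukemia", "AML", "acute myeloblastic leukemia",
                           "acute myelogenous leukemia"]},
}


def build_lexicon(concepts):
    """Map each lower-cased synonym and TCGA abbreviation to its concept ID."""
    lexicon = {}
    for cid, entry in concepts.items():
        for term in entry["synonyms"] + [entry["tcga"]]:
            lexicon[term.lower()] = cid
    return lexicon


def find_concepts(text, lexicon):
    """Return the set of concept IDs whose terms occur as whole words in text."""
    # Longest terms first, so a multi-word synonym wins over its shorter substrings.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
                         re.IGNORECASE)
    return {lexicon[m.group(1).lower()] for m in pattern.finditer(text)}
```

For example, `find_concepts("We analysed TCGA GBM and LAML samples", build_lexicon(CANCER_CONCEPTS))` yields both D0001 and D0004. The word boundaries keep `AML` from firing inside `LAML`, although a real system would still have to deal with ambiguous short abbreviations.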
Distribution of TCGA key terms in full-text articles
| Feature | Category | Retrieved PMC article set (%) | Benchmark dataset (%) |
|---|---|---|---|
| TCGA term position | Title | 1 | 4 |
| | Abstract | 11 | 28 |
| | Introduction/Background | 12 | 20 |
| | Method/Material | 31 | 68 |
| | Result | 74 | 96 |
| | Discussion/Conclusion | 20 | 36 |
| TCGA-related concept mention | Cancer type mention | 73 | 100 |
| | Platform mention | 66 | 96 |
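Because each term-position column totals well over 100% (149% and 252%), an article appears to be counted once for every section in which a TCGA term occurs. A minimal sketch of such a tally follows. It assumes each article has already been split into sections keyed by the labels used in the table, and that a plain case-sensitive substring test for "TCGA" is sufficient. `section_distribution` is a hypothetical helper, not the procedure that produced these figures.

```python
from collections import Counter

# Section labels as they appear in the term-position rows of the table.
SECTIONS = ["Title", "Abstract", "Introduction/Background", "Method/Material",
            "Result", "Discussion/Conclusion"]


def section_distribution(articles, term="TCGA"):
    """Percentage of articles whose given section contains term (case-sensitive).

    articles: list of dicts mapping a section label to that section's text.
    An article is counted once per section, so the percentages need not sum to 100.
    """
    counts = Counter()
    for article in articles:
        for section in SECTIONS:
            if term in article.get(section, ""):
                counts[section] += 1
    n = len(articles)
    # Guard against an empty article set instead of dividing by zero.
    return {s: (round(100 * counts[s] / n) if n else 0) for s in SECTIONS}
```

With two toy articles, one mentioning TCGA in its title and results and the other in its abstract and results, the function reports 50% for Title and Abstract, 100% for Result and 0% elsewhere.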
Examples of TCGA high-throughput platform concepts
| Concept ID | Name | TCGA-defined terms | Generated data |
|---|---|---|---|
| P0001 | RNASeq | IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq | Nucleotide sequence, gene expression |
| P0002 | miRNASeq | IlluminaGA_miRNASeq | miRNAs, microRNA, microRNA sequence |
| P0003 | SNP | Genome_Wide_SNP | SNPs, single nucleotide polymorphisms, CNV, copy number variation |
| P0004 | Methylation | Human methylation | DNA methylation |
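Several TCGA-defined platform strings share endings, so assigning them to concepts needs some care. `IlluminaGA_miRNASeq`, for instance, also ends in `RNASeq`. The sketch below resolves a platform string by its normalized ending and tries longer concept names first. The suffix rule and the names `PLATFORM_CONCEPTS` and `platform_concept` are assumptions made for illustration. Strings carrying a trailing version number or other suffix would not match under this rule.

```python
import re

# Concept IDs and names from the platform table; the matching rule is hypothetical.
PLATFORM_CONCEPTS = {
    "P0001": "RNASeq",
    "P0002": "miRNASeq",
    "P0003": "SNP",
    "P0004": "Methylation",
}


def _norm(s):
    """Lower-case and drop separators, so 'Genome_Wide_SNP' equals 'genome wide snp'."""
    return re.sub(r"[^a-z0-9]", "", s.lower())


def platform_concept(tcga_term):
    """Assign a TCGA platform string to a concept ID by its normalized ending.

    Longer concept names are tried first, so 'IlluminaGA_miRNASeq' resolves to
    miRNASeq (P0002) rather than to RNASeq (P0001), which it also ends with.
    Returns None when no concept name matches.
    """
    key = _norm(tcga_term)
    by_length = sorted(PLATFORM_CONCEPTS.items(), key=lambda kv: len(kv[1]), reverse=True)
    for cid, name in by_length:
        if key.endswith(_norm(name)):
            return cid
    return None
```

Under this rule, both RNASeq strings in the table map to P0001, `Genome_Wide_SNP` maps to P0003 and `Human methylation` maps to P0004. An unrelated string returns `None`.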