A Survey on Publicly Available Open Datasets Derived From Electronic Health Records (EHRs) of Patients with Neuroblastoma

Open Access | Oct 2022

Figures & Tables

Table 1

Quantitative characteristics of the analyzed datasets. #patients: number of patients. #features: number of clinical features. #missing values: number of missing data instances.

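| DATASET NAME | REFERENCE | #PATIENTS | #FEATURES | #MISSING VALUES | TABLE |
|---|---|---|---|---|---|
| dataBB2013 | Banelli et al. (2013) | 121 | 11 | 50 | Table 2 |
| dataCK2018 | Kim et al. (2018) | 20 | 16 | 29 | Table 3 |
| dataEV2013 | Villamón et al. (2013) | 19 | 11 | 13 | Table 4 |
| dataYBC2019 | Choi et al. (2019) | 7 | 10 | 0 | Table 5 |
| dataYM2018 | Ma et al. (2018) | 169 | 13 | 52 | Table 6 |
| interval | | [7, 169] | [10, 16] | [0, 52] | |
| mean | | 67.2 | 12.2 | 28.8 | |
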
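
The interval and mean rows of Table 1 can be recomputed from the per-dataset counts. The following is a minimal sketch of that arithmetic, not the authors' code: the counts are transcribed from Table 1, while the names `TABLE_1`, `COLUMNS`, and `summarize` are introduced here purely for illustration.

```python
from statistics import mean

# Per-dataset counts transcribed from Table 1:
# (number of patients, number of features, number of missing values).
TABLE_1 = {
    "dataBB2013": (121, 11, 50),
    "dataCK2018": (20, 16, 29),
    "dataEV2013": (19, 11, 13),
    "dataYBC2019": (7, 10, 0),
    "dataYM2018": (169, 13, 52),
}
COLUMNS = ("#patients", "#features", "#missing values")


def summarize(table):
    """Return {column: ((min, max), mean rounded to one decimal)}."""
    summary = {}
    for index, column in enumerate(COLUMNS):
        values = [row[index] for row in table.values()]
        summary[column] = ((min(values), max(values)), round(mean(values), 1))
    return summary


SUMMARY = summarize(TABLE_1)
for column, (interval, average) in SUMMARY.items():
    print(f"{column}: interval={list(interval)}, mean={average}")
```

Under these assumptions the sketch reproduces the last two rows of Table 1, for example an interval of [7, 169] and a mean of 67.2 for the number of patients.
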
Figure 1

Presence and absence of the clinical features in the five analyzed datasets. x axis: dataset names. y axis: clinical features of the datasets. dataBB2013: Banelli et al. (2013) (Table 2). dataCK2018: Kim et al. (2018) (Table 3). dataEV2013: Villamón et al. (2013) (Table 4). dataYBC2019: Choi et al. (2019) (Table 5). dataYM2018: Ma et al. (2018) (Table 6).

Table 2

Meaning and values of the features of the dataBB2013 dataset. Number of patients: 121. Number of features: 11. AICR: alive in complete remission. AWD: alive with disease. AWSD: alive with stable disease. CR: complete remission. DOD: dead of disease. GNB: ganglioneuroblastoma. HR: high risk. INRG: International Neuroblastoma Risk Group. MYCN: MYCN oncogene. NaN: not a number. NB: neuroblastoma. NS: not specified, meaning that it was impossible to make a more precise diagnosis (Romani, 2021). OS: overall survival. PCDHB: Protocadherin Beta cluster. PFS: progression-free survival. SFN: Stratifin gene. Additional information can be found in the original article of the dataset by Banelli et al. (2013).

DATABB2013

| FEATURE | MEANING | TYPE | VALUES |
|---|---|---|---|
| age at diagnosis | age at diagnosis | integer | 2, …, 196 |
| ferritin | ferritin serum level | ng/ml | –99, 19, …, 2250 |
| histological category | histological category of the neuroblastoma | categorical | NS, NB, GNB |
| INRG Risk classification | risk group: HR high risk and I/LR intermediate/low risk | categorical | 0, 1 |
| INSS Stage | stage of the tumor (only stage 4 patients) | categorical | 1 |
| MYCN amplification | status of nMYC oncogene: 0: amplified; 1: unamplified | binary | 0, 1 |
| Methylation PCDHB cluster (%) | methylation of 17 genes of the Protocadherin B cluster | percentage | 29.44, …, 88.93 |
| Methylation SFN (%) | methylation of the SFN gene | percentage | 36.6, …, 99 |
| OS | overall survival | float | 0.27, …, 164.47 |
| outcome | clinical outcome | categorical | AICR, AWD, AWSD, CR, DOD |
| PFS | progression-free survival | float | 3.7, …, 74.7, NaN |

Table 3

Meaning and values of the features of the dataCK2018 dataset. Number of patients: 20. Number of features: 16. BM: bone marrow. CR: complete response. CT: chemotherapy. DT: differentiation therapy. F: female. gy: gray units. ID: identifier. LDH: lactic acid dehydrogenase. LN: lymph node. NSE: neuron-specific enolase. M: male. MR: mixed response. PR: partial response. S: surgery. U/L: units per liter. VGPR: very good partial response. VMA: vanillylmandelic acid. mg: milligrams. mo: months. ng/mL: nanograms per milliliter. no: number. In the dataset of the original article, sex and age are joined in a single feature called ‘Sex/age (mo)’, and outcome and outcome months are joined in a single feature called ‘Outcome (mo)’. Additional information can be found in the original article of the dataset by Kim et al. (2018).

DATACK2018

| FEATURE | MEANING | TYPE | VALUES |
|---|---|---|---|
| 11q– | presence of chromosomal aberration at 11q site | boolean | yes, no, – |
| 17q+ | presence of chromosomal aberration at 17q site | boolean | yes, no, – |
| 1p– | presence of chromosomal aberration at 1p site | boolean | yes, no, – |
| age | age at diagnosis | months | 0, …, 10.2 |
| ferritin | ferritin levels | ng/mL | 15, …, 1638.6 |
| LDH | lactic acid dehydrogenase level | U/L | 539, …, 6200 |
| local RTx dose (gy) | dosage of radiation | float | 15, 23.4, 25.2, – |
| metastatic sites | site of metastasis | categorical | BM, bone, kidney, liver, LN, lung, mediastinum, muscle, pleura, skin |
| NSE | neuron-specific enolase levels | ng/mL | 7.3, …, 947 |
| outcome | event-free survival (EFS) or no evidence of disease (NED) | boolean | EFS, NED |
| outcome_mo | follow-up | months | 17, …, 91 |
| primary site | tumor primary site | binary | abdomen, mediastinum |
| sex | male or female | binary | M, F |
| tumor response after CT & S | response after chemotherapy and surgery | categorical | CR, MR, PR, VGPR |
| tumor response after DT | response after differentiation therapy (DT) | categorical | CR, MR, PR, VGPR |
| urine VMA | vanillylmandelic acid levels in urine | mg/day | 0.5, …, 53.9 |

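
The caption of Table 3 notes that the original article joins sex and age in one feature and outcome and follow-up months in another. The sketch below shows one way such joined cells could be split into the separate features listed in Table 3. The cell formats are an assumption: the exact layout of ‘Sex/age (mo)’ and ‘Outcome (mo)’ values is not given here, so the patterns `M/10.2` and `NED (91)`, the regular expressions, and the function names are hypothetical and would need adjusting to the real data.

```python
import re

# Hypothetical format: a sex code (M or F), a slash, then the age in months.
SEX_AGE_PATTERN = re.compile(r"^\s*([MF])\s*/\s*(\d+(?:\.\d+)?)\s*$")
# Hypothetical format: an outcome code, then follow-up months in parentheses.
OUTCOME_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*\(\s*(\d+(?:\.\d+)?)\s*\)\s*$")


def split_sex_age(cell):
    """Split an assumed 'Sex/age (mo)' cell into (sex, age in months)."""
    match = SEX_AGE_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unexpected 'Sex/age (mo)' value: {cell!r}")
    return match.group(1), float(match.group(2))


def split_outcome(cell):
    """Split an assumed 'Outcome (mo)' cell into (outcome, outcome_mo)."""
    match = OUTCOME_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unexpected 'Outcome (mo)' value: {cell!r}")
    return match.group(1), float(match.group(2))


# Invented example cells, used only to illustrate the assumed formats.
print(split_sex_age("M/10.2"))
print(split_outcome("NED (91)"))
```

Raising an error on an unexpected cell, rather than silently returning empty values, makes it obvious when the assumed format does not match the actual dataset.
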
Table 4

Meaning and values of the features of the dataEV2013 dataset. Number of patients: 19. Number of features: 11. ADF: alive disease-free. AWD: alive with disease. B: bone. BM: bone marrow. CR: complete response. DOD: died of disease. DOS: died of sepsis. DP: disease progression. DTC: died of treatment complication. F: female. HR-NBL1: High-Risk Neuroblastoma Study 1. INES: Infants Neuroblastoma European Study, SIOPEN protocols. LN: lymph nodes. M: male. N-II-92 and NAR-99: names of national clinical trials in Spain (Noguera, 2021). PR: partial response. ST: soft tissue. SurPR: surgical partial resection. VGPR: very good partial response. nGNB: nodular ganglioneuroblastoma. pdNB: poorly differentiated NB. uNB: undifferentiated neuroblastoma. Additional information can be found in the original article of the dataset by Villamón et al. (2013).

DATAEV2013

| FEATURE | MEANING | TYPE | VALUES |
|---|---|---|---|
| age at diagnosis | age | months | 9, …, 108 |
| follow-up time | overall survival | months | 1, …, 132 |
| metastases | presence of metastasis | boolean | yes, no |
| outcome | clinical outcome | categorical | ADF, AWD, DOD, DOS, DTC |
| pathology | pathological category | categorical | nGNB, pdNB, uNB |
| protocol treatment | treatment protocol | categorical | HR-NBL1, INES, LNESG1, N-II-92, NAR-99 |
| relapse | if the cancer relapsed or not | boolean | yes, no |
| sex | male or female | binary | M, F |
| stage | stage of the tumor | categorical | 1, 2, 3, 4 |
| time to first relapse | time to first relapse | months | 4, …, 28 |
| treatment response | response to first line treatment | categorical | CR, DP, PR, SurPR, VGPR |

Table 5

Meaning and values of the features of the dataYBC2019 dataset. Number of patients: 7. Number of features: 10. A: amplified. BM: bone marrow. CEC: carboplatin, etoposide, and cyclophosphamide. CR: complete response. CT×5: five cycles of chemotherapy. CT×6: six cycles of chemotherapy. CT×7: seven cycles of chemotherapy. Dx: diagnosis. HDCT: high-dose chemotherapy. L-RT: local radiotherapy. LNs: lymph nodes. MEC: melphalan, carboplatin, and etoposide. MIBG-TM: high-dose 131I-metaiodobenzylguanidine treatment, thiotepa, and melphalan. NA: not amplified. PR: partial response. SCT: stem cell transplantation. TTC: topotecan, thiotepa, and carboplatin. VGPR: very good partial response. m: months. y: years. Additional information can be found in the original article of the dataset by Choi et al. (2019).

DATAYBC2019

| FEATURE | MEANING | TYPE | VALUES |
|---|---|---|---|
| age at Dx. | age at diagnosis | years | 1.5, …, 3.5 |
| age at relapse | age at relapse | years | 4.1, …, 8.6 |
| HDCT1 regimen | first high-dose chemotherapy | binary | TTC, CEC |
| HDCT2 regimen | second high-dose chemotherapy | binary | MEC, MIBG-TM |
| interval to relapse | interval to relapse | months | 12, …, 75 |
| MYCN status | amplified (A) or not amplified (NA) | binary | A, NA |
| relapsed sites | relapse sites in the body | categorical | Primary, Brain, Bone, LNs, BM |
| stage at Dx | only metastatic tumors | categorical | 4 |
| treatment prior to haplo-SCT | treatment prior to haploidentical SCT | categorical | Surgery, L-RT, CT×5, CT×6, CT×7 |
| tumor status at haplo-SCT | tumor status at haploidentical SCT | categorical | PR, CR, VGPR |

Table 6

Meaning and values of the features of the dataYM2018 dataset. Number of patients: 169. Number of features: 13. FH: favorable histology. MYCN: MYCN oncogene. UH: unfavorable histology. Additional information can be found in the dataset original article by Ma et al. (2018).

DATAYM2018

| FEATURE | MEANING | TYPE | VALUES |
|---|---|---|---|
| age | 0: < 12 months; 1: 12–60 months; 2: ≥ 60 months | integer | 0, 1, 2 |
| autologous stem cell transplantation | autologous stem cell transplantation: 0: no; 1: yes | binary | 0, 1 |
| degree of differentiation | 0: undifferentiated; 1: poorly differentiated; 2: differentiated | categorical | 0, 1, 2 |
| histology prognosis | 1: FH, favorable histology; 0: UH, unfavorable histology | binary | 0, 1 |
| MYCN status | status of nMYC oncogene: 0: amplified; 1: unamplified | binary | 0, 1 |
| outcome | clinical outcome: 1: dead of disease; 0: alive or lost to follow-up | binary | 0, 1 |
| radiation | if the patient had radiation | boolean | 0, 1 |
| risk | risk group: 0: intermediate-risk; 1: high-risk | categorical | 0, 1 |
| sex | 0: male; 1: female | integer | 0, 1 |
| site | primary tumor site: 0: adrenal gland; 1: mediastinum; 2: others | categorical | 0, 1, 2 |
| stage | stage of the tumor | categorical | 1, 2, 3, 4 |
| surgical methods | total or partial resection | binary | 0, 1 |
| time | overall survival | months | 1, …, 100 |

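
Most features of dataYM2018 are stored as integer codes whose meaning is given only in the table. The following is a minimal sketch of how those codes could be mapped back to readable labels. The code-to-label pairs are transcribed from Table 6; the names `YM2018_CODES` and `decode_record`, and the example record, are hypothetical. The `radiation` and `surgical methods` features are left undecoded because the table does not state which value corresponds to which code.

```python
# Code-to-label maps transcribed from the "meaning" column of Table 6.
YM2018_CODES = {
    "age": {0: "< 12 months", 1: "12–60 months", 2: "≥ 60 months"},
    "autologous stem cell transplantation": {0: "no", 1: "yes"},
    "degree of differentiation": {
        0: "undifferentiated",
        1: "poorly differentiated",
        2: "differentiated",
    },
    "histology prognosis": {0: "unfavorable histology", 1: "favorable histology"},
    "MYCN status": {0: "amplified", 1: "unamplified"},
    "outcome": {0: "alive or lost to follow-up", 1: "dead of disease"},
    "risk": {0: "intermediate-risk", 1: "high-risk"},
    "sex": {0: "male", 1: "female"},
    "site": {0: "adrenal gland", 1: "mediastinum", 2: "others"},
}


def decode_record(record):
    """Replace known integer codes with labels; keep other values as they are."""
    decoded = {}
    for feature, value in record.items():
        labels = YM2018_CODES.get(feature)
        decoded[feature] = labels.get(value, value) if labels else value
    return decoded


# Hypothetical patient record, invented purely for illustration.
example = {"age": 1, "sex": 0, "MYCN status": 1, "stage": 4, "time": 36}
print(decode_record(example))
```

Note that in this dataset a MYCN status of 0 means amplified, which is the opposite of the common convention where 1 marks the presence of an alteration, so an explicit lookup of this kind helps avoid misreading the codes.
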
Language: English
Submitted on: May 3, 2022
Accepted on: Aug 19, 2022
Published on: Oct 4, 2022
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2022 Davide Chicco, Gabriel Cerono, Davide Cangelosi, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.