Detection and localization of hyperfunctioning parathyroid glands on [18F]fluorocholine PET/CT using deep learning – model performance and comparison to human experts

Jarabek, Leon; Jamsek, Jan; Cuderman, Anka; Rep, Sebastijan; Hocevar, Marko; Kocjan, Tomaz; Jensterle, Mojca; Spiclin, Ziga; Macek Lezaic, Ziga; Cvetko, Filip; Lezaic, Luka


Volume 56 (2022): Issue 4 (December 2022)



Figures & Tables

Figure 1. mPETResnet10 architecture. First, the PET/CT images are fed into a UNet with a single-channel output and a tanh+1 activation function. This output is the PET mask, which is multiplied element-wise with the PET image to produce a masked PET image. The masked PET is concatenated with the original CT, and the resulting masked PET/CT is fed into the ResNet10 classifier. Gray boxes represent deep-learning models, coloured boxes represent data, and circles represent the operations: tanh+1, element-wise multiplication (mul) and concatenation (concat).
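The data flow in this caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the UNet is replaced by random numbers, the function names `mask_pet` and `build_classifier_input` are hypothetical, and the array axis order (slices, height, width) is an assumption. Only the tanh+1 activation, the element-wise multiplication and the concatenation with CT are taken from the caption.

```python
import numpy as np


def mask_pet(unet_output, pet):
    """Turn a single-channel UNet output into a mask and apply it to PET.

    tanh maps to the range -1..1, so tanh+1 gives weights between 0 and 2:
    weights near 0 suppress the PET signal, weights above 1 amplify it.
    """
    mask = np.tanh(unet_output) + 1.0
    return mask, mask * pet  # element-wise multiplication


def build_classifier_input(unet_output, pet, ct):
    """Stack masked PET and the original CT as two input channels."""
    _, masked_pet = mask_pet(unet_output, pet)
    return np.stack([masked_pet, ct], axis=0)  # shape: (2, D, H, W)


# Illustrative volumes only (assumed axis order: slices, height, width).
rng = np.random.default_rng(0)
shape = (32, 64, 64)
pet = rng.random(shape)
ct = rng.random(shape)
unet_output = rng.normal(size=shape)  # stand-in for a real UNet output

classifier_input = build_classifier_input(unet_output, pet, ct)
print(classifier_input.shape)
```

In this sketch the CT channel passes through unchanged, so the classifier still sees the full anatomy while the PET channel is reweighted by the mask.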

Figure 2. Example of PET signal masking by the novel masked-PET Resnet10 model (mRN10) in a subject with a parathyroid adenoma in the region of the lower right parathyroid gland (black arrow in row C). Each row represents a different slice through the preprocessed [18F]fluorocholine PET/CT (FCH-PET) images: (A) mandibular region, (B) upper neck region, (C) lower neck region containing the parathyroid adenoma. The first column shows a preprocessed PET/CT image (64 × 64 × 32 matrix), where colours toward the “warm” (red) part of the spectrum indicate higher PET signal and colours toward the “cool” (blue) part of the spectrum indicate lower PET signal. The second column shows the mask, where regions coloured toward the red part of the spectrum have higher weights (non-masked) and regions toward the yellow part of the spectrum have lower weights (masked). The third column shows the final masked PET/CT images, computed by multiplying the mask with the original PET/CT. The image was correctly classified as containing the adenoma in the lower right region.

Figure 3. Examples of masking of hyperactive parathyroid tissue (HPTT), which is indicated by an arrow in column (I). The images are shown in the same format as in Figure 2. Rows (D), (F) and (G) represent the only three cases where HPTT was not completely masked.

Confusion matrices for CPr (A) and CLoc (B) for both RN10 and mRN10 models. Note that the confusion matrices for CLoc have more samples (360 in total), as they were computed by summing the confusion matrices for each of the three included locations (UL, LL, LR).

	CPr task with RN10				CPr task with mRN10
	HPTT present	HPTT not present	Sum		HPTT present	HPTT not present	Sum
Model output: HPTT present	79	8	87	Model output: HPTT present	90	11	101
Model output: HPTT not present	20	13	33	Model output: HPTT not present	9	10	19
Sum	99	21	120	Sum	99	21	120
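As a worked check, the point estimates for the CPr task can be recomputed directly from the four cells of each confusion matrix. The sketch below uses the standard definitions; the function name `diagnostic_metrics` is illustrative and not from the source. Confidence intervals are not reproduced, since the source does not state how they were derived.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic metrics from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives / all with HPTT
        "specificity": tn / (tn + fp),  # true negatives / all without HPTT
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Cell counts taken from the CPr confusion matrices above.
rn10 = diagnostic_metrics(tp=79, fp=8, fn=20, tn=13)
mrn10 = diagnostic_metrics(tp=90, fp=11, fn=9, tn=10)

for name, metrics in (("RN10", rn10), ("mRN10", mrn10)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Computed this way, the values agree with the reported CPr metrics to three decimals, except for RN10 sensitivity, where 79/99 gives 0.798 rather than the reported 0.800.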

Diagnostic performance metrics of RN10 and mRN10 as well as p-values as determined by McNemar test comparing both models for each task (except AUCROC)

	CPr RN10	CPr mRN10	CPr p-value	CLoc RN10	CLoc mRN10	CLoc p-value
Sensitivity [95% CI]	0.800 [0.719; 0.877]	0.909 [0.852; 0.965]	0.028	0.365 [0.268; 0.460]	0.552 [0.453; 0.652]	0.018
Specificity [95% CI]	0.619 [0.411; 0.827]	0.476 [0.263; 0.690]	0.257	0.807 [0.759; 0.854]	0.811 [0.763; 0.858]	0.910
Positive predictive value [95% CI]	0.908 [0.847; 0.969]	0.891 [0.830; 0.951]	0.507	0.407 [0.303; 0.511]	0.515 [0.418; 0.611]	0.089
Negative predictive value [95% CI]	0.394 [0.227; 0.560]	0.526 [0.302; 0.751]	0.205	0.777 [0.728; 0.827]	0.833 [0.787; 0.878]	0.021
Accuracy [95% CI]	0.767 [0.681; 0.839]	0.833 [0.756; 0.895]	0.050	0.689 [0.638; 0.736]	0.742 [0.693; 0.786]	0.031
AUCROC	0.815	0.849	/	0.702	0.770	/
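The McNemar test behind these p-values compares two classifiers on the same cases using only the discordant pairs, that is, cases where exactly one model was correct. The source gives neither the discordant counts nor the variant of the test used, so the sketch below shows the exact (binomial) form with hypothetical counts `b` and `c`; it illustrates the technique and does not reproduce the reported values.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases where only the first model was correct
    c: cases where only the second model was correct
    Under the null hypothesis each discordant case falls in b or c
    with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, chosen only for illustration.
p_value = mcnemar_exact(b=2, c=13)
print(f"p = {p_value:.4f}")
```

Cases that both models classify correctly, or both misclassify, do not enter the statistic, which is why two models with similar overall accuracy can still differ significantly.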

Comparison of mRN10 and human performance for the CLoc task. p-values were determined using the McNemar test.

	CLoc mRN10	CLoc human	p-value
Sensitivity [95% CI]	0.552 [0.453; 0.652]	0.917 [0.857; 0.958]	< 0.001
Specificity [95% CI]	0.811 [0.763; 0.858]	0.997 [0.986; 0.999]	< 0.001
Positive predictive value [95% CI]	0.515 [0.418; 0.611]	0.992 [0.945; 0.999]	< 0.001
Negative predictive value [95% CI]	0.833 [0.787; 0.878]	0.972 [0.952; 0.984]	< 0.001
Accuracy [95% CI]	0.742 [0.693; 0.786]	0.977 [0.960; 0.988]	< 0.001

Performance of several models on CPr task

Model name	mRN10	RN10	Resnet50	Resnet101	Densenet101	PreActResnet101	WideResnet101
Trainable parameters (millions)	33.5	14.3	46.2	85.2	112.9	85.2	85.2
Optimal initial learning rate	0.0136	0.0136	2.15 × 10⁻³	1.47 × 10⁻⁴	0.316	1.47 × 10⁻⁴	2.15 × 10⁻³
Mean CPr AUCROC [95% CI]	0.850 [0.734; 0.998]	0.812 [0.716; 0.994]	0.754 [0.624; 0.980]	0.527 [0.410; 0.639]	0.703 [0.606; 0.905]	0.739 [0.486; 0.998]	0.752 [0.653; 0.966]

DOI: https://doi.org/10.2478/raon-2022-0037 | Journal eISSN: 1581-3207 | Journal ISSN: 1318-2099


Language: English

Page range: 440 - 452

Submitted on: Apr 21, 2022

Accepted on: Aug 22, 2022

Published on: Dec 13, 2022

Published by: Association of Radiology and Oncology

In partnership with: Paradigm Publishing Services

Publication frequency: 4 issues per year

Keywords:

primary hyperparathyroidism,

Related subjects:

Haematology, oncology,

Radiology

© 2022 Leon Jarabek, Jan Jamsek, Anka Cuderman, Sebastijan Rep, Marko Hocevar, Tomaz Kocjan, Mojca Jensterle, Ziga Spiclin, Ziga Macek Lezaic, Filip Cvetko, Luka Lezaic, published by Association of Radiology and Oncology
This work is licensed under the Creative Commons Attribution 4.0 License.
