Endoscopic biopsy is the standard diagnostic for gastrointestinal graft-versus-host disease (GI GVHD), but its application in post-transplant patients is constrained by severe thrombocytopenia, neutropenia, and concurrent infections. In a prospective single-centre cohort of 51 episodes with histopathologically confirmed GI GVHD, a four-parameter contrast-enhanced ultrasound (CEUS) score correlated with the Lerner histopathological grade (Spearman’s rho = 0.684; 95% CI: 0.503–0.807; p < 0.001), with progressive separation of median scores from grade I to grade IV. Because the analytic cohort was restricted to confirmed GI GVHD episodes, the score is positioned as a potential severity-assessment adjunct; external validation in larger multicentre cohorts is required.
Gastrointestinal graft-versus-host disease (GI GVHD) is a major complication of allogeneic haematopoietic stem-cell transplantation (allo-HSCT) and remains a leading driver of non-relapse mortality in patients with severe forms [1]. Symptoms (diarrhoea, abdominal cramping, nausea, vomiting, and gastrointestinal bleeding) overlap with cytomegalovirus colitis, Clostridioides difficile infection, drug-induced enteropathy, and transplant-associated thrombotic microangiopathy, complicating clinical recognition [1,2]. Diagnostic confirmation is based on histopathological evaluation of endoscopic biopsies according to the German-Austrian-Swiss consensus criteria, with severity graded according to the Lerner system (grade I–IV) [3]. Higher Lerner grades have been associated with steroid-refractoriness, non-relapse mortality, and reduced overall survival [4,5,6].
The patients who most need this assessment are also those in whom invasive procedures carry substantial risk. Severe thrombocytopenia is highly prevalent in the early post-transplant period and is itself a marker of poor outcome [7,8]. Institutional thresholds for full colonoscopy commonly require platelet counts ≥80×10⁹/L, and a meta-analysis of endoscopy in thrombocytopenic patients reported a higher risk of post-procedural bleeding at platelet counts below 50×10⁹/L [9]. In addition to thrombocytopenia, allo-HSCT recipients frequently present with neutropenia, ongoing immunosuppression, and concomitant gastrointestinal infections, all of which further constrain endoscopic assessment. Where biopsies are obtained, histopathological yield can be limited by sampling bias, the focal distribution of disease along the gastrointestinal tract, and the diagnostic delay associated with histological processing [10,11]. Diagnostic and severity-grading strategies that are non-invasive, available at the bedside, and tolerated by fragile patients are therefore clinically relevant.
Contrast-enhanced ultrasound (CEUS), using sulphur hexafluoride microbubbles as a strictly intravascular contrast agent, has been investigated as a non-invasive imaging tool for evaluating intestinal disease in this setting [12,13]. CEUS visualises mural microcirculation in real time, can be performed at the bedside, does not require ionising radiation, and is not nephrotoxic. Weber and colleagues developed a five-parameter sonographic score combining B-mode, colour Doppler, compound elastography, and CEUS, with a reported area under the curve of 1.0 for the diagnosis of GI GVHD and 0.88 for severe disease [14]. Pausch and colleagues subsequently introduced parametric colour-coded CEUS imaging to reduce operator dependency [15]. A previous narrative review by our group highlighted the potential of CEUS for early GVHD diagnosis, while noting persistent gaps in its correlation with histopathological severity in non-elastography-based scores [12].
In the present study, we evaluated the association between a composite four-parameter CEUS-GVHD score — derived from B-mode and CEUS findings without compound elastography — and the Lerner histopathological grade in patients with histopathologically confirmed GI GVHD. The aim was exploratory: to characterise the strength of correlation between an imaging-based composite score and the gold-standard histopathological severity assessment, in a cohort restricted to confirmed GVHD cases. The study was not designed to evaluate diagnostic accuracy or to address the differentiation of GI GVHD from other post-transplant gastrointestinal complications. We also describe the haematological and infectious characteristics of the cohort, which inform the clinical context in which a non-invasive imaging adjunct would be applied.
This was a prospective, observational, single-centre study performed in the Bone Marrow Transplant Unit of Fundeni Clinical Institute, Bucharest, Romania, between 1 October 2018 and 31 December 2025. The study was conducted in accordance with the Declaration of Helsinki (2013 revision) and was approved by the Ethics Council of Fundeni Clinical Institute (approval no. 40902 of 27 September 2018). All participants provided written informed consent for study participation and for administration of the ultrasound contrast agent. The study is reported in accordance with the STROBE statement for observational cohort studies.
Eligible patients were ≥18 years old, had undergone allo-HSCT within the previous two years, and developed new-onset gastrointestinal symptoms (diarrhoea, abdominal cramping, vomiting, or gastrointestinal bleeding) prompting clinical suspicion of GI GVHD. Exclusion criteria were age <18 years, refusal of informed consent, more than two years since transplantation, contraindications to ultrasound contrast administration, and severe haemodynamic instability precluding examination. Episodes with concurrent histopathologically confirmed CMV colitis superimposed on GI GVHD (i.e., presence of both GVHD-typical apoptotic features and CMV-specific cytopathic inclusions on the same biopsy) were excluded prospectively from the analytic cohort, given the inability to assign the relative contribution of GVHD and CMV-related changes to bowel-wall imaging abnormalities in such cases.
Each clinical episode raising a new suspicion of GI GVHD was treated as an independent investigational event. This approach reflects the clinical reality that post-transplant patients frequently experience repeated symptomatic episodes, sometimes with distinct aetiologies and at different time points after transplantation, each requiring a separate diagnostic work-up. The same approach has been used in prior CEUS-GVHD studies in which separate episodes with separate histological confirmation were evaluated independently in the same patient [14].
From the full prospective cohort of 89 investigational episodes in 61 patients, the present analysis included all episodes with histopathologically confirmed GI GVHD (acute or chronic) and a documented Lerner grade. This yielded 51 episodes in 41 unique patients. Episodes without histopathological confirmation, or in which biopsy material was insufficient for grading, were excluded from this analysis. The participant flow is shown in Figure 1. Of the 22 unique patients excluded due to a non-GVHD diagnosis at clinical-pathological workup, 9 also contributed at least one analytic episode and are counted within the analytic cohort; similarly, 2 of the 10 unique patients excluded for clinically suspected GI GVHD without histopathological confirmation also contributed at least one analytic episode.

Figure 1. Participant flow diagram.
Endoscopic procedures (oesophagogastroduodenoscopy and/or ileocolonoscopy) were performed by gastroenterologists according to standard institutional protocols. The CEUS examination was performed before endoscopy in all cases, and the segments showing the most prominent CEUS abnormalities were communicated to the endoscopist as targets for directed biopsy sampling, in addition to any macroscopically suspicious areas identified during endoscopy. Biopsies were also obtained from macroscopically normal-appearing mucosa whenever feasible. As a consequence of this CEUS-guided sampling strategy, in all 51 episodes of the analytic cohort the histopathological diagnosis and Lerner grade were derived from biopsies obtained from at least one segment that had been formally evaluated by CEUS.
Histopathological diagnosis of GI GVHD followed the German-Austrian-Swiss consensus criteria, requiring at least 6 apoptotic bodies per 10 consecutive crypts and excluding pseudoapoptosis and other entities histologically mimicking GVHD [3]. Histopathological severity was graded using the Lerner system: grade I (single apoptosis without crypt loss), grade II (focal crypt loss), grade III (multiple contiguous crypt loss without ulceration), and grade IV (denudation of the mucosa with extensive ulceration) [3]. Histopathological evaluation was performed by the institutional team of pathologists with specific experience in post-transplant pathology; each biopsy specimen was reviewed by a single pathologist within the team according to the institutional rotation, without formal adjudication, and inter-observer agreement was not assessed for the present study. The pathologist was not informed of the CEUS findings; all samples were submitted with the standardised clinical indication “suspicion of digestive GVHD.” When biopsies from multiple anatomical sites in the same investigational episode yielded different histopathological findings, the highest Lerner grade was assigned as the episode-level grade, in line with established practice in clinical-pathological correlation studies of patchy intestinal disease and with the prognostic relevance of the most severely affected sample [4,5].
All examinations were performed by a single ultrasonographer trained in abdominal and contrast-enhanced ultrasound, blinded to the histopathological results, using a Fujifilm Arietta 750 ultrasound system (Fujifilm Healthcare, Tokyo, Japan) with a multi-frequency convex probe (1–6 MHz) and high-resolution linear probes (3–8 MHz and 5–18 MHz) for detailed bowel-wall assessment. Patients were examined supine after a minimum 6-hour fast. After grey-scale and colour-Doppler evaluation of the gastrointestinal tract, contrast-enhanced ultrasound was performed with the 1–6 MHz convex probe using 2 mL of sulphur hexafluoride microbubble suspension (SonoVue, Bracco, Italy) administered as a rapid intravenous bolus through a peripheral cannula, followed by a 10-mL flush of 0.9% saline. Examination was performed in contrast-harmonic imaging mode at a low mechanical index, recording the arterial, venous, and late vascular phases for 1–5 minutes after injection.
The composite CEUS-GVHD score (range 0–12) integrates four standardised parameters derived from B-mode and CEUS (bowel-wall thickness at the symptomatic target segment, bowel-loop dilatation, contrast wash-out from the bowel wall, and transmural microbubble migration into the bowel lumen), each scored from 0 (normal) to 3 (most severe). The four parameters and their cut-offs are summarised in Table 1. The total score is the sum of the four parameter scores. Representative CEUS images illustrating the four parameters are provided in Figures 2 and 3.

Figure 2. B-mode US examination of the ileum showing bowel-wall thickening (a, longitudinal view; b, transverse view).

Figure 3. Small bowel CEUS examination: (a) arterial phase, intense wall enhancement and microbubble migration into the lumen; (b) late phase, microbubble migration into the lumen at multiple segments.
Table 1. Composite CEUS-GVHD score: parameter definitions and scoring criteria.
| Parameter | Score 0 (normal) | Score 1 (mild) | Score 2 (moderate) | Score 3 (severe) |
|---|---|---|---|---|
| Bowel-wall thickness* | normal | N + 1–2 mm | N + 3–4 mm | N + ≥5 mm |
| Bowel-loop dilatation (luminal diameter)† | SB <3 cm; colon <6 cm | SB 3–4 cm; colon 6–9 cm | SB 5–6 cm; colon 10–13 cm | SB >6 cm; colon >13 cm |
| Contrast wash-out from bowel wall | Normal (∼30 s) | Mildly delayed (∼1 min) | Moderately delayed (∼2 min) | Severely delayed (>2 min) |
| Transmural microbubble migration | Absent | Confined to bowel wall | Luminal migration in one segment | Luminal migration in more than one segment |
*Measured at the symptomatic target segment: stomach or duodenum in patients with predominantly upper gastrointestinal symptoms; small bowel or colon in patients with predominantly lower gastrointestinal symptoms.
†Thresholds refer to the maximal luminal diameter of the bowel measured by ultrasound. SB = small bowel; N = segment-specific normal value. Total CEUS-GVHD score = sum of the four parameter scores (range 0–12). CEUS = contrast-enhanced ultrasound; GI GVHD = gastrointestinal graft-versus-host disease.
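The summation defined in Table 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the study: the class name, the field names, and the example episode are hypothetical, and each field is assumed to hold the ordinal category (0–3) already assigned by the examiner.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class CeusGvhdFindings:
    """Ordinal parameter scores (0-3) for one episode; names are illustrative."""
    wall_thickness: int
    loop_dilatation: int
    washout: int
    microbubble_migration: int

    def __post_init__(self):
        # Reject anything outside the four ordinal categories of Table 1.
        for value in astuple(self):
            if value not in (0, 1, 2, 3):
                raise ValueError(f"parameter score must be 0, 1, 2 or 3, got {value!r}")

    def total(self) -> int:
        """Total CEUS-GVHD score: sum of the four parameter scores (range 0-12)."""
        return sum(astuple(self))


# Hypothetical episode: wall thickening of N + 5 mm (3), small-bowel lumen of
# 3.5 cm (1), wash-out at about 2 min (2), luminal migration in several segments (3).
episode = CeusGvhdFindings(3, 1, 2, 3)
print(episode.total())  # 9
```

Keeping the four components as separate fields, rather than storing only the total, preserves the information needed for parameter-level analyses.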
The four parameters of the CEUS-GVHD score, their ordinal scales, and the cut-off thresholds were prespecified before data collection began in October 2018, on the basis of prior literature and the imaging team’s clinical experience; they were not modified during the study, and no iterative adjustment was performed. Bowel-wall thickness cut-offs were derived from the IBD ultrasound literature (segment-specific normal-range thresholds) and the multimodal sonographic score by Weber et al. [14]. Bowel-loop dilatation was selected on the basis of IBD ultrasound practice, with score 0 set at the upper limit of physiological luminal diameter and higher categories representing progressive dilatation. Transmural microbubble migration was selected on the basis of the original description of microcirculatory bowel-wall changes in intestinal acute GVHD (I-aGVHD) by Benedetti et al. [12] and its inclusion in the score by Weber et al. [14]. Although delayed wash-out is an established CEUS marker of perfusion abnormality in intestinal disease, the specific four-level ordinal scale (∼30 s; ∼1 min; ∼2 min; >2 min) was set by consensus within the imaging team for consistency of the composite score (each parameter contributing 0–3 points) rather than adopted from a published CEUS-GVHD instrument. No cut-off was derived de novo from the present cohort’s outcome data.
Individual parameter characteristics were recorded in real time during the ultrasound examination, in a structured ultrasound report. The total CEUS-GVHD score was computed retrospectively as the sum of the four prespecified parameter scores. Video recordings (cine-loops) were available for a subset of, but not all, examinations during the study period; this precluded formal blinded re-scoring or quantitative intra-observer reproducibility analysis.
The ultrasonographer was blinded to the histopathological and macroscopic endoscopic findings, which were obtained after CEUS according to the institutional workflow. The ultrasonographer was, however, aware of the clinical context that prompted the investigation, including the transplant team’s clinical GVHD assessment, contemporaneous laboratory values (platelet count, neutrophil count, C-reactive protein), and any GVHD-directed treatment initiated before the examination. This degree of clinical context reflects routine CEUS practice in post-transplant patients and the real-world setting in which the score would be applied. The CEUS-GVHD score itself was computed strictly from the prespecified imaging parameters and was not adjusted on the basis of clinical or laboratory information.
Continuous variables are presented as median (interquartile range, IQR) and range; categorical variables as counts and proportions. Associations between each CEUS parameter (or the total CEUS-GVHD score) and the Lerner grade were assessed using Spearman’s rank correlation coefficient (rho), with 95% confidence intervals computed by Fisher’s z-transformation. Trend tests for ordered associations between individual CEUS parameter scores and Lerner grades were performed using the Jonckheere–Terpstra test. Differences in the total CEUS-GVHD score across Lerner grades were tested with the Kruskal–Wallis test, with post-hoc pairwise comparisons by the Dunn test with Bonferroni correction. Group comparisons in sensitivity analyses used the Mann–Whitney U test. For descriptive purposes, episodes were also grouped into mild (Lerner I–II) and severe (Lerner III–IV) histopathological disease, in line with prior pathological classifications [4,5].
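As an illustration of the confidence-interval method named above, the following sketch applies Fisher’s z-transformation to a correlation coefficient. It assumes the common standard error 1/√(n − 3); statistical packages may use a slightly different variance for Spearman’s rho, so this is an approximation of the approach rather than a reproduction of the software output. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci_fisher(rho: float, n: int, level: float = 0.95):
    """Approximate CI for a correlation via Fisher's z, assuming SE = 1/sqrt(n - 3)."""
    z = math.atanh(rho)                           # Fisher z-transform
    se = 1.0 / math.sqrt(n - 3)                   # assumed standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Primary result of the study: rho = 0.684 in n = 51 episodes.
low, high = correlation_ci_fisher(0.684, 51)
print(f"{low:.3f} to {high:.3f}")  # 0.503 to 0.807
```

Under this assumption the interval agrees with the one reported for the primary correlation (0.503–0.807).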
Because five Spearman correlations were tested in parallel against the Lerner grade, Bonferroni correction was applied to control the family-wise error rate (adjusted α = 0.01 per test). The corrected p-values are intended as a sensitivity check rather than as a primary inference criterion.
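The Bonferroni adjustment described above amounts to simple arithmetic, sketched here for clarity. The function name is hypothetical; the example uses the unadjusted p-value reported in the Results for bowel-loop dilatation (p = 0.015) with a family of five tests.

```python
def bonferroni(p_value: float, n_tests: int, alpha: float = 0.05):
    """Return (adjusted p-value, per-test alpha, significant after correction?)."""
    adjusted_p = min(1.0, p_value * n_tests)  # adjusted p is capped at 1
    per_test_alpha = alpha / n_tests          # 0.05 / 5 = 0.01
    return adjusted_p, per_test_alpha, p_value < per_test_alpha


adjusted_p, per_test_alpha, significant = bonferroni(0.015, n_tests=5)
print(round(adjusted_p, 3), round(per_test_alpha, 3), significant)  # 0.075 0.01 False
```

Comparing the unadjusted p-value with the per-test α and comparing the adjusted p-value with 0.05 are equivalent decisions.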
As an exploratory secondary analysis, the discriminative performance of the total CEUS-GVHD score for distinguishing severe (Lerner III–IV) from mild (Lerner I–II) GI GVHD was evaluated by ROC analysis. The area under the ROC curve was reported with binomial exact 95% confidence intervals. At the Youden-optimal cut-off, sensitivity, specificity, predictive values, likelihood ratios, and accuracy were calculated with Clopper–Pearson 95% confidence intervals.
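For readers unfamiliar with the procedure, the sketch below shows how an area under the ROC curve and a Youden-optimal cut-off can be obtained from scores and binary labels. The data are a toy example invented for illustration, not study data, and the function names are hypothetical; the confidence intervals reported in the study are not reproduced here.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the share of (severe, mild) pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cut-off, sensitivity, specificity) maximising J = sens + spec - 1.

    An episode is called severe when its score is strictly greater than the cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:  # on ties, the lowest cut-off is kept
            best = (j, cut, sens, spec)
    return best[1:]


# Toy data (hypothetical): 1 = severe (Lerner III-IV), 0 = mild (Lerner I-II).
toy_scores = [3, 5, 6, 7, 8, 9, 9, 10, 11, 12]
toy_labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

cut, sens, spec = youden_cutoff(toy_scores, toy_labels)
print(cut, sens, round(spec, 3))  # 8 1.0 0.833
print(round(auc_mann_whitney(toy_scores, toy_labels), 3))  # 0.979
```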
Several prespecified sensitivity analyses were performed: (i) within-patient analysis restricted to the first chronologically documented episode per patient (n = 41); (ii) stratified analysis by infection status (active GI infection vs no documented infection); (iii) stratified analysis by GVHD form (acute vs chronic); and (iv) an exploratory simplified two-parameter score restricted to the two CEUS components with the highest individual correlation with the Lerner grade (transmural microbubble migration and bowel-wall thickness; range 0–6). For analyses (ii) and (iii), within-stratum Spearman correlations and Mann–Whitney U comparisons of CEUS scores between strata were performed.
No prespecified sample size calculation was performed; the cohort size was determined by the availability of histopathologically confirmed episodes during enrolment. Post-hoc precision was characterised by the 95% CI around the primary rho (0.503–0.807; width 0.304), allowing characterisation of moderate-to-strong correlations but not smaller effect sizes; implications are discussed in the Limitations section.
A two-sided p-value <0.05 was considered statistically significant. Statistical analyses were performed using IBM SPSS Statistics version 29 (IBM Corp., Armonk, NY, USA) and MedCalc Statistical Software version 23.5.2 (MedCalc Software Ltd., Ostend, Belgium).
The analytic cohort comprised 51 episodes of histopathologically confirmed GI GVHD in 41 unique patients (Figure 1). Demographic and transplant characteristics are reported per patient (n = 41); episode-level characteristics per episode (n = 51). The median patient age was 42 years (range 19–64), with 25 men (61%) and 16 women (39%). The underlying haematological diagnoses were acute leukaemia in 31 patients (76%), lymphoma in 4 (10%), adult T-cell leukaemia/lymphoma in 2 (5%), chronic myeloid leukaemia in 2 (5%), chronic myelomonocytic leukaemia in 1 (2%), and myelodysplastic syndrome in 1 (2%). Donor types were matched unrelated (15; 37%), matched related (10; 24%), haploidentical (9; 22%), and mismatched unrelated (7; 17%). Conditioning was reduced-intensity in 21 (51%), reduced-toxicity in 14 (34%), and myeloablative in 6 (15%). GVHD prophylaxis was PTCy + MMF + CNI in 28 patients (68%), MTX + CNI in 9 (22%), ATG + MTX + CNI in 3 (7%), and other in 1 (2%). Acute GI GVHD accounted for 37 episodes (Lerner I/II/III/IV: 11/12/7/7) and chronic GI GVHD for 14 episodes (Lerner I/II/III/IV: 5/2/5/2). Endoscopic access was upper gastrointestinal in 16 episodes and lower gastrointestinal in 35. The median time from transplantation to CEUS was 89 days (IQR 48–176; range 20–451).
The cohort presented with the haematological and infectious profile typical of the post-transplant population. The median platelet count at the time of CEUS was 88×10⁹/L (IQR 59–183; range 12–423); 22 episodes (43%) had platelet counts below the 80×10⁹/L institutional threshold for full colonoscopy, and 2 episodes (4%) below 20×10⁹/L. The median absolute neutrophil count was 2.65×10⁹/L (range 0.33–15.74); 7 episodes (14%) had neutropenia (ANC <1.0×10⁹/L). The median CRP at the time of CEUS was 11.5 mg/L (IQR 5.0–60.2; range 0.3–152.0). Concurrent active gastrointestinal infection was documented in 17 episodes (33%): Clostridioides difficile (n = 10), enteropathogenic or enterotoxigenic Escherichia coli (n = 6), and SARS-CoV-2 enteric infection (n = 1). Detectable serum CMV PCR was present in 6 episodes (12%) at a low median viral load of 1,500 IU/mL (IQR 1,192–1,678; range 177–2,930), consistent with subclinical CMV reactivation rather than active CMV enteric disease. None of these 6 episodes showed CMV-specific cytopathic inclusions on histopathological examination of the gastrointestinal biopsies. CMV PCR positivity in serum did not overlap with documented bacterial gastrointestinal infection in any episode.
All four individual CEUS parameters showed graded distributions across Lerner grade categories with statistically significant ordered trends under the Jonckheere–Terpstra test (Table 2). Transmural microbubble migration showed the strongest ordered trend (J = 730.5; z = 4.213; p < 0.001), followed by bowel-wall thickness (J = 709.0; z = 3.850; p < 0.001), contrast wash-out (J = 666.0; z = 3.124; p = 0.002), and bowel-loop dilatation (J = 620.0; z = 2.347; p = 0.019). The strength of the ordered trend was greatest for transmural microbubble migration, with the proportion of episodes scoring 3 increasing progressively from 0/16 in Lerner I to 6/9 in Lerner IV; for bowel-wall thickness, the proportion scoring 3 increased from 1/16 in Lerner I to 7/9 in Lerner IV; for contrast wash-out, the proportion scoring 3 increased from 4/16 in Lerner I to 8/9 in Lerner IV.
Table 2. Distribution of individual CEUS parameter scores across Lerner histopathological grades, with Spearman’s rank correlations.
| CEUS parameter | Score | Lerner I (n=16) | Lerner II (n=14) | Lerner III (n=12) | Lerner IV (n=9) | Jonckheere–Terpstra p | Spearman’s rho (95% CI; p; p Bonf) |
|---|---|---|---|---|---|---|---|
| Bowel-wall thickness | 0 | 1 | 0 | 0 | 0 | <0.001 | 0.597 (0.385–0.749; <0.001; <0.001) |
| | 1 | 3 | 3 | 0 | 0 | | |
| | 2 | 11 | 8 | 4 | 2 | | |
| | 3 | 1 | 3 | 8 | 7 | | |
| Bowel-loop dilatation | 0 | 7 | 2 | 2 | 2 | 0.019 | 0.338 (0.069–0.561; 0.015; 0.075) |
| | 1 | 7 | 6 | 3 | 1 | | |
| | 2 | 0 | 2 | 3 | 3 | | |
| | 3 | 2 | 4 | 4 | 3 | | |
| Contrast wash-out | 0 | 2 | 0 | 0 | 0 | 0.002 | 0.469 (0.222–0.659; <0.001; 0.005) |
| | 1 | 4 | 5 | 1 | 0 | | |
| | 2 | 6 | 4 | 4 | 1 | | |
| | 3 | 4 | 5 | 7 | 8 | | |
| Transmural microbubble migration | 0 | 0 | 0 | 0 | 0 | <0.001 | 0.616 (0.410–0.762; <0.001; <0.001) |
| | 1 | 13 | 5 | 3 | 1 | | |
| | 2 | 3 | 8 | 6 | 2 | | |
| | 3 | 0 | 1 | 3 | 6 | | |
Values represent the number of episodes in each Lerner grade category with each parameter score. Jonckheere–Terpstra p-values reflect the ordered trend across grades. Spearman’s rho values quantify the strength of the rank correlation between each individual CEUS parameter and the Lerner grade. CEUS, contrast-enhanced ultrasound.
Spearman’s rank correlation analysis showed graded associations between individual CEUS parameters and the Lerner grade (Table 2). Transmural microbubble migration showed the highest correlation (rho = 0.616; 95% CI: 0.410–0.762; unadjusted p < 0.001; Bonferroni-adjusted p < 0.001), followed by bowel-wall thickness (rho = 0.597; 95% CI: 0.385–0.749; unadjusted p < 0.001; adjusted p < 0.001), contrast wash-out (rho = 0.469; 95% CI: 0.222–0.659; unadjusted p < 0.001; adjusted p = 0.005), and bowel-loop dilatation (rho = 0.338; 95% CI: 0.069–0.561; unadjusted p = 0.015; adjusted p = 0.075). After Bonferroni correction for the five parallel correlations against the Lerner grade (adjusted α = 0.01 per test), four of the five correlations remained statistically significant; the bowel-loop dilatation correlation did not retain Bonferroni-adjusted significance and is therefore considered hypothesis-generating.
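Because Table 2 reports the full cross-tabulation of each parameter score against the Lerner grade, the rank correlations can be recomputed from the published counts. The sketch below does this for transmural microbubble migration, using a plain-Python Spearman coefficient with mid-ranks for ties; the helper names are hypothetical, and this is an illustrative check rather than the analysis code used in the study.

```python
import math


def midranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    first, count = {}, {}
    for position, value in enumerate(sorted(values), start=1):
        first.setdefault(value, position)
        count[value] = count.get(value, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the mid-ranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Table 2, transmural microbubble migration: rows = score 0..3, columns = Lerner I..IV.
counts = [
    [0, 0, 0, 0],
    [13, 5, 3, 1],
    [3, 8, 6, 2],
    [0, 1, 3, 6],
]

# Expand the cross-tabulation back into one (score, grade) pair per episode.
scores, grades = [], []
for score, row in enumerate(counts):
    for grade, n in enumerate(row, start=1):
        scores += [score] * n
        grades += [grade] * n

print(len(scores), round(spearman_rho(scores, grades), 3))  # 51 0.616
```

The recomputed coefficient agrees with the value reported for this parameter (rho = 0.616).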
The total CEUS-GVHD score showed a positive correlation with the Lerner histopathological grade (Spearman’s rho = 0.684; 95% CI: 0.503–0.807; unadjusted p < 0.001; Bonferroni-adjusted p < 0.001). The median CEUS score across grades was 5.5 (IQR 4.5–6; range 3–8) in Lerner I, 7 (IQR 5–9; range 4–12) in Lerner II, 8.5 (IQR 7–11; range 5–12) in Lerner III, and 10 (IQR 9–11; range 9–12) in Lerner IV (Kruskal–Wallis H = 23.369; df = 3; p < 0.001; Jonckheere–Terpstra p < 0.001; Table 3; Figure 4). Post-hoc pairwise comparisons (Dunn test with Bonferroni correction) showed statistically significant differences between Lerner I and Lerner IV (adjusted p < 0.001) and between Lerner I and Lerner III (adjusted p = 0.002). The Lerner II vs Lerner IV comparison fell just short of significance (adjusted p = 0.057), and the remaining pairwise comparisons (Lerner I vs II, Lerner II vs III, Lerner III vs IV) did not reach statistical significance.

Figure 4. Box-plot of the total CEUS-GVHD score across the four Lerner histopathological grades (n=51).
Table 3. Distribution of the total CEUS-GVHD score across Lerner histopathological grades.
| Statistic | Lerner I (n=16) | Lerner II (n=14) | Lerner III (n=12) | Lerner IV (n=9) |
|---|---|---|---|---|
| Median CEUS-GVHD score | 5.5 | 7 | 8.5 | 10 |
| Interquartile range (IQR) | 4.5–6 | 5–9 | 7–11 | 9–11 |
| Range | 3–8 | 4–12 | 5–12 | 9–12 |
Kruskal–Wallis test for differences across Lerner grades: p<0.001. Spearman’s rank correlation between the total CEUS-GVHD score and the Lerner grade: rho 0.684; p<0.001. CEUS, contrast-enhanced ultrasound; GVHD, graft-versus-host disease.
Table 4. Within-patient sensitivity analysis.
| Variable / parameter | Primary analysis (n=51 episodes; 41 patients) | Within-patient sensitivity (n=41; first episode per patient) |
|---|---|---|
| Spearman rho: CEUS_total ↔ Lerner | 0.684 (0.503–0.807); p<0.001 | 0.696 (0.495–0.827); p<0.001 |
| Spearman rho: bowel-wall thickness ↔ Lerner | 0.597 (0.385–0.749); p<0.001 | 0.601 (0.360–0.767); p<0.001 |
| Spearman rho: bowel-loop dilatation ↔ Lerner | 0.338 (0.069–0.561); p=0.015 | 0.452 (0.168–0.667); p=0.003 |
| Spearman rho: contrast wash-out ↔ Lerner | 0.469 (0.222–0.659); p<0.001 | 0.435 (0.147–0.655); p=0.004 |
| Spearman rho: microbubble migration ↔ Lerner | 0.616 (0.410–0.762); p<0.001 | 0.659 (0.441–0.804); p<0.001 |
| Bonferroni-adjusted p: CEUS_total | <0.001 | <0.001 |
| Bonferroni-adjusted p: bowel-wall thickness | <0.001 | <0.001 |
| Bonferroni-adjusted p: bowel-loop dilatation | 0.075 | 0.015 |
| Bonferroni-adjusted p: contrast wash-out | 0.005 | 0.022 |
| Bonferroni-adjusted p: microbubble migration | <0.001 | <0.001 |
| Kruskal–Wallis (CEUS_total across Lerner grades) | H=23.369; df=3; p<0.001 | H=19.512; df=3; p<0.001 |
All Spearman correlations are presented as rho (95% CI; unadjusted p). Bonferroni-adjusted p-values reflect correction for five parallel correlations (adjusted α = 0.01). The 95% CIs for Spearman rho were computed by Fisher’s z-transformation. CEUS_total = total CEUS-GVHD score (range 0–12).
When grouped by histopathological severity, episodes with mild disease (Lerner I–II, n = 30) had a median CEUS score of 6 (IQR 5–8), and episodes with severe disease (Lerner III–IV, n = 21) had a median CEUS score of 9 (IQR 7–11) (Figure 5).

Figure 5. Distribution of the total CEUS-GVHD score in mild (Lerner I–II, n=30) versus severe (Lerner III–IV, n=21) GI GVHD.
Because the 51 episodes arose from 41 unique patients, with 10 patients contributing two episodes each, a within-patient sensitivity analysis was performed by restricting the analysis to the first chronologically documented episode per patient (n = 41). The Spearman correlation between the total CEUS-GVHD score and the Lerner grade in this within-patient cohort was rho = 0.696 (95% CI: 0.495–0.827; p < 0.001; Bonferroni-adjusted p < 0.001) — substantially similar to the primary result. The Kruskal–Wallis test for differences across Lerner grades yielded H = 19.512; df = 3; p < 0.001. Individual parameter correlations were also similar (Table 4). All five correlations retained statistical significance after Bonferroni correction in this analysis.
Concurrent active GI infection was documented in 17 of 51 episodes (33.3%). Median CEUS-GVHD scores were higher in episodes with concurrent GI infection than in those without (median 9 vs 7), but the difference did not reach statistical significance (Mann–Whitney U = 359.500; p = 0.156). The correlation between the CEUS-GVHD score and the Lerner grade was preserved within each stratum: in the uninfected stratum (n = 34), Spearman’s rho = 0.631 (95% CI: 0.372–0.799; p < 0.001); in the infected stratum (n = 17), Spearman’s rho = 0.708 (95% CI: 0.344–0.887; p = 0.001).
The analytic cohort comprised 37 acute and 14 chronic GI GVHD episodes. Median CEUS-GVHD scores were comparable between the two strata (acute: median 8; chronic: median 7; Mann–Whitney U = 234.500; standardised z = −0.521; p = 0.602). The correlation between the CEUS-GVHD score and the Lerner grade was nearly identical in the two strata: acute (n = 37): Spearman’s rho = 0.682 (95% CI: 0.459–0.824; p < 0.001); chronic (n = 14): rho = 0.692 (95% CI: 0.256–0.894; p = 0.006). The wider confidence interval in the chronic GVHD subgroup reflects the smaller sample size.
Given the clinical relevance of distinguishing mild (Lerner I–II) from severe (Lerner III–IV) GI GVHD, we performed an exploratory ROC analysis of the total CEUS-GVHD score. We acknowledge that the small subgroup sizes (severe disease n = 21 in the primary analysis and n = 17 in the within-patient sensitivity analysis) make this an exploratory secondary analysis, not a formal diagnostic validation.
In the primary analysis (n = 51 episodes), the area under the ROC curve was 0.853 (95% CI: 0.726–0.937). The Youden-optimal cut-off was >8 points, at which the score showed a sensitivity of 71.43%, specificity of 86.67%, and overall accuracy of 80.39% (Table 5).
Table 5. Exploratory diagnostic performance of the total CEUS-GVHD score for discriminating severe (Lerner III–IV) from mild (Lerner I–II) GI GVHD, at the Youden-optimal cut-off (>8 points).
| Diagnostic performance metric | Primary analysis (n=51) | Within-patient sensitivity (n=41) |
|---|---|---|
| AUC (95% CI binomial exact) | 0.853 (0.726–0.937) | 0.853 (0.707–0.944) |
| Youden-optimal cut-off | >8 | >8 |
| Sensitivity, % (95% CI) | 71.43 (47.82–88.72) | 76.47 (50.10–93.19) |
| Specificity, % (95% CI) | 86.67 (69.28–96.25) | 83.33 (62.62–95.26) |
| PPV, % (95% CI) | 78.95 (59.15–90.67) | 76.47 (56.12–89.20) |
| NPV, % (95% CI) | 81.25 (68.47–89.63) | 83.33 (67.57–92.31) |
| LR+, (95% CI) | 5.36 (2.07–13.87) | 4.59 (1.81–11.66) |
| LR−, (95% CI) | 0.33 (0.17–0.66) | 0.28 (0.12–0.68) |
| Accuracy, % (95% CI) | 80.39 (66.88–90.18) | 80.49 (65.13–91.18) |
| Disease prevalence (%) — severe Lerner III–IV | 41.18 (27.58–55.83) | 41.46 (26.32–57.89) |
AUC = area under the ROC curve; PPV = positive predictive value; NPV = negative predictive value; LR+ = positive likelihood ratio; LR− = negative likelihood ratio.
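The point estimates in Table 5 follow from a 2×2 classification table. The sketch below recomputes them for the primary analysis; note that the cell counts (15 true positives, 6 false negatives, 4 false positives, 26 true negatives) are not reported in the text and are back-calculated here from the published percentages and group sizes (21 severe, 30 mild), so they should be read as an assumption. The function name is hypothetical, and confidence intervals are not reproduced.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Point estimates from a 2x2 table (assumes 0 < specificity < 1 for the LRs)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_plus": sens / (1 - spec),
        "lr_minus": (1 - sens) / spec,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts inferred from the reported percentages (assumption, see lead-in).
primary = diagnostic_metrics(tp=15, fn=6, fp=4, tn=26)
for name in ("sensitivity", "specificity", "ppv", "npv", "accuracy"):
    print(name, round(100 * primary[name], 2))
print("lr_plus", round(primary["lr_plus"], 2), "lr_minus", round(primary["lr_minus"], 2))
```

With these inferred counts, the recomputed values agree with the primary-analysis column of Table 5 (for example, sensitivity 71.43%, specificity 86.67%, LR+ 5.36).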
In the within-patient sensitivity analysis (n = 41), the area under the ROC curve was identical (AUC 0.853; 95% CI: 0.707–0.944), with the same Youden-optimal cut-off (>8 points), and similar diagnostic performance: sensitivity 76.47%, specificity 83.33%, accuracy 80.49%. The ROC curves for both analyses are shown in Figure 6 (panel A: primary; panel B: within-patient sensitivity).

Figure 6. Exploratory ROC curves of the total CEUS-GVHD score for discriminating severe (Lerner III–IV) from mild (Lerner I–II) GI GVHD. Panel A: primary analysis (n=51); panel B: within-patient sensitivity analysis (n=41).
We computed an exploratory two-parameter score (transmural microbubble migration + bowel-wall thickness; range 0–6). On the primary analytic cohort (n = 51), the simplified two-parameter score showed a stronger correlation with the Lerner grade than the full four-parameter score (Spearman’s rho = 0.771; 95% CI: 0.630–0.863 vs rho = 0.684; 95% CI: 0.503–0.807; both p < 0.001), and a higher area under the ROC curve for discriminating severe (Lerner III–IV) from mild (Lerner I–II) GI GVHD (AUC = 0.892; 95% CI: 0.773–0.962, with Youden-optimal cut-off >3 yielding sensitivity 95.24% and specificity 70.00%, vs AUC 0.853 for the full four-parameter score). In the within-patient sensitivity analysis (n = 41), the corresponding correlation and AUC were similar (rho = 0.784; 95% CI: 0.628–0.880; AUC = 0.890; 95% CI: 0.752–0.966). These findings are exploratory and were generated on the same dataset that was used to identify the two highest-correlating components; they should therefore be considered hypothesis-generating rather than confirmatory, and external validation in independent cohorts is required before any clinical use.
In this prospective single-centre cohort of 51 episodes of histopathologically confirmed GI GVHD in 41 patients, the total CEUS-GVHD score derived from a four-parameter B-mode and contrast-enhanced ultrasound assessment was associated with the Lerner histopathological grade (Spearman’s rho = 0.684; 95% CI: 0.503–0.807; p < 0.001). Median scores increased across the four severity categories. These findings suggest that the CEUS-GVHD score may reflect histopathological severity at the group level. However, the overlap between adjacent Lerner grades, the within-patient clustering, and the single-operator, single-centre design preclude conclusions about individual-level severity grading or substitution of histopathological assessment.
Weber and colleagues developed a five-parameter sonographic score combining B-mode, colour Doppler, compound elastography, and CEUS, with reported diagnostic performance for severe disease comparable to the discrimination observed in our cohort [14]. Our score uses a simplified four-parameter approach without compound elastography, which may improve transferability to centres lacking elastography capability, although this assumption requires direct comparative testing. Pausch and colleagues have demonstrated that parametric colour-coded CEUS imaging can reduce operator dependency [15], a methodological direction that complements the standardised parameter-based scoring approach used in our study but was not implemented in our protocol.
Among the four individual parameters, transmural microbubble migration and bowel-wall thickness showed the highest rank correlations with the Lerner grade (rho = 0.616 and 0.597, respectively). The microbubble migration phenomenon has been described as an imaging feature associated with loss of intestinal mucosal barrier integrity in severe GI GVHD [12,14,16]. In an exploratory post-hoc analysis suggested in revision, a simplified two-parameter score limited to these two components performed numerically better than the full four-parameter score on the same data; we present this as a hypothesis to be tested prospectively. Bowel-loop dilatation showed the weakest correlation, which was nominally significant in the primary analysis (rho = 0.338; unadjusted p = 0.015) but did not retain significance after Bonferroni correction (adjusted p = 0.075). This parameter may reflect functional ileus, which is more closely linked to clinical symptom severity than to histopathological grade itself, and its association with the Lerner grade should be considered hypothesis-generating in this analysis.
Five parallel correlations were tested against the Lerner grade, and we applied Bonferroni correction to control the family-wise error rate. Four of the five correlations — the total CEUS-GVHD score and the correlations for transmural microbubble migration, bowel-wall thickness, and contrast wash-out — retained statistical significance after Bonferroni adjustment, supporting their robustness as exploratory associations. Of note, in the within-patient sensitivity analysis (n = 41), all five correlations retained Bonferroni-adjusted significance, suggesting that the borderline result for bowel-loop dilatation in the primary analysis may reflect random variability arising from the small subgroup sizes and within-patient correlation rather than absence of a true underlying association.
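The Bonferroni adjustment described above amounts to multiplying each unadjusted p-value by the number of parallel tests and capping the result at 1. A minimal sketch for the bowel-loop dilatation correlation follows; the significance level of 0.05 is a conventional assumption rather than a value stated in this section, and the function name is hypothetical.

```python
def bonferroni_adjust(p, n_tests):
    """Bonferroni-adjusted p-value: p multiplied by the number of tests, capped at 1."""
    return min(1.0, p * n_tests)

alpha = 0.05                             # assumed conventional significance level
p_adj = bonferroni_adjust(0.015, 5)      # unadjusted p = 0.015, five parallel tests
print(f"adjusted p = {p_adj:.3f}, significant: {p_adj < alpha}")
```

This reproduces the adjusted value of 0.075 reported for bowel-loop dilatation in the primary analysis.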
Methodologically, evaluation of the contrast wash-out parameter in our protocol relied on a four-level qualitative time-based ordinal scale (∼30 s; ∼1 min; ∼2 min; >2 min), rather than on quantitative time-intensity curve analysis with dedicated perfusion software. The qualitative approach reflects clinical applicability in a real-world post-transplant setting but has lower potential reproducibility than quantitative analysis.
Several prespecified sensitivity analyses supported the robustness of the primary findings. In the within-patient sensitivity analysis, the Spearman rho was closely similar to the primary result (rho = 0.696 vs 0.684), suggesting that the result was not driven by the within-patient correlation introduced by the 10 patients with two episodes each. Stratification by concurrent gastrointestinal infection addressed a relevant concern, given that Clostridioides difficile and other enteric infections produce bowel-wall changes that overlap with the features assessed by the CEUS-GVHD score: the rho values were comparable in uninfected (0.631) and infected (0.708) episodes, with widely overlapping confidence intervals; CEUS scores were numerically higher in infected episodes, but the difference was not statistically significant. Concerning concurrent CMV viral reactivation, the median serum viral load in PCR-positive episodes was low (1,500 IU/mL), consistent with subclinical reactivation rather than active CMV enteric disease, and none of these episodes had biopsy-proven CMV cytopathic inclusions; episodes with histopathologically confirmed CMV colitis superimposed on GI GVHD had been excluded prospectively. In the analysis stratified by GVHD form, the rho was nearly identical in the acute (0.682, n = 37) and chronic (0.692, n = 14) subgroups, supporting the interpretation that the CEUS-Lerner association reflects mucosal injury severity captured by the Lerner system in both forms in our cohort, even if the underlying pathophysiological mechanisms differ.
In an exploratory secondary analysis, the total CEUS-GVHD score showed discriminative performance for distinguishing severe (Lerner III–IV) from mild (Lerner I–II) GI GVHD that was internally consistent across the primary and within-patient sensitivity analyses (AUC 0.853 in both, with overlapping confidence intervals and an identical Youden-optimal cut-off of >8 points). The point estimate falls within the range of "good" discrimination commonly used in the literature, but the wide confidence intervals reflect the small subgroup sizes (n = 21 severe in the primary analysis; n = 17 in the sensitivity analysis) and preclude precise estimation of operating characteristics. We emphasise that this analysis is hypothesis-generating, not confirmatory: it cannot be used to recommend the score as a stand-alone severity-grading tool, and external validation in larger multicentre cohorts is required before any clinical application.
Thrombocytopenia, neutropenia, and concurrent gastrointestinal infection were frequent in this cohort: 43% of episodes had platelet counts below the institutional threshold for full colonoscopy, and 33% had documented concurrent gastrointestinal infection. Although a non-invasive imaging modality with bedside availability and no procedural bleeding risk has potential utility in this clinical context, our study did not directly compare CEUS-guided management with endoscopy-guided management or measure clinical outcomes attributable to the use of CEUS. Whether CEUS provides clinically actionable information that improves patient outcomes in this population requires testing in studies designed for that purpose.
This study has several limitations. Because the analytic cohort was restricted to histopathologically confirmed GI GVHD episodes, the study cannot estimate the diagnostic accuracy of CEUS for distinguishing GVHD from infectious, drug-related, vascular, or other post-transplant gastrointestinal complications. No prespecified sample size calculation was performed. The post-hoc precision around the primary Spearman correlation (95% CI width 0.304 for rho = 0.684) is consistent with adequate characterisation of moderate-to-strong associations but limits the precision for smaller effect sizes, including stratified subgroups. The data are clustered at the patient level (10 of 41 patients contributed two episodes each). The four-level ordinal scale for the contrast wash-out parameter (∼30 s; ∼1 min; ∼2 min; >2 min) was established empirically by the imaging team. Individual parameter scores were recorded in real time by a single ultrasonographer blinded to histopathology, but cine-loops were available only for a subset of examinations, precluding formal blinded re-scoring and intra-observer reproducibility analysis. Histopathological evaluation was performed by a single pathologist per specimen from the institutional team, without formal adjudication or inter-observer agreement analysis. Future studies with central blinded re-reading by multiple pathologists would strengthen reproducibility.
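The precision figure quoted above (CI width 0.304) can be examined with a standard large-sample approximation. This section does not state how the confidence intervals were computed; the sketch below assumes the Fisher z-transformation with standard error 1/√(n − 3), which is one common choice for rank correlations and may differ from the method actually used. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_z_ci(rho, n, level=0.95):
    """Approximate CI for a correlation via the Fisher z-transformation.

    Assumes SE = 1 / sqrt(n - 3); other variance formulas exist for
    Spearman's rho and would give slightly different limits.
    """
    z = math.atanh(rho)                          # transform to the z scale
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    return math.tanh(z - q * se), math.tanh(z + q * se)

lo, hi = fisher_z_ci(0.684, 51)
print(f"95% CI: {lo:.3f} to {hi:.3f} (width {hi - lo:.3f})")
```

Under this assumption the interval for rho = 0.684 with n = 51 is approximately 0.503 to 0.807, matching the reported limits. Because the width scales with 1/√(n − 3), the same calculation shows why estimates in the smaller strata (for example n = 14 or n = 17) are considerably less precise.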
Although Bonferroni correction was applied to the five parallel Spearman correlations and four remained robust, the bowel-loop dilatation correlation did not retain Bonferroni-adjusted significance and should be interpreted as hypothesis-generating.
The small infected stratum (n = 17) limits the precision of the infection-stratified estimate, and CEUS cannot directly distinguish the relative contributions of GVHD and concurrent infection in individual cases. The chronic GI GVHD subgroup is also small (n = 14), and the Lerner system was developed principally for acute GI GVHD; dedicated investigation in larger chronic cohorts is required.
The timing between CEUS and biopsy was not collected as a prospective study variable and could not be quantified retrospectively. Future cohorts should record this interval and any concurrent treatment changes to allow formal sensitivity analyses.
In this prospective single-centre cohort of 51 episodes with histopathologically confirmed GI GVHD, a four-parameter CEUS-GVHD score showed a moderate-to-strong correlation with the Lerner histopathological grade, with progressive separation of median scores from grade I to grade IV and consistent results across within-patient, infection-stratified, and form-stratified sensitivity analyses. The score may reflect histopathological severity at the group level, but the overlap between adjacent Lerner grades, the within-patient clustering, and the single-operator, single-centre design preclude conclusions about individual-level severity grading or substitution of histopathological assessment.
The findings are exploratory and hypothesis-generating. External validation in multicentre cohorts with multiple operators, prospective recording of the timing between CEUS and biopsy, and formal inter-observer reproducibility analysis are required before any role for CEUS in the routine assessment of GI GVHD severity can be established.