Table 1
Scenario-level reliability estimates (ICC) for the single-station OSCE-format interview assessment.
| SCENARIO | DOMAIN | RATER | MEAN (SD) | MEDIAN (RANGE) | SD%max | ICC (95% CI) |
|---|---|---|---|---|---|---|
| 1 (n = 20) | Attitude (0–8) | A | 7.65 (0.59) | 8 (6–8) | 7.4% | 0.892 (0.750–0.956) |
| | | B | 7.55 (0.76) | 8 (5–8) | 9.5% | |
| 1 | Skills (0–18) | A | 16.25 (1.45) | 16.5 (11–18) | 8.1% | 0.510 (0.095–0.773) |
| | | B | 16.45 (1.57) | 17 (14–18) | 8.7% | |
| 1 | Evaluation (0–6) | A | 5.10 (1.25) | 6 (2–6) | 20.8% | 0.857 (0.657–0.942) |
| | | B | 4.80 (1.36) | 5 (2–6) | 22.7% | |
| 1 | Total (0–32) | A | 29.00 (2.43) | 30 (24–32) | 7.6% | 0.761 (0.487–0.898) |
| | | B | 28.85 (2.39) | 29 (22–32) | 7.5% | |
| 2 (n = 20) | Attitude (0–8) | A | 7.80 (0.41) | 8 (7–8) | 5.1% | 0.584 (0.199–0.812) |
| | | B | 7.75 (0.44) | 8 (7–8) | 5.5% | |
| 2 | Skills (0–18) | A | 16.60 (1.43) | 17 (13–18) | 7.9% | 0.825 (0.609–0.927) |
| | | B | 16.65 (1.31) | 17 (14–18) | 7.3% | |
| 2 | Evaluation (0–6) | A | 5.30 (1.26) | 6 (2–6) | 21.0% | 0.794 (0.510–0.910) |
| | | B | 4.90 (1.29) | 5 (2–6) | 21.5% | |
| 2 | Total (0–32) | A | 29.70 (2.00) | 30 (25–32) | 7.6% | 0.855 (0.670–0.940) |
| | | B | 29.30 (2.20) | 29.5 (25–32) | 7.5% | |
| 3 (n = 20) | Attitude (0–8) | A | 7.65 (0.67) | 8 (6–8) | 8.4% | 0.691 (0.376–0.864) |
| | | B | 7.75 (0.44) | 8 (7–8) | 5.5% | |
| 3 | Skills (0–18) | A | 16.45 (1.05) | 16 (15–18) | 5.8% | 0.719 (0.415–0.879) |
| | | B | 16.50 (1.19) | 17 (14–18) | 6.6% | |
| 3 | Evaluation (0–6) | A | 5.05 (1.85) | 6 (0–6) | 30.8% | 0.979 (0.944–0.992) |
| | | B | 4.90 (1.89) | 6 (0–6) | 31.5% | |
| 3 | Total (0–32) | A | 29.15 (2.56) | 30 (24–32) | 8.0% | 0.929 (0.828–0.971) |
| | | B | 29.15 (2.73) | 29 (23–32) | 8.5% | |
[i] Abbreviations: ICC, intraclass correlation coefficient; CI, confidence interval; SD, standard deviation. SD%max was calculated as SD divided by the maximum possible score for the domain and expressed as a percentage. ICCs were calculated using a two-way random-effects model with absolute agreement; single-measure ICCs are reported. ICC reflects agreement between Rater A and Rater B and is therefore shown once per domain.
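The two quantities defined in the table note can be illustrated with a minimal Python sketch. It assumes the standard ANOVA-based formula for a two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1) or ICC(A,1)); the source does not name the software used, so this is not necessarily the authors' implementation. The function names `sd_percent_of_max` and `icc_a1` and the `example_scores` data are hypothetical and are not study data.

```python
from statistics import mean


def sd_percent_of_max(sd, max_score):
    """SD%max: SD divided by the maximum possible domain score, as a percentage."""
    return 100.0 * sd / max_score


def icc_a1(scores):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    scores: list of rows, one row per examinee, one column per rater.
    Assumes complete data (every rater scored every examinee).
    """
    n = len(scores)        # number of examinees
    k = len(scores[0])     # number of raters
    all_values = [x for row in scores for x in row]
    grand = mean(all_values)
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in all_values)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-examinee mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Rater A, Scenario 1, Attitude: SD 0.59 on a 0-8 scale.
print(round(sd_percent_of_max(0.59, 8), 1))  # 7.4

# Hypothetical paired scores (Rater A, Rater B) for five examinees.
example_scores = [[30, 29], [28, 28], [31, 32], [25, 26], [29, 29]]
print(icc_a1(example_scores))
```

Because the absolute-agreement form includes the between-rater mean square in the denominator, a systematic offset between Rater A and Rater B lowers the ICC even when the two raters rank examinees identically.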

Figure 1
Bland–Altman plot of inter-rater agreement for total checklist/rubric scores (n = 60).
Note. The solid line indicates the mean difference between raters (0.18), and the dashed lines indicate the 95% limits of agreement (−2.35 to 2.71). X-axis: mean of the two raters’ total scores; Y-axis: score difference (Rater A − Rater B).
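As a rough illustration of how the quantities in the figure note are typically obtained, the sketch below computes the mean difference and 95% limits of agreement as the mean difference ± 1.96 times the SD of the paired differences. This conventional formula is an assumption; the source does not state the exact calculation, although the reported limits are symmetric about 0.18 (half-width 2.53), which is consistent with it. The function name `bland_altman` and the sample scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(rater_a, rater_b, z=1.96):
    """Return plotting points, bias, and the lower/upper limits of agreement."""
    diffs = [a - b for a, b in zip(rater_a, rater_b)]         # Y-axis: Rater A - Rater B
    means = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]   # X-axis: mean of the two raters
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return list(zip(means, diffs)), bias, bias - z * sd, bias + z * sd


# Hypothetical total scores (0-32) for five examinees; not study data.
a = [30, 28, 31, 25, 29]
b = [29, 28, 32, 26, 29]
points, bias, lower, upper = bland_altman(a, b)
print(bias, lower, upper)
```

Each (mean, difference) pair in `points` corresponds to one marker on the plot; `bias` is the solid line and `lower` and `upper` are the dashed lines.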
