Table 1
Types of validity evidence, how each type was translated into a student-relevant definition, and the corresponding individual survey items
| Evidence of validity | Definition as per Cook & Beckman 2006 [27] | Proposed student-relevant definition | Individual items |
|---|---|---|---|
| Content | Comprises a description of steps taken to ensure that assessment content (including scenarios, questions, response options, and instructions) reflects the construct it is intended to measure (e.g., ‘professionalism’). This might involve basing the assessment on prior instruments, obtaining expert review, or using an assessment blueprint | Students feel the exam meets their expectations in terms of content, level of difficulty, breadth of topics covered, and alignment with curricular objectives | 1. The breadth of material covered on the R&E cumulative exam was appropriate |
| | | | 2. The content of the R&E cumulative exam reflects the learning objectives of the R&E week |
| | | | 3. The R&E cumulative exam was at the level of difficulty that I expected |
| | | | 4. The R&E cumulative exam questions were appropriately weighted across all blocks |
| | | | 5. The R&E cumulative exam was fair |
| Response process | Comprises theoretical and empirical analyses evaluating how well rater or examinee actions (responses) align with the intended construct. This includes assessment security (those who cheat are not responding based on the intended construct), quality control, and analysis of examinees’ or raters’ thoughts or actions during the assessment activity | Students feel the exam administration and scoring process is fair, and that there are appropriate quality control measures in place (e.g., monitoring, consequences for cheating behaviour) that allow for an appropriate assessment of their mastery of the material | 6. The R&E cumulative exam invigilation was effectively performed during the exam |
| | | | 7. The R&E cumulative exam was administered in a way that allows a true reflection of individual student mastery of the required material |
| | | | 8. If a student were to act dishonestly during the R&E cumulative exam (e.g., cheating), they would be caught |
| | | | 9. There is an appropriate process in place to address students who behave dishonestly (e.g., cheating) |
| Internal structure | Comprises data evaluating the relations among individual assessment items and how these relate to the overarching construct. This most often takes the form of measures of reproducibility (reliability) across items, stations, or raters, but can also include item analysis (item difficulty and item discrimination) and factor analysis | Students feel that the range of item difficulty and discrimination is appropriate and that the exam is reliable; therefore, they are comfortable with the interpretation of the scores | 10. The questions on the R&E cumulative exam allow for differentiating between students who master the content and students who do not |
| | | | 11. The range of difficulty of questions on the R&E cumulative exam appropriately reflects the diversity of experiences encountered in a clinical setting |
| | | | 12. The R&E cumulative exam results are a fair portrayal of what I believe my level of clinical knowledge to be |
| | | | 13. My performance on R&E cumulative exams is consistent across exams |
| Relationship to other variables | Regards the statistical associations between assessment scores and another measure or feature that has a specified theoretical relationship. This relationship might be strongly positive (e.g., two measures that should measure the same construct) or negligible (for measures that should be independent) | Students feel that the exam is aligned with clinical scenarios, and builds on the foundation set by the block exams | 14. My performance on the R&E cumulative exam reflects my performance in a clinical setting |
| | | | 15. My R&E cumulative exam scores are a more appropriate representation of my level of mastery than my block exam scores |
| | | | 16. The block exams provide me with adequate foundational knowledge to succeed in the R&E cumulative exam |
| | | | 17. There is a disconnect in my performance on the block exams and the R&E cumulative examᵃ |
| | | | 18. My performance on the R&E cumulative exam gives me confidence in my ability to perform in the clinical setting |
| Consequential | Regards the impact, beneficial or harmful, of the assessment itself and the decisions and actions that result (e.g., remediation following sub-standard performance). This also includes factors that directly influence the rigor of such decisions, such as the definition of the passing score (e.g., at what point is remediation required?) and differences in scores among subgroups where performance ought to be similar (suggesting that decisions may be spurious) | Students perceive the exam as having more positive consequences (e.g., promotes learning and reflection) than negative consequences (e.g., failing a student who has mastered the content), and that there is a consideration for social consequences | 19. The R&E cumulative exam helps to prepare me for work in a clinical setting |
| | | | 20. The R&E cumulative exams are appropriate checkpoints before entering a clinical setting |
| | | | 21. The R&E cumulative exam causes me more anxiety than the block final examᵃ |
| | | | 22. The students who fail the R&E week exams are those who did not master the exam content |
| | | | 23. The R&E week exam is an overall positive learning experience |
| | | | 24. The procedures in place for a failed R&E cumulative exam will be beneficial to my development |
Note: R&E refers to the Reflection and Evaluation cumulative exams
ᵃ Indicates items that were reverse coded
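Items 17 and 21 above are negatively worded and therefore reverse coded before analysis. A minimal sketch of that operation follows; note that the table does not state the Likert scale bounds, so the 1–6 range used here is purely an illustrative assumption, and the item responses are hypothetical.

```python
# Reverse coding flips a negatively worded item's scale so that higher
# coded values consistently indicate a more favourable perception.
# For an item scored from scale_min to scale_max:
#   reversed = scale_min + scale_max - raw
# The 1-6 scale below is an assumption for illustration only.

def reverse_code(raw: int, scale_min: int = 1, scale_max: int = 6) -> int:
    """Return the reverse-coded value of a Likert response."""
    if not scale_min <= raw <= scale_max:
        raise ValueError(f"response {raw} outside scale [{scale_min}, {scale_max}]")
    return scale_min + scale_max - raw

# Hypothetical raw responses to the two reverse-coded items (17 and 21):
# strong agreement with a negative statement becomes a low coded score.
responses = {17: 5, 21: 2}
coded = {item: reverse_code(score) for item, score in responses.items()}
print(coded)  # {17: 2, 21: 5}
```

Applying the same transformation to every reverse-coded item keeps all 24 items oriented in the same direction, which is required before averaging items into the per-evidence means reported in Table 2.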
Table 2
Mean ratings per type of validity evidence, by year, rank-ordered from highest to lowest mean rating
| Evidence of validity | Mean (SD) ratings for year 1 students | Mean (SD) ratings for year 2 students |
|---|---|---|
| Response process | 4.8 (0.7)ᵃᵇᶜ | 4.9 (0.8)ᵃ |
| Content | 4.2 (0.9) | 4.8 (0.8)ᵇ |
| Consequential | 4.1 (0.9)ᵃ | 4.6 (0.7)ᶜ |
| Internal structure | 4.0 (0.9)ᵇ | 4.3 (0.9)ᵃᵇ |
| Relation to other variables | 3.7 (1.1)ᶜ | 4.1 (0.9)ᵃᵇᶜ |
Within a column, means sharing the same superscript letter (a, b, c) are significantly different from each other (all ps < 0.05)
Fig. 1
Mean ratings for each type of validity evidence, by level of student. The mean is indicated by +; error bars represent the range
