Detecting rater bias using a person-fit statistic: a Monte Carlo simulation study

André-Sébastien Aubin; Christina St-Onge; Jean-Sébastien Renaud

doi:10.1007/S40037-017-0391-8

References

Frank J Snell LS Cate OT Competency-based medical education: theory to practice Med Teach 2010 32 638 645 10.3109/0142159X.2010.501190
Berendonk C Stalmeijer RE Schuwirth LWT Expertise in performance assessment: assessors’ perspectives Adv. Health. Sci. Educ. Theory. Pract. 2013 18 559 571 10.1007/s10459-012-9392-x
Holmboe ES Sherbino J Long DM Swing SR Frank JR The role of assessment in competency-based medical education Med Teach 2010 32 676 682 10.3109/0142159X.2010.500704
Govaerts MJB Schuwirth LWT van der Vleuten CPM Muijtjens AMM Workplace-based assessment: effects of rater expertise Adv. Health. Sci. Educ. Theory. Pract. 2011 16 151 165 10.1007/s10459-010-9250-7
Govaerts MJB Van de Wiel MWJ Schuwirth LWT Van der Vleuten CPM Muijtjens AMM Workplace-based assessment: raters’ performance theories and constructs Adv. Health. Sci. Educ. Theory. Pract. 2013 18 375 396 10.1007/s10459-012-9376-x
Gauthier G St-Onge C Tavares W Rater cognition: Review and integration of research findings Med Educ 2016 50 511 522 10.1111/medu.12973
Gingerich A Regehr G Eva KW Rater-based assessments as social judgments: rethinking the etiology of rater errors Acad Med 2011 86 S1 S7 10.1097/ACM.0b013e31822a6cf8
Govaerts MJB van der Vleuten CPM Schuwirth LWT Muijtjens AMM Broadening perspectives on clinical performance assessment: Rethinking the nature of in-training assessment Adv. Health. Sci. Educ. 2007 12 239 260 10.1007/s10459-006-9043-1
St-Onge C Chamberland M Lévesque A Varpio L The role of the assessor: exploring the clinical supervisor’s skill set Clin Teach 2014 11 209 213 10.1111/tct.12126
Gallagher P The role of the assessor in the assessment of practice: an alternative view Med Teach 2010 32 E413 E416 10.3109/0142159X.2010.496010
Ginsburg S McIlroy J Oulanova O Eva K Regehr G Toward authentic clinical evaluation: pitfalls in the pursuit of competency Acad Med 2010 85 780 786 10.1097/ACM.0b013e3181d73fb6
Smith EV Kulikowich JM An application of generalizability theory and many-faceted Rasch measurement using a complex problem-solving skills assessment Educ Psychol Meas 2004 64 617 639 10.1177/0013164404263876
Hogan EA Effects of prior expectations on performance ratings: a longitudinal study Acad. Manage. J. 1987 30 354 368 10.2307/256279
Nickerson RS Confirmation bias: a ubiquitous phenomenon in many guises Rev Gen Psychol 1998 2 175 220 10.1037/1089-2680.2.2.175
Tversky A Kahneman D Judgment under uncertainty: heuristics and biases Science 1974 185 1124 1131 10.1126/science.185.4157.1124
Yeates P O’Neill P Mann K Eva KW Effect of exposure to good vs poor medical trainee performance on attending physician rating of subsequent performances JAMA 2012 308 2226 2232 10.1001/jama.2012.36515
Norcini J Burch V Workplace-based assessment as an educational tool: AMEE Guide No. 31 Med Teach 2007 29 855 871 10.1080/01421590701775453
Downing SM Haladyna TM Assessment in health professions education 2009 New York Routledge 44 49
Chambers DW Do repeat clinical competency ratings stereotype students? J Dent Educ 2004 68 1220 1227
Judge TA Ferris GR Social context of performance evaluation decisions Acad. Manage. J. 1993 36 80 105 10.2307/256513
Turban DB Jones AP Supervisor-subordinate similarity: types, effects, and mechanisms J Appl Psychol 1988 73 228 234 10.1037/0021-9010.73.2.228
Waldman DA Avolio BJ Race effects in performance evaluation: controlling for ability, education and experience J Appl Psychol 1991 76 897 901 10.1037/0021-9010.76.6.897
Downing SM Haladyna TM Validity threats: overcoming interference with proposed interpretations of assessment data Med Educ 2004 38 327 333 10.1046/j.1365-2923.2004.01777.x
Roberts C Rothnie I Zoanetti N Crossley J Should candidate scores be adjusted for interviewer stringency or leniency in the multiple mini-interview? Med Educ 2010 44 690 698 10.1111/j.1365-2923.2010.03689.x
Harasym PH Woloschuk W Cunning L Undesired variance due to examiner stringency/leniency effect in communication skill scores assessed in OSCEs Adv. Health. Sci. Educ. Theory. Pract. 2008 13 617 632 10.1007/s10459-007-9068-0
Boulet JR McKinley DW Whelan GP Hambleton RK Quality assurance methods for performance-based assessments Adv. Health. Sci. Educ. Theory. Pract. 2003 8 27 47 10.1023/A:1022639521218
Iramaneerat C Yudkowsky R Myford CM Downing SM Quality control of an OSCE using generalizability theory and many-faceted Rasch measurement Adv. Health. Sci. Educ. Theory. Pract. 2008 13 479 493 10.1007/s10459-007-9060-8
McManus IC Thompson M Mollon J Assessment of examiner leniency and stringency (‘hawk-dove effect’) in the MRCP(UK) clinical examination (PACES) using multi-facet Rasch modelling BMC Med. Educ. 2006 6 42 10.1186/1472-6920-6-42
Bartman I Smee S Roy M A method for identifying extreme OSCE examiners Clin Teach 2013 10 27 31 10.1111/j.1743-498X.2012.00607.x
Prieto G Nieto E Analysis of rater severity on written expression exam using Many Faceted Rasch Measurement Psicologica 2014 35 385 397
Raymond MR Viswesvaran C Least squares models to correct for rater effects in performance assessment J Educ Meas 1993 30 253 268 10.1111/j.1745-3984.1993.tb00426.x
Meijer RR Sijtsma K Person-fit statistics: what is their purpose? Rasch Meas Trans 2001 15 823
Karabatsos G Comparing the aberrant response detection performance of thirty-six person-fit statistics Appl Meas Educ 2003 16 277 298 10.1207/S15324818AME1604_2
Meijer RR Person-fit research: an introduction Appl Meas Educ 1996 9 3 8 10.1207/s15324818ame0901_2
Rupp AA A systematic review of the methodology for person fit research in item response theory: lessons about generalizability of inferences from the design of simulation studies Psychol Test Assess Model 2013 55 3 38
Drasgow F Levine MV Williams EA Appropriateness measurement with polychotomous item response models and standardized indices Br J Math Stat Psychol 1985 38 67 86 10.1111/j.2044-8317.1985.tb00817.x
St-Onge C Valois P Abdous B Germain S Person-fit statistics’ accuracy: a Monte Carlo study of the aberrance rate’s influence Appl. Psychol. Meas. 2011 35 419 432
Nering ML Meijer RR A comparison of the person response function and the lz person-fit statistic Appl Psychol Meas 1998 22 53 69 10.1177/01466216980221004
Kinase S Mohammadi A Takahashi M Application of Monte Carlo simulation and Voxel models to internal dosimetry Applications of Monte Carlo methods in biology, medicine and other fields of science 2011 Garching bei München InTech
Alexander C Monte Carlo VaR Market risk analysis 2009 Hoboken John Wiley & Sons 201 246
De Champlain AF A primer on classical test theory and item response theory for assessments in medical education Med Educ 2010 44 109 117 10.1111/j.1365-2923.2009.03425.x
DeMars C Item response theory 2010 Oxford Oxford University Press 10.1093/acprof:oso/9780195377033.001.0001
Bertrand R Blais JG Modèles de Mesure: L’Apport de la Théorie des Réponses aux Items 2004 Sainte-Foy Presses de l’Université du Québec
Osterlind SJ Modern measurement: theory, principles, and applications of mental appraisal 2006 Columbus Pearson Merrill Prentice Hall
Laurencelle L Germain S Les estimateurs de capacité dans la théorie des réponses aux items et leur biais Tutor Quant Methods Psychol 2011 7 42 53 10.20982/tqmp.07.2.p042
Levine MV Rubin DB Measuring the appropriateness of multiple-choice test scores J Educ Behav Stat 1979 4 269 290 10.3102/10769986004004269
Magis D Raiche G Beland S A didactic presentation of Snijders’s lz* index of person fit with emphasis on response model selection and ability estimation J Educ Behav Stat 2012 37 57 81 10.3102/1076998610396894
Noonan BW Boss MW Gessaroli ME The effect of test length and IRT model on the distribution and stability of three appropriateness indexes Appl. Psychol. Meas. 1992 16 345 352 10.1177/014662169201600405
Reise SP A comparison of item- and person-fit methods of assessing model-data fit in IRT Appl. Psychol. Meas. 1990 14 127 137 10.1177/014662169001400202
Olejnik S Algina J Measures of effect size for comparative studies: applications, interpretations, and limitations Contemp Educ Psychol 2000 25 241 286 10.1006/ceps.2000.1040
Cohen J Statistical power analysis for the behavioral sciences: a computer program 1988 Mahwah Lawrence Erlbaum Associates
St-Onge C Valois P Abdous B Germain S A Monte Carlo study of the effect of item characteristic curve estimation on the accuracy of three person-fit statistics Appl Psychol Meas 2009 33 307 324 10.1177/0146621608329503
R Core Team R: A language and environment for statistical computing 2013 Vienna R Foundation for Statistical Computing
Germain S Valois P Abdous B The item response theory library 2016
Govaerts MJB In-training assessment: learning from practice Clin Teach 2006 3 242 247 10.1111/j.1743-498X.2006.00119.x
Williams RG Klamen DA McGaghie W Cognitive, social, and environmental sources of bias in clinical performance ratings Teach Learn Med 2003 15 270 292 10.1207/S15328015TLM1504_11
Haladyna TM Downing SM Construct-irrelevant variance in high-stakes testing Educ Meas Issues Pract 2004 23 17 27 10.1111/j.1745-3992.2004.tb00149.x
Drasgow F Levine MV McLaughlin ME Detecting inappropriate test scores with optimal and practical appropriateness indices Appl. Psychol. Meas. 1987 11 59 79 10.1177/014662168701100105
Emons WHM Sijtsma K Meijer RR Testing hypotheses about the person-response function in person-fit analysis Multivariate Behav. Res. 2004 39 1 35 10.1207/s15327906mbr3901_1
AERA, APA, NCME (American Educational Research Association, American Psychological Association & National Council on Measurement in Education) Joint Committee on Standards for Educational and Psychological Testing Standards for educational and psychological testing 1999 Washington, DC AERA