Consistency of aberrant response behavior: Are misfit persons consistent across two different questionnaires administered at the same time?

Putra, Muhammad Dwirifqi Kharisma; Faturochman

Romanian Journal of Applied Psychology

Volume 27 (2025): Issue 1 (January 2025)

By:

Muhammad Dwirifqi Kharisma Putra and Faturochman

Open Access

Aug 2025

References

Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1-23. https://doi.org/10.1177/0146621697211001
Alnahdi, G. H., & Yada, A. (2020). Rasch analysis of the Japanese version of Teacher Efficacy for Inclusive Practices Scale: Scale unidimensionality. Frontiers in Psychology, 11: 1725. https://doi.org/10.3389/fpsyg.2020.01725
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. American Educational Research Association.
André, Q. (2022). Outlier exclusion procedures must be blind to the researcher’s hypothesis. Journal of Experimental Psychology: General, 151(1), 213–223. https://doi.org/10.1037/xge0001069
Andrich, D., & Marais, I. (2014). Person proficiency estimates in the dichotomous Rasch model when random guessing is removed from difficulty estimates of multiple choice items. Applied Psychological Measurement, 38(6), 432-449. https://doi.org/10.1177/0146621614529646
Andrich, D., Marais, I., & Humphry, S. (2016). Controlling guessing bias in the dichotomous Rasch model applied to a large-scale, vertically scaled testing program. Educational and Psychological Measurement, 76(3), 412-435. https://doi.org/10.1177/0013164415594202
Artner, R. (2016). A simulation study of person-fit in the Rasch model. Psychological Test and Assessment Modeling, 58(3), 531–563.
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Lawrence Erlbaum Associates.
Briz-Redon, A. (2021). Respondent burden effects on item non-response and careless response rates: An analysis of two types of surveys. Mathematics, 9(17), 2035. https://doi.org/10.3390/math9172035
Burchell, B., & Marsh, C. (1992). The effect of questionnaire length on survey response. Quality and Quantity, 26(3), 233-244. https://doi.org/10.1007/BF00172427
Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06
Conijn, J. M., Emons, W. H. M., & Sijtsma, K. (2014). Statistic lz-based person-fit methods for noncognitive multiscale measures. Applied Psychological Measurement, 38(2), 122-136. https://doi.org/10.1177/0146621613497568
Crişan, D. R., Tendeiro, J. N., & Meijer, R. R. (2017). Investigating the practical consequences of model misfit in unidimensional IRT models. Applied Psychological Measurement, 41(6), 439–455. https://doi.org/10.1177/0146621617695522
Curtis, D. D. (2001). Misfits: People and their problems. What might it all mean? International Education Journal, 2(4), 91-99.
Curtis, D. D. (2004). Person misfit in attitude surveys: Influences, impacts and implications. International Education Journal, 5(2), 125-144.
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67-86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x
Du, J., Wang, Y., Wu, A., Jiang, Y., Duan, Y., Geng, W., Wan, L., Li, J., Hu, J., Jiang, J., Shi, L., & Wei, J. (2024). The validity and IRT psychometric analysis of Chinese version of Difficult Doctor-Patient Relationship Questionnaire (DDPRQ-10). BMC Psychiatry, 23: 900. https://doi.org/10.1186/s12888-023-05385-5
Egberink, I. J. L., Meijer, R. R., Veldkamp, B. P., Schakel, L., & Smid, N. G. (2010). Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM. Personality and Individual Differences, 48(8), 921-925. https://doi.org/10.1016/j.paid.2010.02.023
Emons, W. H. M., Sijtsma, K., & Meijer, R. R. (2005). Global, local, and graphical person fit analysis using person-response functions. Psychological Methods, 10(1), 101-119. https://doi.org/10.1037/1082-989X.10.1.101
Felt, J. M., Castaneda, R., Tiemensma, J., & Depaoli, S. (2017). Using person fit statistics to detect outliers in survey research. Frontiers in Psychology, 8: 863. https://doi.org/10.3389/fpsyg.2017.00863
Ferrando, P. J. (2015). Assessing person fit in typical-response measures. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 128–155). Routledge/Taylor & Francis Group.
Ferrando, P. J., Vigil-Colet, A., & Lorenzo-Seva, U. (2016). Practical person-fit assessment with the linear FA model: New developments and a comparative study. Frontiers in Psychology, 7: 1973. https://doi.org/10.3389/fpsyg.2016.01973
Haberman, S. J., Sinharay, S., & Chon, K. H. (2013). Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions. Psychometrika, 78(3), 417–440. https://doi.org/10.1007/s11336-012-9305-1
Hayat, B., Rahayu, W., Putra, M. D. K., Sarifah, I., Puri, V. G. S., & Isa, K. (2023). Metacognitive Skills Assessment in Research-Proposal Writing (MSARPW) in the Indonesian university context: Scale development and validation using multidimensional item response models. Jurnal Pengukuran Psikologi dan Pendidikan Indonesia, 12(1), 31-47. https://doi.org/10.15408/jp3i.v12i1.31679
Hong, S. E., Monroe, S., & Falk, C. F. (2020). Performance of person-fit statistics under model misspecification. Journal of Educational Measurement, 57(3), 423-442. https://doi.org/10.1111/jedm.12207
International Test Commission (ITC). (2014). ITC guidelines on quality control in scoring, test analysis, and reporting of test scores. International Journal of Testing, 14(3), 195-217. https://doi.org/10.1080/15305058.2014.918040
Jones, E. A., Wind, S. A., Tsai, C-L., & Ge, Y. (2023). Comparing person-fit and traditional indices across careless response patterns in surveys. Applied Psychological Measurement, 47(5-6), 365-385. https://doi.org/10.1177/01466216231194358
Karabatsos, G. (1998). Analyzing nonadditive conjoint structures: Compounding events by Rasch model probabilities. Journal of Outcome Measurement, 2(3), 191-221.
Karabatsos, G. (2000). A critique of Rasch residual fit statistics. Journal of Applied Measurement, 1(2), 152-176.
Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277-298. https://doi.org/10.1207/S15324818AME1604_2
Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4(4), 269–290. https://doi.org/10.2307/1164595
Linacre, J. M. (2005). When to stop removing items and persons in Rasch analysis? Rasch Measurement Transactions, 23(4), 1241.
Li, M.-n. F., & Olejnik, S. (1997). The power of Rasch person-fit statistics in detecting unusual response patterns. Applied Psychological Measurement, 21(3), 215–231. https://doi.org/10.1177/01466216970213002
Liu, Y., & Maydeu-Olivares, A. (2014). Identifying the source of misfit in item response theory models. Multivariate Behavioral Research, 49(4), 354-371. https://doi.org/10.1080/00273171.2014.910744
Liu, Y., & Liu, H. (2021). Detecting noneffortful responses based on a residual method using an iterative purification process. Journal of Educational and Behavioral Statistics, 46(6), 717-752. https://doi.org/10.3102/1076998621994366
Liu, T., Lan, T., & Xin, T. (2019a). Detecting random responses in a personality scale using IRT-based person-fit indices. European Journal of Psychological Assessment, 35(1), 126-136. https://doi.org/10.1027/1015-5759/a000369
Liu, T., Sun, Y., Li, Z., & Xin, T. (2019b). The impact of aberrant response on reliability and validity. Measurement: Interdisciplinary Research and Perspectives, 17(3), 133-142. https://doi.org/10.1080/15366367.2019.1584848
Lundgren, E., & Eklof, H. (2023). Questionnaire-taking motivation: Using response times to assess motivation to optimize on the PISA 2018 student questionnaire. International Journal of Testing, 23(4), 231-256. https://doi.org/10.1080/15305058.2023.2214647
Maroqi, N. (2018). Uji validitas konstruk pada instrumen Rosenberg Self-Esteem Scale dengan metode confirmatory factor analysis (CFA). Jurnal Pengukuran Psikologi dan Pendidikan Indonesia, 7(2), 92-96. https://doi.org/10.15408/jp3i.v7i2.12101
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
Maydeu-Olivares, A. (2013). Why should we assess the goodness-of-fit of IRT models? Measurement: Interdisciplinary Research and Perspectives, 11(3), 127-137. https://doi.org/10.1080/15366367.2013.841511
Meijer, R. R. (1996). Person fit research: An introduction. Applied Measurement in Education, 9(1), 3-8. https://doi.org/10.1207/s15324818ame0901_2
Meijer, R. R. (2003). Diagnosing item score patterns on a test using item response theory-based person-fit statistics. Psychological Methods, 8(1), 72–87. https://doi.org/10.1037/1082-989X.8.1.72
Meijer, R. R., & Sijtsma, K. (1995). Detection of aberrant item score patterns: A review of recent developments. Applied Measurement in Education, 8(3), 261–272. https://doi.org/10.1207/s15324818ame0803_5
Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135. https://doi.org/10.1177/01466210122031957
Meijer, R. R., & Tendeiro, J. N. (2012). The use of the lz and lz* person-fit statistics and problems derived from model misspecification. Journal of Educational and Behavioral Statistics, 37(6), 758-766. https://doi.org/10.3102/1076998612466144
Meijer, R. R., Niessen, M. S. A., & Tendeiro, J. N. (2016). A practical guide to check the consistency of item response patterns in clinical research through person fit statistics: Examples and a computer program. Assessment, 23(1), 56-62. https://doi.org/10.1177/1073191115577800
Moshagen, M., & Bader, M. (2024). semPower: General power analysis for structural equation models. Behavior Research Methods, 56(4), 2901-2922. https://doi.org/10.3758/s13428-023-02254-7
Ogihara, Y., & Kusumi, T. (2020). The developmental trajectory of self-esteem across the life span in Japan: Age differences in scores on the Rosenberg Self-Esteem Scale from adolescence to old age. Frontiers in Public Health, 8: 132. https://doi.org/10.3389/fpubh.2020.00132
Olson, J. F., & Fremer, J. (2013). TILSA test security guidebook: Preventing, detecting, and investigating test security irregularities. Council of Chief State School Officers.
Panayides, P., & Tymms, P. (2012). Is aberrant response behavior a stable characteristic of students in classroom math tests? Rasch Measurement Transactions, 26(3), 1382-1383.
Panayides, P., & Tymms, P. (2013). Investigating whether aberrant response behaviour in classroom maths tests is a stable characteristic of students. Assessment in Education: Principles, Policy & Practice, 20(3), 349-368. https://doi.org/10.1080/0969594x.2012.723610
Pina, J. A. L., & Montesinos, M. D. H. (2005). Fitting Rasch model using appropriateness measure statistics. The Spanish Journal of Psychology, 8(1), 100-110. https://doi.org/10.1017/S113874160000500X
R Core Team. (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research.
Reise, S. P., & Flannery, W. P. (1996). Assessing person-fit on measures of typical performance. Applied Measurement in Education, 9(1), 9–26. https://doi.org/10.1207/s15324818ame0901_3
Rolstad, S., Adler, J., & Ryden, A. (2011). Response burden and questionnaire length: Is shorter better? A review and meta-analysis. Value in Health, 14(8), 1101-1108. https://doi.org/10.1016/j.jval.2011.06.003
Rosenberg, M. (1965). Rosenberg Self-Esteem Scale (RSES) [Database record]. APA PsycTests. https://doi.org/10.1037/t01038-000
Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02
Sijtsma, K., & Meijer, R. R. (2001). The person response function as a tool in person-fit research. Psychometrika, 66(2), 191–207. https://doi.org/10.1007/BF02294835
Smith, R. M. (1986). Person fit in the Rasch model. Educational and Psychological Measurement, 46(2), 359–372. https://doi.org/10.1177/001316448604600210
Spoden, C., Fleischer, J., & Frey, A. (2020). Person misfit, test anxiety, and test-taking motivation in a large-scale mathematics proficiency test for self-evaluation. Studies in Educational Evaluation, 67: 100910. https://doi.org/10.1016/j.stueduc.2020.100910
Tay, L., Meade, A. W., & Cao, M. (2015). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods, 18(1), 3-46. https://doi.org/10.1177/1094428114553062
Tesio, L., Caronni, A., Kumbhare, D., & Scarano, S. (2024a). Interpreting results from Rasch analysis 1. The “most likely” measures coming from the model. Disability and Rehabilitation, 46(3), 591–603. https://doi.org/10.1080/09638288.2023.2169771
Tesio, L., Caronni, A., Simone, A., Kumbhare, D., & Scarano, S. (2024b). Interpreting results from Rasch analysis 2. Advanced model applications and the data-model fit assessment. Disability and Rehabilitation, 46(3), 604–617. https://doi.org/10.1080/09638288.2023.2169772
Turner, K. T., & Engelhard, G., Jr. (2024). Using functional clustering to diagnose person misfit. Journal of Experimental Education, 92(2), 377–397. https://doi.org/10.1080/00220973.2022.2161088
van der Linden, W. J., & van Krimpen-Stoop, E. M. L. A. (2003). Using response times to detect aberrant responses in computerized adaptive testing. Psychometrika, 68(2), 251–265. https://doi.org/10.1007/BF02294800
Wanders, R. B. K., Meijer, R. R., Ruhé, H. G., Sytema, S., Wardenaar, K. J., & de Jonge, P. (2018). Person-fit feedback on inconsistent symptom reports in clinical depression care. Psychological Medicine, 48(11), 1844-1852. https://doi.org/10.1017/S003329171700335X
Wang, W.-C., Chen, P.-H., & Cheng, Y.-Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9(1), 116–136. https://doi.org/10.1037/1082-989X.9.1.116
Wind, S. A., & Schumacker, R. E. (2017). Detecting measurement disturbances in rater-mediated assessments. Educational Measurement: Issues and Practice, 36(4), 44-51. https://doi.org/10.1111/emip.12164
Wright, B. D., & Stone, M. (1999). Measurement essentials (2nd ed.). Wide Range, Inc.
Yekutieli, D., & Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82(1-2), 171-196. https://doi.org/10.1016/S0378-3758(99)00041-5
Zahra, N. S., & Wirawan, H. (2024). Empowering digital transformation: Developing and validating a Digital Leadership Scale through Rasch model analysis. Measurement: Interdisciplinary Research and Perspectives. Advance online publication. https://doi.org/10.1080/15366367.2024.2334591
Zou, D., & Bolt, D. M. (2023). Person misfit and person reliability in Rating Scale Measures: The role of response styles. Measurement: Interdisciplinary Research and Perspectives, 21(3), 167-180. https://doi.org/10.1080/15366367.2022.2114243