Alnahdi, G. H., & Yada, A. (2020). Rasch analysis of the Japanese version of Teacher Efficacy for Inclusive Practices Scale: Scale unidimensionality. Frontiers in Psychology, 11: 1725. https://doi.org/10.3389/fpsyg.2020.01725
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. American Educational Research Association.
André, Q. (2022). Outlier exclusion procedures must be blind to the researcher’s hypothesis. Journal of Experimental Psychology: General, 151(1), 213–223. https://doi.org/10.1037/xge0001069
Andrich, D., & Marais, I. (2014). Person proficiency estimates in the dichotomous Rasch model when random guessing is removed from difficulty estimates of multiple choice items. Applied Psychological Measurement, 38(6), 432–449. https://doi.org/10.1177/0146621614529646
Andrich, D., Marais, I., & Humphry, S. (2016). Controlling guessing bias in the dichotomous Rasch model applied to a large-scale, vertically scaled testing program. Educational and Psychological Measurement, 76(3), 412–435. https://doi.org/10.1177/0013164415594202
Briz-Redón, Á. (2021). Respondent burden effects on item non-response and careless response rates: An analysis of two types of surveys. Mathematics, 9(17), 2035. https://doi.org/10.3390/math9172035
Burchell, B., & Marsh, C. (1992). The effect of questionnaire length on survey response. Quality and Quantity, 26(3), 233–244. https://doi.org/10.1007/BF00172427
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Conijn, J. M., Emons, W. H. M., & Sijtsma, K. (2014). Statistic lz-based person-fit methods for noncognitive multiscale measures. Applied Psychological Measurement, 38(2), 122–136. https://doi.org/10.1177/0146621613497568
Crişan, D. R., Tendeiro, J. N., & Meijer, R. R. (2017). Investigating the practical consequences of model misfit in unidimensional IRT models. Applied Psychological Measurement, 41(6), 439–455. https://doi.org/10.1177/0146621617695522
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x
Du, J., Wang, Y., Wu, A., Jiang, Y., Duan, Y., Geng, W., Wan, L., Li, J., Hu, J., Jiang, J., Shi, L., & Wei, J. (2024). The validity and IRT psychometric analysis of Chinese version of Difficult Doctor-Patient Relationship Questionnaire (DDPRQ-10). BMC Psychiatry, 23: 900. https://doi.org/10.1186/s12888-023-05385-5
Egberink, I. J. L., Meijer, R. R., Veldkamp, B. P., Schakel, L., & Smid, N. G. (2010). Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM. Personality and Individual Differences, 48(8), 921–925. https://doi.org/10.1016/j.paid.2010.02.023
Emons, W. H. M., Sijtsma, K., & Meijer, R. R. (2005). Global, local, and graphical person fit analysis using person-response functions. Psychological Methods, 10(1), 101–119. https://doi.org/10.1037/1082-989X.10.1.101
Felt, J. M., Castaneda, R., Tiemensma, J., & Depaoli, S. (2017). Using person fit statistics to detect outliers in survey research. Frontiers in Psychology, 8: 863. https://doi.org/10.3389/fpsyg.2017.00863
Ferrando, P. J. (2015). Assessing person fit in typical-response measures. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 128–155). Routledge/Taylor & Francis Group.
Ferrando, P. J., Vigil-Colet, A., & Lorenzo-Seva, U. (2016). Practical person-fit assessment with the linear FA model: New developments and a comparative study. Frontiers in Psychology, 7: 1973. https://doi.org/10.3389/fpsyg.2016.01973
Haberman, S. J., Sinharay, S., & Chon, K. H. (2013). Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions. Psychometrika, 78(3), 417–440. https://doi.org/10.1007/s11336-012-9305-1
Hayat, B., Rahayu, W., Putra, M. D. K., Sarifah, I., Puri, V. G. S., & Isa, K. (2023). Metacognitive Skills Assessment in Research-Proposal Writing (MSARPW) in the Indonesian university context: Scale development and validation using multidimensional item response models. Jurnal Pengukuran Psikologi dan Pendidikan Indonesia, 12(1), 31–47. https://doi.org/10.15408/jp3i.v12i1.31679
Hong, S. E., Monroe, S., & Falk, C. F. (2020). Performance of person-fit statistics under model misspecification. Journal of Educational Measurement, 57(3), 423–442. https://doi.org/10.1111/jedm.12207
International Test Commission (ITC). (2014). ITC guidelines on quality control in scoring, test analysis, and reporting of test scores. International Journal of Testing, 14(3), 195–217. https://doi.org/10.1080/15305058.2014.918040
Jones, E. A., Wind, S. A., Tsai, C.-L., & Ge, Y. (2023). Comparing person-fit and traditional indices across careless response patterns in surveys. Applied Psychological Measurement, 47(5–6), 365–385. https://doi.org/10.1177/01466216231194358
Karabatsos, G. (1998). Analyzing nonadditive conjoint structures: Compounding events by Rasch model probabilities. Journal of Outcome Measurement, 2(3), 191-221.
Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277-298. https://doi.org/10.1207/S15324818AME1604_2
Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4(4), 269–290. https://doi.org/10.2307/1164595
Li, M.-n. F., & Olejnik, S. (1997). The power of Rasch person-fit statistics in detecting unusual response patterns. Applied Psychological Measurement, 21(3), 215–231. https://doi.org/10.1177/01466216970213002
Liu, T., Lan, T., & Xin, T. (2019a). Detecting random responses in a personality scale using IRT-based person-fit indices. European Journal of Psychological Assessment, 35(1), 126–136. https://doi.org/10.1027/1015-5759/a000369
Liu, T., Sun, Y., Li, Z., & Xin, T. (2019b). The impact of aberrant response on reliability and validity. Measurement: Interdisciplinary Research and Perspectives, 17(3), 133–142. https://doi.org/10.1080/15366367.2019.1584848
Liu, Y., & Liu, H. (2021). Detecting noneffortful responses based on a residual method using an iterative purification process. Journal of Educational and Behavioral Statistics, 46(6), 717–752. https://doi.org/10.3102/1076998621994366
Liu, Y., & Maydeu-Olivares, A. (2014). Identifying the source of misfit in item response theory models. Multivariate Behavioral Research, 49(4), 354–371. https://doi.org/10.1080/00273171.2014.910744
Lundgren, E., & Eklöf, H. (2023). Questionnaire-taking motivation: Using response times to assess motivation to optimize on the PISA 2018 student questionnaire. International Journal of Testing, 23(4), 231–256. https://doi.org/10.1080/15305058.2023.2214647
Maroqi, N. (2018). Uji validitas konstruk pada instrumen Rosenberg Self-Esteem Scale dengan metode confirmatory factor analysis (CFA) [Construct validity testing of the Rosenberg Self-Esteem Scale instrument using the confirmatory factor analysis (CFA) method]. Jurnal Pengukuran Psikologi dan Pendidikan Indonesia, 7(2), 92–96. https://doi.org/10.15408/jp3i.v7i2.12101
Meijer, R. R. (2003). Diagnosing item score patterns on a test using item response theory-based person-fit statistics. Psychological Methods, 8(1), 72–87. https://doi.org/10.1037/1082-989X.8.1.72
Meijer, R. R., & Sijtsma, K. (1995). Detection of aberrant item score patterns: A review of recent developments. Applied Measurement in Education, 8(3), 261–272. https://doi.org/10.1207/s15324818ame0803_5
Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135. https://doi.org/10.1177/01466210122031957
Meijer, R. R., & Tendeiro, J. N. (2012). The use of the lz and lz* person-fit statistics and problems derived from model misspecification. Journal of Educational and Behavioral Statistics, 37(6), 758-766. https://doi.org/10.3102/1076998612466144
Meijer, R. R., Niessen, A. S. M., & Tendeiro, J. N. (2016). A practical guide to check the consistency of item response patterns in clinical research through person-fit statistics: Examples and a computer program. Assessment, 23(1), 56–62. https://doi.org/10.1177/1073191115577800
Moshagen, M., & Bader, M. (2024). semPower: General power analysis for structural equation models. Behavior Research Methods, 56(4), 2901–2922. https://doi.org/10.3758/s13428-023-02254-7
Ogihara, Y., & Kusumi, T. (2020). The developmental trajectory of self-esteem across the life span in Japan: Age differences in scores on the Rosenberg Self-Esteem Scale from adolescence to old age. Frontiers in Public Health, 8: 132. https://doi.org/10.3389/fpubh.2020.00132
Olson, J. F., & Fremer, J. (2013). TILSA test security guidebook: Preventing, detecting, and investigating test security irregularities. Council of Chief State School Officers.
Panayides, P., & Tymms, P. (2012). Is aberrant response behavior a stable characteristic of students in classroom math tests? Rasch Measurement Transactions, 26(3), 1382–1383.
Pina, J. A. L., & Montesinos, M. D. H. (2005). Fitting Rasch model using appropriateness measure statistics. The Spanish Journal of Psychology, 8(1), 100–110. https://doi.org/10.1017/S113874160000500X
Reise, S. P., & Flannery, W. P. (1996). Assessing person-fit on measures of typical performance. Applied Measurement in Education, 9(1), 9–26. https://doi.org/10.1207/s15324818ame0901_3
Rolstad, S., Adler, J., & Rydén, A. (2011). Response burden and questionnaire length: Is shorter better? A review and meta-analysis. Value in Health, 14(8), 1101–1108. https://doi.org/10.1016/j.jval.2011.06.003
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02
Sijtsma, K., & Meijer, R. R. (2001). The person response function as a tool in person-fit research. Psychometrika, 66(2), 191–207. https://doi.org/10.1007/BF02294835
Spoden, C., Fleischer, J., & Frey, A. (2020). Person misfit, test anxiety, and test-taking motivation in a large-scale mathematics proficiency test for self-evaluation. Studies in Educational Evaluation, 67: 100910. https://doi.org/10.1016/j.stueduc.2020.100910
Tay, L., Meade, A. W., & Cao, M. (2015). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods, 18(1), 3-46. https://doi.org/10.1177/1094428114553062
Tesio, L., Caronni, A., Kumbhare, D., & Scarano, S. (2024a). Interpreting results from Rasch analysis 1. The “most likely” measures coming from the model. Disability and Rehabilitation, 46(3), 591–603. https://doi.org/10.1080/09638288.2023.2169771
Tesio, L., Caronni, A., Simone, A., Kumbhare, D., & Scarano, S. (2024b). Interpreting results from Rasch analysis 2. Advanced model applications and the data-model fit assessment. Disability and Rehabilitation, 46(3), 604–617. https://doi.org/10.1080/09638288.2023.2169772
Turner, K. T., & Engelhard, G., Jr. (2024). Using functional clustering to diagnose person misfit. Journal of Experimental Education, 92(2), 377–397. https://doi.org/10.1080/00220973.2022.2161088
van der Linden, W. J., & van Krimpen-Stoop, E. M. L. A. (2003). Using response times to detect aberrant responses in computerized adaptive testing. Psychometrika, 68(2), 251–265. https://doi.org/10.1007/BF02294800
Wanders, R. B. K., Meijer, R. R., Ruhé, H. G., Sytema, S., Wardenaar, K. J., & de Jonge, P. (2018). Person-fit feedback on inconsistent symptom reports in clinical depression care. Psychological Medicine, 48(11), 1844–1852. https://doi.org/10.1017/S003329171700335X
Wind, S. A., & Schumacker, R. E. (2017). Detecting measurement disturbances in rater-mediated assessments. Educational Measurement: Issues and Practice, 36(4), 44–51. https://doi.org/10.1111/emip.12164
false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82(1–2), 171–196. https://doi.org/10.1016/S0378-3758(99)00041-5
Zahra, N. S., & Wirawan, H. (2024). Empowering digital transformation: Developing and validating a Digital Leadership Scale through Rasch model analysis. Measurement: Interdisciplinary Research and Perspectives. Advance online publication. https://doi.org/10.1080/15366367.2024.2334591
Zou, D., & Bolt, D. M. (2023). Person misfit and person reliability in rating scale measures: The role of response styles. Measurement: Interdisciplinary Research and Perspectives, 21(3), 167–180. https://doi.org/10.1080/15366367.2022.2114243