Judging Students' Texts in a Digital Research Tool: Do Text Quality, Students' Gender, and Migration Background Impact Teachers' Text Assessments?

Open Access
Dec 2025

References

  1. Barkaoui, K. (2010). Explaining ESL essay holistic scores: A multilevel modeling approach. Language Testing, 27(4), 515–535. https://doi.org/10.1177/0265532210368717
  2. Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62(3), 193–217. https://doi.org/10.1037/h0047470
  3. Bonefeld, M., Kleen, H., & Glock, S. (2022). The effect of the interplay of gender and ethnicity on teachers judgements: Does the school subject matter? The Journal of Experimental Education, 90(4), 818–838. https://doi.org/10.1080/00220973.2021.1878991
  4. Canz, T., Hoffmann, L., & Kania, R. (2020). Presentation-mode effects in large-scale writing assessments. Assessing Writing, 45, 100470. https://doi.org/10.1016/j.asw.2020.100470
  5. Clahsen, H., & Felser, C. (2006). How native-like is non-native language processing? Trends in Cognitive Sciences, 10(12), 564–570. https://doi.org/10.1016/j.tics.2006.10.002
  6. Cooksey, R. W., Freebody, P., & Wyatt-Smith, C. (2007). Assessment as judgment-in-context: Analysing how teachers evaluate students' writing. Educational Research and Evaluation, 13(5), 401–434. https://doi.org/10.1080/13803610701728311
  7. Copur-Gencturk, Y., Thacker, I., & Cimpian, J. R. (2022). Teacher bias in the virtual classroom. Computers & Education, 191, 104627. https://doi.org/10.1016/j.compedu.2022.104627
  8. Copur-Gencturk, Y., Thacker, I., & Cimpian, J. R. (2023). Teachers' race and gender biases and the moderating effects of their beliefs and dispositions. International Journal of STEM Education, 10(1), 1–25. https://doi.org/10.1186/s40594-023-00420-z
  9. Crusan, D. (2010). Assessment in the Second Language Writing Classroom. University of Michigan Press/ELT. https://doi.org/10.3998/mpub.770334
  10. Doornkamp, L., van der Pol, L. D., Groeneveld, S., Mesman, J., Endendijk, J. J., & Groeneveld, M. G. (2022). Understanding gender bias in teachers' grading: The role of gender stereotypical beliefs. Teaching and Teacher Education, 118, 103826. https://doi.org/10.1016/j.tate.2022.103826
  11. Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, 25(2), 155–185. https://doi.org/10.1177/0265532207086780
  12. Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
  13. Feenstra, H. (2014). Assessing writing ability in primary education: On the evaluation of text quality and text complexity [Dissertation, University of Twente]. Research information. https://doi.org/10.3990/1.9789036537254
  14. Fischer, J., Jansen, T., Möller, J., & Harms, U. (2021). Measuring biology trainee teachers' professional knowledge about evolution—Introducing the Student Inventory. Evolution: Education and Outreach, 14(1), 1–16. https://doi.org/10.1186/s12052-021-00144-0
  15. Fiske, S. T., & Neuberg, S. L. (1990). A continuum of impression formation, from category-based to individuating processes: Influences of information and motivation on attention and interpretation. Advances in Experimental Social Psychology, 23, 1–74. https://doi.org/10.1016/S0065-2601(08)60317-2
  16. Fleckenstein, J., Meyer, J., Jansen, T., Keller, S., & Köller, O. (2020). Is a long essay always a good essay? The effect of text length on writing assessment. Frontiers in Psychology, 11, 562462. https://doi.org/10.3389/fpsyg.2020.562462
  17. Gebhardt, M., Rauch, D., Mang, J., Sälzer, C., & Stanat, P. (2013). Mathematische Kompetenz von Schülerinnen und Schülern mit Zuwanderungshintergrund [Mathematical competence of students with a migrant background]. In M. Prenzel, C. Sälzer, E. Klieme, & O. Köller (Eds.), PISA 2012. Fortschritte und Herausforderungen in Deutschland (pp. 275–308). Münster, New York, München, Berlin: Waxmann.
  18. Gentrup, S., Olczyk, M., & Lorenz, G. (2024). Teacher stereotypes and teacher expectations at the intersection of student gender and socioeconomic status. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 56(1–2), 87–102. https://doi.org/10.1026/0049-8637/a000291
  19. Graham, S. (2019). Changing How Writing Is Taught. Review of Research in Education, 43(1), 277–303. https://doi.org/10.3102/0091732X18821125
  20. Graham, S., Hebert, M., & Harris, K. R. (2015). Formative assessment and writing: A meta-analysis. The Elementary School Journal, 115(4), 523–547. https://doi.org/10.1086/681947
  21. Hartig, J., & Jude, N. (2008). Sprachkompetenzen von Mädchen und Jungen [Language competencies of girls and boys]. In E. Klieme (Ed.), Beltz Pädagogik, Unterricht und Kompetenzerwerb in Deutsch und Englisch: Ergebnisse der DESI-Studie (pp. 202–207). Weinheim: Beltz. https://doi.org/10.25656/01:3154
  22. Herppich, S., Praetorius, A.-K., Förster, N., Karst, K., Leutner, D., Behrmann, L. et al. (2018). Teachers' assessment competence. Integrating knowledge-, process-, and product-oriented approaches into a competence-oriented conceptual model. Teaching and Teacher Education, 76, 181–193. https://doi.org/10.1016/j.tate.2017.12.001
  23. Holder, K., & Kessels, U. (2017). Gender and ethnic stereotypes in student teachers' judgments: A new look from a shifting standards perspective. Social Psychology of Education, 20(3), 471–490. https://doi.org/10.1007/s11218-017-9384-z
  24. Jansen, T., Meyer, J., Schipolowski, S., & Möller, J. (2024). Feedback on teachers' text assessment: Does it foster assessment accuracy and motivation? Zeitschrift für Pädagogische Psychologie, 38(1–2), 35–47. https://doi.org/10.1024/1010-0652/a000365
  25. Jansen, T., Vögelin, C., Machts, N., Keller, S., Köller, O., & Möller, J. (2021). Judgment accuracy in experienced versus student teachers: Assessing essays in English as a foreign language. Teaching and Teacher Education, 97, 103216. https://doi.org/10.1016/j.tate.2020.103216
  26. Jansen, T., Vögelin, C., Machts, N., Keller, S., & Möller, J. (2019). Empirische Arbeit: Das Schülerinventar ASSET zur Beurteilung von Schülerarbeiten im Fach Englisch [Empirical work: The ASSET student inventory for assessing student work in English]. Psychologie in Erziehung und Unterricht, 66(4), 303–315. https://doi.org/10.2378/peu2019.art21d
  27. Jansen, T., Vögelin, C., Machts, N., Keller, S., & Möller, J. (2021). Don't just judge the spelling! The influence of spelling on assessing second-language student essays. Frontline Learning Research, 9(1), 44–65. https://eric.ed.gov/?id=ej1284840
  28. Kaiser, J., Möller, J., Helm, F., & Kunter, M. (2015). Das Schülerinventar: Welche Schülermerkmale die Leistungsurteile von Lehrkräften beeinflussen [The student inventory: Which student characteristics influence teachers' performance assessments]. Zeitschrift für Erziehungswissenschaft, 18(2), 279–302. https://doi.org/10.1007/s11618-015-0619-5
  29. Kaiser, J., Südkamp, A., & Möller, J. (2017). The effects of student characteristics on teachers' judgment accuracy: Disentangling ethnicity, minority status, and achievement. Journal of Educational Psychology, 109(6), 871–888. https://doi.org/10.1037/edu0000156
  30. Karing, C. (2009). Diagnostische Kompetenz von Grundschul- und Gymnasiallehrkräften im Leistungsbereich und im Bereich Interessen [Diagnostic competence of primary and secondary school teachers in the areas of performance and interests]. Zeitschrift für Pädagogische Psychologie, 23(3–4), 197–209. https://doi.org/10.1024/1010-0652.23.34.197
  31. Karing, C., Rausch, T., & Artelt, C. (2024). Teacher judgement accuracy—measurements, causes and effects. In S. Weinert, H.-G. Roßbach, J. von Maurice, H.-P. Blossfeld, & C. Artelt (Eds.), Edition ZfE: Band 16. Educational processes, decisions, and the development of competencies from early preschool age to adolescence: Findings from the BiKS Cohort Panel Studies (pp. 263–280). Springer VS. https://doi.org/10.1007/978-3-658-43414-4_10
  32. Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
  33. Loibl, K., Leuders, T., & Dörfler, T. (2020). A framework for explaining teachers' diagnostic judgements by cognitive modeling (DiaCoM). Teaching and Teacher Education, 91, 103059. https://doi.org/10.1016/j.tate.2020.103059
  34. Machts, N., Chernikova, O., Jansen, T., Weidenbusch, M., Fischer, F., & Möller, J. (2024). Categorization of simulated diagnostic situations and the salience of diagnostic information: Conceptual framework. Zeitschrift für Pädagogische Psychologie, 38(1–2), 3–13. https://doi.org/10.1024/1010-0652/a000364
  35. Möller, J., Jansen, T., Fleckenstein, J., Machts, N., Meyer, J., & Reble, R. (2022). Judgment accuracy of German student texts: Do teacher experience and content knowledge matter? Teaching and Teacher Education, 119, 103879. https://doi.org/10.1016/j.tate.2022.103879
  36. National Assessment Governing Board (2011a). Developing achievement levels on the national assessment of educational progress for writing grades 8 and 12 in 2011 and grade 4 in 2013. NAEP Writing ALS Design Document.
  37. National Assessment Governing Board (2011b). Writing framework for the 2011 national assessment of educational progress. U.S. Department of Education, Washington, D.C.
  38. National Center for Education Statistics (2012). The nation's report card: Writing 2011. Institute of Education Sciences, U.S. Department of Education, Washington, D.C.
  39. Petersen, J. (2018). Gender difference in verbal performance: A meta-analysis of United States State performance assessments. Educational Psychology Review, 30(4), 1269–1281. https://doi.org/10.1007/s10648-018-9450-x
  40. Rauin, U., & Meier, U. (2007). Subjektive Einschätzungen des Kompetenzerwerbs in der Lehramtsausbildung [Subjective assessments of skills acquisition in teacher training]. In M. Lüders & J. Wissinger (Eds.), Forschung zur Lehrerbildung, Kompetenzentwicklung und Programmevaluation (pp. 102–131). Münster: Waxmann.
  41. Ready, D. D., & Wright, D. L. (2011). Accuracy and inaccuracy in teachers' perceptions of young children's cognitive abilities. American Educational Research Journal, 48(2), 335–360. https://doi.org/10.3102/0002831210374874
  42. Reilly, D., Neumann, D. L., & Andrews, G. (2019). Gender differences in reading and writing achievement: Evidence from the National Assessment of Educational Progress (NAEP). The American Psychologist, 74(4), 445–458. https://doi.org/10.1037/amp0000356
  43. Retelsdorf, J., Schwartz, K., & Asbrock, F. (2015). “Michael can't read!” Teachers' gender stereotypes and boys' reading self-concept. Journal of Educational Psychology, 107(1), 186–194. https://doi.org/10.1037/a0037107
  44. Rudolph, U., Böhm, R., & Lummer, M. (2007). Ein Vorname sagt mehr als 1000 Worte [A first name says more than a thousand words]. Zeitschrift für Sozialpsychologie, 38(1), 17–31. https://doi.org/10.1024/0044-3514.38.1.17
  45. Schipolowski, S., & Böhme, K. (2016). Assessment of writing ability in secondary education: Comparison of analytic and holistic scoring systems for use in large-scale assessments. L1-Educational Studies in Language and Literature, 16(1), 1–22. https://doi.org/10.17239/L1ESLL-2016.16.01.03
  46. Skar, G. B., & Jølle, L. J. (2017). Teachers as raters: Investigation of a long-term writing assessment program. L1-Educational Studies in Language and Literature, 17, 1–30. https://doi.org/10.17239/L1ESLL-2017.17.01.06
  47. Stang, J., & Urhahne, D. (2016). Stabilität, Bezugsnormorientierung und Auswirkungen der Urteilsgenauigkeit [Stability, reference norm orientation and effects of judgment accuracy]. Zeitschrift für Pädagogische Psychologie, 30(4), 251–262. https://doi.org/10.1024/1010-0652/a000190
  48. Strahl, F., Jansen, T., Kilian, J., Reble, R., Schneider, R., & Möller, J. (2025). Context counts: Unveiling the impact of achievement level on teachers' text assessment. Learning and Instruction, 95, 102046. https://doi.org/10.1016/j.learninstruc.2024.102046
  49. Südkamp, A., Kaiser, J., & Möller, J. (2012). Accuracy of teachers' judgments of students' academic achievement: A meta-analysis. Journal of Educational Psychology, 104(3), 743–762. https://doi.org/10.1037/a0027627
  50. Ullman, M. T. (2020). The declarative/procedural model: A neurobiologically motivated theory of first and second language. In B. VanPatten, G. D. Keating, & S. Wulff (Eds.), Theories in Second Language Acquisition (3rd ed., pp. 128–161). New York: Routledge. https://doi.org/10.4324/9780429503986-7
  51. Urhahne, D., & Wijnia, L. (2021). A review on the accuracy of teacher judgments. Educational Research Review, 32, 100374. https://doi.org/10.1016/j.edurev.2020.100374
  52. van Ewijk, R. (2011). Same work, lower grade? Student ethnicity and teachers' subjective assessments. Economics of Education Review, 30(5), 1045–1058. https://doi.org/10.1016/j.econedurev.2011.05.008
  53. Vögelin, C., Jansen, T., Keller, S. D., Machts, N., & Möller, J. (2019). The influence of lexical features on teacher judgements of ESL argumentative essays. Assessing Writing, 39, 50–63. https://doi.org/10.1016/j.asw.2018.12.003
  54. Vögelin, C., Jansen, T., Keller, S. D., & Möller, J. (2018). The impact of vocabulary and spelling on judgments of ESL essays: An analysis of teacher comments. The Language Learning Journal, 49(6), 631–647. https://doi.org/10.1080/09571736.2018.1522662
  55. Weigle, S. C. (2007). Teaching writing teachers about assessment. Journal of Second Language Writing, 16(3), 194–209. https://doi.org/10.1016/j.jslw.2007.07.004
Language: English
Page range: 47–61
Published on: Dec 31, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Frederike Strahl, Jörg Kilian, Jens Möller, published by Gesellschaft für Fachdidaktik (GfD e.V.)
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 License.