Evaluating research quality with Large Language Models: An analysis of ChatGPT’s effectiveness with different settings and inputs

By: Mike Thelwall  
Open Access | Feb 2025

DOI: https://doi.org/10.2478/jdis-2025-0011 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X
Language: English
Page range: 7 - 25
Submitted on: Aug 22, 2024
Accepted on: Dec 11, 2024
Published on: Feb 18, 2025
Published by: Chinese Academy of Sciences, National Science Library
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Mike Thelwall, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution 4.0 License.