
Evaluating research quality with Large Language Models: An analysis of ChatGPT’s effectiveness with different settings and inputs

Journal of Data and Information Science

Volume 10 (2025): Issue 1 (February 2025)

By: Mike Thelwall

Open Access

Feb 2025

Figures & Tables

ChatGPT 3.5-turbo score prediction correlations against human scores for 51 information science article full texts (truncated), article titles and abstracts, or just titles. Averages over n iterations and confidence intervals are calculated as in the methods.

ChatGPT 4o-mini, ChatGPT 3.5-turbo and ChatGPT 4o score prediction correlations against human scores for 51 information science article titles and abstracts. Averages over n iterations and confidence intervals are calculated as in the methods.

ChatGPT 4o score predictions based on abstracts (average of 30) against human scores (from the author) for 51 information science article titles and abstracts.

ChatGPT 4o score predictions based on abstracts (average of 30) against human scores (from the author) for 51 information science article titles and abstracts with seven different system prompts. Strategies 1-5 are abbreviations of Strategy 6, the full REF instructions, and Strategy 0 is a brief instruction without a request for justification.

ChatGPT 4 (web interface) score prediction correlations against human scores for 51 information science article titles and abstracts. Averages over n iterations and confidence intervals are calculated as in the methods (data from: Thelwall, 2024).

Mean Average Deviations for direct predictions and predictions with linear regression for each model and input. The improve column gives the percentage reduction in MAD compared to the baseline strategy of assigning each article the overall human average, 2.75.

Model and input	Direct: MAD	Direct: Improve	Regression: Intercept	Regression: Coefficient	Regression: MAD	Regression: Improve
GPT-3.5 turbo: Titles	0.68	6%	-1.16	1.57	0.63	13%
GPT-3.5 turbo: Abstracts	0.60	17%	-3.46	2.26	0.51	30%
GPT-3.5 turbo: Truncated text	0.70	4%	-7.49	3.38	0.55	24%
GPT-4o-mini: Abstracts	0.63	13%	-3.32	2.07	0.59	19%
GPT-4o-mini: Truncated text	0.75	-3%	-2.44	1.61	0.60	17%
GPT-4o: Abstracts	0.62	14%	-3.40	2.05	0.50	31%
GPT-4o: Truncated text	0.69	5%	-4.44	2.28	0.50	31%
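
The quantities in this table can be reproduced with a few lines of Python. The following is a minimal sketch rather than the author's code: the function names and the five-article toy data are hypothetical, and the regression is assumed to predict the human score as intercept + coefficient × model score. That assumption appears consistent with the reported values, since applying the GPT-4o: Abstracts intercept and coefficient to that model's mean score of 2.99 gives approximately the human mean of 2.75.

```python
from statistics import mean


def mad(predicted, actual):
    """Mean of the absolute differences between predictions and human scores."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def improvement(model_mad, baseline_mad):
    """Fractional reduction in MAD relative to the baseline (0.30 means 30%)."""
    return 1 - model_mad / baseline_mad


def regression_adjust(model_score, intercept, coefficient):
    """Assumed form of the regression: human score = intercept + coefficient * model score."""
    return intercept + coefficient * model_score


# Hypothetical toy data (NOT from the article): five articles.
human = [1, 2, 3, 3, 4]
model = [2.5, 2.7, 3.0, 3.1, 3.4]

# Baseline strategy: give every article the overall human average.
baseline = [mean(human)] * len(human)

baseline_mad = mad(baseline, human)
model_mad = mad(model, human)
gain = improvement(model_mad, baseline_mad)

# Check against the table: GPT-4o: Abstracts row applied to its mean score of 2.99.
adjusted_mean = regression_adjust(2.99, intercept=-3.40, coefficient=2.05)
```

A negative improvement, as in the GPT-4o-mini: Truncated text row for direct predictions, means the model's raw scores are further from the human scores than the constant baseline is.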

Spearman correlations between human scores and model average scores (over 30 iterations) for 51 information science articles. Values above 0.75 are highlighted.

Spearman correlation	GPT-3.5 turbo: Abstracts	GPT-3.5 turbo: Truncated text	GPT-4o-mini: Abstracts	GPT-4o-mini: Truncated text	GPT-4o: Abstracts	GPT-4o: Truncated text	Human
GPT-3.5 turbo: Titles	0.439	0.444	0.359	0.499	0.539	0.589	0.434
GPT-3.5 turbo: Abstracts	1.000	0.757	0.700	0.718	0.875	0.774	0.674
GPT-3.5 turbo: Truncated text		1.000	0.672	0.686	0.732	0.783	0.625
GPT-4o-mini: Abstracts			1.000	0.608	0.729	0.653	0.571
GPT-4o-mini: Truncated text				1.000	0.813	0.801	0.506
GPT-4o: Abstracts					1.000	0.858	0.678
GPT-4o: Truncated text						1.000	0.675
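
For readers unfamiliar with the statistic, a Spearman correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below illustrates this with the standard library only; the helper names and the toy scores are hypothetical and are not the article's data or code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (NOT from the article): human scores and
# per-article model scores already averaged over repeated runs.
human = [1, 2, 3, 3, 4]
model_average = [2.5, 2.7, 3.0, 3.1, 3.4]

rho = spearman(human, model_average)
```

Because only ranks matter, a model whose averaged scores sit in a narrow band can still correlate well with human scores, which is why the table reports rank correlations separately from the mean scores.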

Average human scores and model average scores.

	Human	GPT-3.5 turbo: Titles	GPT-3.5 turbo: Abstracts	GPT-3.5 turbo: Truncated text	GPT-4o-mini: Abstracts	GPT-4o-mini: Truncated text	GPT-4o: Abstracts	GPT-4o: Truncated text
Mean score	2.75	2.49	2.75	3.03	2.93	3.22	2.99	3.16

DOI: https://doi.org/10.2478/jdis-2025-0011 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X


Language: English

Page range: 7 - 25

Submitted on: Aug 22, 2024

Accepted on: Dec 11, 2024

Published on: Feb 18, 2025

Published by: Chinese Academy of Sciences, National Science Library

In partnership with: Paradigm Publishing Services

Publication frequency: 4 issues per year

Keywords: Large Language Models, Scientometrics, Research Assessment

Related subjects: Computer sciences, Information technology, Project management, Databases and data mining

© 2025 Mike Thelwall, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution 4.0 License.
