A paper mill detection model based on citation manipulation paradigm

Open Access | Jan 2025

Abstract

Purpose

In this paper, we construct a heterogeneous graph network from the citation relations between papers and their basic information, centered on the “Paper mills” papers under withdrawal observation. We then train graph neural network models and classifiers on the resulting heterogeneous graphs to classify paper nodes.
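
As a rough illustration of what a heterogeneous citation graph might look like in code, the sketch below stores typed nodes and typed edges in plain Python. The node types (`paper`, `journal`), the relation names (`cites`, `published_in`), and the `HeteroGraph` class are all hypothetical; the abstract does not specify which node or edge types the authors use.

```python
from collections import defaultdict


class HeteroGraph:
    """Tiny typed graph: nodes carry a type, edges carry a relation name."""

    def __init__(self):
        self.nodes = {}                 # node_id -> {"type": ..., other attributes}
        self.edges = defaultdict(list)  # (src_type, relation, dst_type) -> [(src, dst)]

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, src, relation, dst):
        # The edge type is derived from the types of its two endpoints.
        key = (self.nodes[src]["type"], relation, self.nodes[dst]["type"])
        self.edges[key].append((src, dst))

    def neighbors(self, node_id, relation):
        """Return the targets reached from node_id through the given relation."""
        return [
            dst
            for (_, rel, _), pairs in self.edges.items()
            if rel == relation
            for src, dst in pairs
            if src == node_id
        ]


# Hypothetical toy data: p1 is a flagged paper, p2 and p3 cite it.
graph = HeteroGraph()
graph.add_node("p1", "paper", title="Flagged paper", label=1)
graph.add_node("p2", "paper", title="Citing paper A", label=0)
graph.add_node("p3", "paper", title="Citing paper B", label=0)
graph.add_node("j1", "journal", name="Example Journal")
graph.add_edge("p2", "cites", "p1")
graph.add_edge("p3", "cites", "p1")
graph.add_edge("p1", "published_in", "j1")

print(sorted(graph.edges.keys()))
```

Keying edges by a (source type, relation, target type) triple is one common way to keep relations separate so that a model can treat each relation differently.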

Design/methodology/approach

Our proposed citation network-based “Paper mills” detection model (PDCN model for short) combines two kinds of features: textual features extracted from paper titles with the BERT model, and structural features obtained from the heterogeneous graph with the heterogeneous graph attention network model. LGBM classifiers then use these features to identify “Paper mills” papers.
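
The fusion step described above can be pictured with a minimal sketch. It assumes the two feature sets are joined by simple concatenation, which the abstract does not confirm, and it replaces all three real components with stand-ins: short hand-written vectors take the place of BERT title embeddings and graph attention embeddings, and a nearest-centroid rule takes the place of the LGBM classifier. Every name and number here is hypothetical.

```python
import math


def fuse(text_vec, struct_vec):
    """Concatenate a title embedding and a graph embedding into one vector."""
    return list(text_vec) + list(struct_vec)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def predict(x, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Hypothetical training rows: (text features, structural features, label).
train = [
    ([0.9, 0.1], [0.8], 1),
    ([0.8, 0.2], [0.9], 1),
    ([0.1, 0.9], [0.1], 0),
    ([0.2, 0.8], [0.2], 0),
]

# Group the fused vectors by label, then fit the stand-in classifier.
by_label = {}
for text_vec, struct_vec, label in train:
    by_label.setdefault(label, []).append(fuse(text_vec, struct_vec))
centroids = {label: centroid(vecs) for label, vecs in by_label.items()}

print(predict(fuse([0.7, 0.3], [0.6]), centroids))
```

In a real pipeline the fused vectors would be passed to a gradient-boosted tree classifier rather than a centroid rule; the point of the sketch is only that the classifier sees text and structure side by side in one feature vector.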

Findings

On our custom dataset, the PDCN model achieves an accuracy of 81.85% and an F1-score of 80.49% in the “Paper mills” detection task, representing a significant improvement in performance compared to several baseline models.
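
For readers less familiar with the two reported metrics, the sketch below computes accuracy and F1-score from a list of true and predicted labels. It assumes a binary F1 on the positive (“Paper mills”) class; the abstract does not say whether the reported F1 is binary, macro, or weighted. The labels are made-up toy values, not the paper's data.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with F1 on the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy labels: 1 = "Paper mills" paper, 0 = other paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Accuracy counts every correct decision equally, while F1 ignores true negatives, so the two can diverge when flagged papers are a small share of the dataset.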

Research limitations

We considered only the title of the article as a text feature and did not obtain features for the entire article.

Practical implications

The PDCN model we developed can effectively identify “Paper mills” papers and is suitable for the automated detection of “Paper mills” during the review process.

Originality/value

We incorporated both text and citation detection into the “Paper mills” identification process. Additionally, the PDCN model offers a basis for judgment and scientific guidance in recognizing “Paper mills” papers.

DOI: https://doi.org/10.2478/jdis-2025-0003 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X
Language: English
Page range: 167 - 187
Submitted on: May 10, 2024
Accepted on: Nov 5, 2024
Published on: Jan 6, 2025
Published by: Chinese Academy of Sciences, National Science Library
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Jun Zhang, Jianhua Liu, Haihong E, Tianyi Hu, Xiaodong Qiao, ZiChen Tang, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution 4.0 License.