Evaluation of Fingerprint Selection Algorithms for Two-Stage Plagiarism Detection

Open Access | Dec 2021

Abstract

Generally, the process of plagiarism detection can be divided into two main stages: source retrieval and text alignment. The paper evaluates and compares the effectiveness of five fingerprint selection algorithms used during the source retrieval stage: Every p-th, 0 mod p, Winnowing, Frequency-biased Winnowing (FBW), and Modified FBW (MFBW). The algorithms are evaluated on a dataset containing plagiarism cases in Bachelor's and Master's theses written in English in the field of computer science. The best performance is reached by 0 mod p, Winnowing, and MFBW. For these algorithms, reducing the fingerprint size from 100 % to about 20 % kept the effectiveness at approximately the same level. Moreover, MFBW sends fewer document pairs overall to the text alignment stage, thus also reducing the computational cost of the process. The software developed for this study is freely available at the author’s website: http://www.cs.rtu.lv/jekabsons/.
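To make the selection step concrete, the following is a minimal Python sketch of the three classical algorithms named in the abstract (Every p-th, 0 mod p, and Winnowing) as they are commonly described in the fingerprinting literature. It is not the author's software. The function names, the word-level k-gram tokenisation, the use of truncated MD5 as the hash, and the parameters `k`, `p`, and `w` are all assumptions made for illustration. FBW and MFBW are omitted because the abstract does not define them.

```python
import hashlib


def kgram_hashes(text, k=5):
    """Hash every overlapping k-gram of words.

    Assumption: lowercase whitespace tokenisation and a 32-bit prefix
    of MD5; the paper may use a different chunking and hash function.
    """
    words = text.lower().split()
    grams = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return [
        int.from_bytes(hashlib.md5(g.encode("utf-8")).digest()[:4], "big")
        for g in grams
    ]


def every_pth(hashes, p):
    """Every p-th: keep the hashes at positions 0, p, 2p, ..."""
    return set(hashes[::p])


def zero_mod_p(hashes, p):
    """0 mod p: keep the hashes whose value is divisible by p."""
    return {h for h in hashes if h % p == 0}


def winnowing(hashes, w):
    """Winnowing: keep the minimum hash of every window of w hashes.

    Only hash values are returned (not positions), so tie-breaking
    between equal minima does not affect the result.
    """
    if not hashes:
        return set()
    n_windows = max(len(hashes) - w + 1, 1)
    return {min(hashes[i:i + w]) for i in range(n_windows)}


# Small integer sequence standing in for real k-gram hashes.
demo = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
```

With `demo` and a parameter of 4, `every_pth` keeps {8, 77, 98}, `zero_mod_p` keeps {8, 88}, and `winnowing` keeps {8, 17, 39}. The first method depends on position, so inserting one word shifts the whole selection. The other two depend only on hash values, so the selected fingerprints of an unchanged passage stay stable when the text around it is edited.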

DOI: https://doi.org/10.2478/acss-2021-0022 | Journal eISSN: 2255-8691 | Journal ISSN: 2255-8683
Language: English
Page range: 178 - 182
Published on: Dec 30, 2021
Published by: Riga Technical University
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Gints Jēkabsons, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.