
2L-APD: A Two-Level Plagiarism Detection System for Arabic Documents

Open Access | Mar 2018

References

  1. Fishman, T. “We Know It When We See It” Is Not Good Enough: Toward a Standard Definition of Plagiarism That Transcends Theft, Fraud, and Copyright. 2009.
  2. Gipp, B. Citation-Based Plagiarism Detection. – In: Citation-Based Plagiarism Detection. Springer, 2014, pp. 57-88. DOI: 10.1007/978-3-658-06394-8_4
  3. Guibert, P., C. Michaut. Le Plagiat Etudiant. – Education et Sociétés, 2011, No 2, pp. 149-163. DOI: 10.3917/es.028.0149
  4. McCabe, D. L. Cheating Among College and University Students: A North American Perspective. – International Journal for Educational Integrity, Vol. 1, 2005, No 1. DOI: 10.21913/IJEI.v1i1.14
  5. Bin-Habtoor, A. S., M. A. Zaher. A Survey on Plagiarism Detection Systems. – International Journal of Computer Theory and Engineering, Vol. 4, 2012, No 2, p. 185. DOI: 10.7763/IJCTE.2012.V4.447
  6. Menai, M. El B. Detection of Plagiarism in Arabic Documents. – International Journal of Information Technology and Computer Science (IJITCS), Vol. 4, 2012, No 10, p. 80. DOI: 10.5815/ijitcs.2012.10.10
  7. Farghaly, A., K. Shaalan. Arabic Natural Language Processing: Challenges and Solutions. – ACM Transactions on Asian Language Information Processing (TALIP), Vol. 8, 2009, No 4, p. 14. DOI: 10.1145/1644879.1644881
  8. Liu, C., C. Chen, J. Han, P. S. Yu. GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis. – In: Proc. of 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2006, pp. 872-881. DOI: 10.1145/1150402.1150522
  9. Alzahrani, S. M., N. Salim, A. Abraham. Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods. – IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), Vol. 42, 2012, No 2, pp. 133-149. DOI: 10.1109/TSMCC.2011.2134847
  10. Potthast, M., M. Hagen, T. Gollub, M. Tippmann, J. Kiesel, P. Rosso, E. Stamatatos, B. Stein. Overview of the 5th International Competition on Plagiarism Detection. – In: CLEF Conference on Multilingual and Multimodal Information Access Evaluation, CELCT, 2013, pp. 301-331.
  11. Stein, B., S. Meyer zu Eissen. Near Similarity Search and Plagiarism Analysis. – In: From Data and Information Analysis to Knowledge Engineering, Springer, 2006, pp. 430-437. DOI: 10.1007/3-540-31314-1_52
  12. Hoad, T. C., J. Zobel. Methods for Identifying Versioned and Plagiarized Documents. – Journal of the American Society for Information Science and Technology, Vol. 54, 2003, No 3, pp. 203-215. DOI: 10.1002/asi.10170
  13. Manber, U. Finding Similar Files in a Large File System. – In: Proc. of USENIX Winter 1994 Technical Conference, 1994.
  14. Schleimer, S., D. S. Wilkerson, A. Aiken. Winnowing: Local Algorithms for Document Fingerprinting. – In: Proc. of 2003 ACM SIGMOD International Conference on Management of Data, ACM, 2003, pp. 76-85. DOI: 10.1145/872757.872770
  15. Karp, R. M., M. O. Rabin. Efficient Randomized Pattern-Matching Algorithms. – IBM Journal of Research and Development, Vol. 31, 1987, No 2, pp. 249-260. DOI: 10.1147/rd.312.0249
  16. Nagoudi, E. M. B., A. Khorsi, H. Cherroun. Efficient Inverted Index with n-Gram Sampling for String Matching in Arabic Documents. – In: 13th IEEE/ACS International Conference on Computer Systems and Applications, Agadir, Morocco, 2016, pp. 1-7. DOI: 10.1109/AICCSA.2016.7945743
  17. Lebert, M. Project Gutenberg (1971-2008). Project Gutenberg, 2008.
  18. Ogawa, Y., T. Morita, K. Kobayashi. A Fuzzy Document Retrieval System Using the Keyword Connection Matrix and a Learning Method. – Fuzzy Sets and Systems, Vol. 39, 1991, No 2, pp. 163-179. DOI: 10.1016/0165-0114(91)90210-H
  19. Zahran, M. A., A. Magooda, A. Y. Mahgoub, H. Raafat, M. Rashwan, A. Atyia. Word Representations in Vector Space and their Applications for Arabic. – In: International Conference on Intelligent Text Processing and Computational Linguistics, Springer, 2015, pp. 430-443. DOI: 10.1007/978-3-319-18111-0_32
  20. Mikolov, T., K. Chen, G. Corrado, J. Dean. Efficient Estimation of Word Representations in Vector Space. – In: Proc. of International Conference on Learning Representations ICLR, Workshop Track, 2013. arXiv:1301.3781.
  21. Collobert, R., J. Weston. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. – In: Proc. of 25th International Conference on Machine Learning, ACM, 2008, pp. 160-167. DOI: 10.1145/1390156.1390177
  22. Mnih, A., G. E. Hinton. A Scalable Hierarchical Distributed Language Model. – In: Advances in Neural Information Processing Systems, 2009, pp. 1081-1088.
  23. Turian, J., L. Ratinov, Y. Bengio. Word Representations: A Simple and General Method for Semi-Supervised Learning. – In: Proc. of 48th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2010, pp. 384-394.
  24. Mikolov, T., W.-T. Yih, G. Zweig. Linguistic Regularities in Continuous Space Word Representations. – In: HLT-NAACL, Vol. 13, 2013, pp. 746-751.
  25. Mikolov, T., I. Sutskever, K. Chen, G. S. Corrado, J. Dean. Distributed Representations of Words and Phrases and Their Compositionality. – In: Advances in Neural Information Processing Systems, 2013, pp. 3111-3119.
  26. Pennington, J., R. Socher, C. D. Manning. GloVe: Global Vectors for Word Representation. – In: EMNLP, Vol. 14, 2014, pp. 1532-1543. DOI: 10.3115/v1/D14-1162
  27. Maurer, H. A., F. Kappe, B. Zaka. Plagiarism – A Survey. – J. UCS, Vol. 12, 2006, No 8, pp. 1050-1084.
  28. Alzahrani, S. M., N. Salim. Plagiarism Detection in Arabic Scripts Using Fuzzy Information Retrieval. – In: Student Conf., Johor Bahru, Malaysia, 2008, pp. 281-285.
  29. Potthast, M., M. Hagen, T. Gollub, M. Tippmann, J. Kiesel, P. Rosso, E. Stamatatos, B. Stein. Overview of the 5th International Competition on Plagiarism Detection. – In: CLEF Conference on Multilingual and Multimodal Information Access Evaluation, CELCT, 2013, pp. 301-331.
  30. Alzahrani, S., N. Salim. Fuzzy Semantic-Based String Similarity for Extrinsic Plagiarism Detection: Lab Report for PAN at CLEF’10. – In: Proc. of 4th Int. Workshop PAN-10, Padua, Italy, 2010.
  31. Black, W., S. Elkateb, H. Rodriguez, M. Alkhalifa, P. Vossen, A. Pease, C. Fellbaum. Introducing the Arabic WordNet Project. – In: Proc. of 3rd International WordNet Conference, 2006, pp. 295-300.
  32. Jadalla, A., A. Elnagar. A Plagiarism Detection System for Arabic Text-Based Documents. – In: Pacific-Asia Workshop on Intelligence and Security Informatics, Springer, 2012, pp. 145-153. DOI: 10.1007/978-3-642-30428-6_12
  33. Hussein, A. S. A Plagiarism Detection System for Arabic Documents. – In: Intelligent Systems, 2014. Springer International Publishing, 2015, pp. 541-552. DOI: 10.1007/978-3-319-11310-4_47
  34. Bensalem, I., I. Boukhalfa, P. Rosso, L. Abouenour, K. Darwish, S. Chikhi. Overview of the AraPlagDet PAN@FIRE2015 Shared Task on Arabic Plagiarism Detection. – In: FIRE Workshops, 2015, pp. 111-122.
  35. Magooda, A., A. Y. Mahgoub, M. Rashwan, M. B. Fayek, H. M. Raafat. RDI System for Extrinsic Plagiarism Detection (RDI RED), Working Notes for PAN-AraPlagDet at FIRE 2015. – In: FIRE Workshops, 2015, pp. 126-128.
  36. Alzahrani, S. Arabic Plagiarism Detection Using Word Correlation in n-Grams with k-Overlapping Approach, Working Notes for PAN-AraPlagDet at FIRE 2015. – In: FIRE Workshops, 2015, pp. 123-125.
  37. Pasha, A., M. Al-Badrashiny, M. T. Diab, A. El Kholy, R. Eskander, N. Habash, M. Pooleery, O. Rambow, R. Roth. MADAMIRA: A Fast, Comprehensive Tool for Morphological Analysis and Disambiguation of Arabic. – In: LREC, Vol. 14, 2014, pp. 1094-1101.
  38. McDonald, R., K. Lerman, F. Pereira. Multilingual Dependency Analysis with a Two-Stage Discriminative Parser. – In: Proc. of 10th Conference on Computational Natural Language Learning, Association for Computational Linguistics, 2006, pp. 216-220. DOI: 10.3115/1596276.1596317
  39. Ritchie, D. M., B. W. Kernighan, M. E. Lesk. The C Programming Language. – Prentice Hall, Englewood Cliffs, 1988, ISBN: 0131103709.
  40. Nagoudi, E. M. B., D. Schwab. Semantic Similarity of Arabic Sentences with Word Embeddings. – In: Proc. of Third Arabic Natural Language Processing Workshop (WANLP), Association for Computational Linguistics, 2017, pp. 18-24. DOI: 10.18653/v1/W17-1303
  41. Sultan, M. A., S. Bethard, T. Sumner. DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition. – In: Proc. of 9th International Workshop on Semantic Evaluation, 2015, pp. 148-153. DOI: 10.18653/v1/S15-2027
  42. Gahbiche-Braham, S., H. Bonneau-Maynard, T. Lavergne, F. Yvon. Joint Segmentation and POS Tagging for Arabic Using a CRF-Based Classifier. – In: LREC, 2012, pp. 2107-2113.
  43. Potthast, M., B. Stein, A. Barrón-Cedeño, P. Rosso. An Evaluation Framework for Plagiarism Detection. – In: Proc. of 23rd International Conference on Computational Linguistics, Association for Computational Linguistics, 2010, pp. 997-1005.
  44. McCandless, M., E. Hatcher, O. Gospodnetic. Lucene in Action: Covers Apache Lucene 3.0. Manning Publications Co., 2010.
  45. Salton, G. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Boston, Addison-Wesley, 1989.
  46. Simmons, S., Z. Estes. Using Latent Semantic Analysis to Estimate Similarity. – In: Proc. of Cognitive Science Society, 2006, pp. 2169-2173.
  47. Ceska, Z. Plagiarism Detection Based on Singular Value Decomposition. – In: Advances in Natural Language Processing. Springer, 2008, pp. 108-119. DOI: 10.1007/978-3-540-85287-2_11
DOI: https://doi.org/10.2478/cait-2018-0011 | Journal eISSN: 1314-4081 | Journal ISSN: 1311-9702
Language: English
Page range: 124 - 138
Submitted on: Jul 2, 2017
Accepted on: Nov 25, 2017
Published on: Mar 30, 2018
Published by: Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2018 El Moatez Billah Nagoudi, Ahmed Khorsi, Hadda Cherroun, Didier Schwab, published by Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.