The Optimization of n-Gram Feature Extraction Based on Term Occurrence for Cyberbullying Classification

Open Access | May 2024

References

  1. Aman, M, Md Said, A B, Abdul Kadir, S J and Ullah, I 2018 Key concept identification: A sentence parse tree-based technique for candidate feature extraction from unstructured texts. IEEE Access, 6: 60403–60413. DOI: 10.1109/ACCESS.2018.2875135
  2. Arroyo-Fernández, I, Méndez-Cruz, C F, Sierra, G, Torres-Moreno, J M and Sidorov, G 2019 Unsupervised sentence representations as word information series: Revisiting TF–IDF. Computer Speech & Language, 56: 107–129. DOI: 10.1016/j.csl.2019.01.005
  3. Balaji, T K, Annavarapu, C S R and Bablani, A 2021 Machine learning algorithms for social media analysis: A survey. Computer Science Review, 40: 100395. DOI: 10.1016/j.cosrev.2021.100395
  4. Beran, T and Li, Q 2007 The relationship between cyberbullying and school bullying. Journal of Student Wellbeing, 1(2): 15–33. DOI: 10.21913/JSW.v1i2.172
  5. Chan, T K H, Cheung, C M K and Lee, Z W Y 2021 Cyberbullying on social networking sites: A literature review and future research directions. Information & Management, 58(2): 103411. DOI: 10.1016/j.im.2020.103411
  6. Chen, H, Tino, P and Yao, X 2009 Probabilistic classification vector machines. IEEE Transactions on Neural Networks, 20(6): 901–914. DOI: 10.1109/TNN.2009.2014161
  7. Chen, K, Zhang, Z, Long, J and Zhang, H 2016 Turning from TF-IDF to TF-IGM for term weighting in text classification. Expert Systems with Applications, 66: 245–260. DOI: 10.1016/j.eswa.2016.09.009
  8. Chia, Z L, Ptaszynski, M, Masui, F, Leliwa, G and Wroczynski, M 2021a Machine learning and feature engineering-based study into sarcasm and irony classification with application to cyberbullying detection. Information Processing & Management, 58(4): 102600. DOI: 10.1016/j.ipm.2021.102600
  9. Chia, Z L, Ptaszynski, M, Masui, F, Leliwa, G and Wroczynski, M 2021b Machine learning and feature engineering-based study into sarcasm and irony classification with application to cyberbullying detection. Information Processing & Management, 58(4): 102600. DOI: 10.1016/j.ipm.2021.102600
  10. Fayek, H M, Cavedon, L and Wu, H R 2020 Progressive learning: A deep learning framework for continual learning. Neural Networks, 128: 345–357. DOI: 10.1016/j.neunet.2020.05.011
  11. Fiok, K, Karwowski, W, Gutierrez, E and Wilamowski, M 2021 Analysis of sentiment in tweets addressed to a single domain-specific Twitter account: Comparison of model performance and explainability of predictions. Expert Systems with Applications, 186: 115771. DOI: 10.1016/j.eswa.2021.115771
  12. Garousi, V, Bauer, S and Felderer, M 2020a NLP-assisted software testing: A systematic mapping of the literature. Information and Software Technology, 126: 106321. DOI: 10.1016/j.infsof.2020.106321
  13. Garousi, V, Bauer, S and Felderer, M 2020b NLP-assisted software testing: A systematic mapping of the literature. Information and Software Technology, 126: 106321. DOI: 10.1016/j.infsof.2020.106321
  14. Harywanto, G N, Siautama, R, Ardison, Amadea C I and Suhartono, D 2021 Extractive hotel review summarization based on tf/idf and adjective-noun pairing by considering annual sentiment trends. Procedia Computer Science, 179: 558–565. DOI: 10.1016/j.procs.2021.01.040
  15. Hasan, M R, Maliha, M and Arifuzzaman, M 2019 Sentiment Analysis with NLP on Twitter Data. In: 2019 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh 2019, pp. 1–4. DOI: 10.1109/IC4ME247184.2019.9036670
  16. Hassan, S, Rafi, M and Shaikh, M S 2011 Comparing SVM and naïve Bayes classifiers for text categorization with Wikitology as knowledge enrichment. In: 2011 IEEE 14th International Multitopic Conference, Karachi, Pakistan 2011, pp. 31–34. DOI: 10.1109/INMIC.2011.6151495
  17. Hinduja, S and Patchin, J W 2010 Bullying, cyberbullying, and suicide. Archives of Suicide Research: Official Journal of the International Academy for Suicide Research, 14(3): 206–221. DOI: 10.1080/13811118.2010.494133
  18. Ji-Zhaxi, D, Zhi-Jie, C, Rang-Zhuoma, C, Maocuo, S and Mabao, B 2021 A corpus preprocessing method for syllable-level Tibetan text classification. In: 2021 3rd International Conference on Natural Language Processing (ICNLP), Beijing, China 2021, pp. 33–36. DOI: 10.1109/ICNLP52887.2021.00011
  19. Katzer, C, Fetchenhauer, D and Belschak, F 2009 Cyberbullying: Who are the victims? A comparison of victimization in internet chatrooms and victimization in school. Journal of Media Psychology: Theories, Methods, and Applications, 21(1): 25–36. DOI: 10.1027/1864-1105.21.1.25
  20. Kowalski, R M, Giumetti, G W, Schroeder, A N and Lattanner, M R 2014 Bullying in the digital age: A critical review and meta-analysis of cyberbullying research among youth. Psychological Bulletin, 140(4): 1073–1137. DOI: 10.1037/a0035618
  21. Li, Q 2007 New bottle but old wine: A research of cyberbullying in schools. Computers in Human Behavior, 23(4): 1777–1791. DOI: 10.1016/j.chb.2005.10.005
  22. Li, Q 2008 A cross-cultural comparison of adolescents’ experience related to cyberbullying. Educational Research, 50(3): 223–234. DOI: 10.1080/00131880802309333
  23. Liu, B, Xiao, Y, Yu, P S, Hao, Z and Cao, L 2014 An efficient approach for outlier detection with imperfect data labels. IEEE Transactions on Knowledge and Data Engineering, 26(7): 1602–1616. DOI: 10.1109/TKDE.2013.108
  24. Mai, S D and Ngo, L T 2018 Multiple kernel approach to semi-supervised fuzzy clustering algorithm for land-cover classification. Engineering Applications of Artificial Intelligence, 68: 205–213. DOI: 10.1016/j.engappai.2017.11.007
  25. Martens, D, Baesens, B B and Van Gestel, T 2009 Decompositional rule extraction from support vector machines by active learning. IEEE Transactions on Knowledge and Data Engineering, 21(2): 178–191. DOI: 10.1109/TKDE.2008.131
  26. Mee, A, Homapour, E, Chiclana, F and Engel, O 2021 Sentiment analysis using TF–IDF weighting of UK MPs’ tweets on Brexit. Knowledge-Based Systems, 228: 107238. DOI: 10.1016/j.knosys.2021.107238
  27. Mishna, F, Cook, C, Gadalla, T, Daciuk, J and Solomon, S 2010 Cyber bullying behaviors among middle and high school students. The American Journal of Orthopsychiatry, 80(3): 362–374. DOI: 10.1111/j.1939-0025.2010.01040.x
  28. Mitchell, T M 1997 Machine Learning. McGraw-Hill Science/Engineering/Math.
  29. Mokhtar, U, El Bendary, N, Hassenian, A E, Emary, E, Mahmoud, M A, Hefny, H and Tolba, M F 2015 SVM-Based detection of tomato leaves diseases. In: Filev, D, et al (eds.), Intelligent Systems 2014. Springer International Publishing. pp. 641–652. DOI: 10.1007/978-3-319-11310-4_55
  30. Murnion, S, Buchanan, W J, Smales, A and Russell, G 2018 Machine learning and semantic analysis of in-game chat for cyberbullying. Computers & Security, 76: 197–213. DOI: 10.1016/j.cose.2018.02.016
  31. Nasser, N, Karim, L, El Ouadrhiri, A, Ali, A and Khan, N 2021 N-Gram based language processing using Twitter dataset to identify COVID-19 patients. Sustainable Cities and Society, 72: 103048. DOI: 10.1016/j.scs.2021.103048
  32. Rajput, A 2020 Chapter 3—Natural language processing, sentiment analysis, and clinical analytics. In: Lytras, M D, et al (eds.), Innovation in Health Informatics. Academic Press. pp. 79–97. DOI: 10.1016/B978-0-12-819043-2.00003-4
  33. Sheldon, P, Rauschnabel, P A and Honeycutt, J M 2019 Chapter 3—Cyberstalking and bullying. In: Sheldon, P, et al (eds.), The Dark Side of Social Media. Academic Press. pp. 43–58. DOI: 10.1016/B978-0-12-815917-0.00003-4
  34. Shelke, P P and Pardeshi, A A 2020 Review on candidate feature extraction and categorization for unstructured text document. In: 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), Erode, India 2020, pp. 88–92. DOI: 10.1109/ICCMC48092.2020.ICCMC-00017
  35. Shirakawa, M, Nakayama, K, Hara, T and Nishio, S 2015 Wikipedia-Based semantic similarity measurements for noisy short texts using extended naive bayes. IEEE Transactions on Emerging Topics in Computing, 3(2): 205–219. DOI: 10.1109/TETC.2015.2418716
  36. Shouzhong, T and Minlie, H 2016 Mining microblog user interests based on TextRank with TF-IDF factor. The Journal of China Universities of Posts and Telecommunications, 23(5): 40–46. DOI: 10.1016/S1005-8885(16)60056-0
  37. Sriyanong, W, Moungmingsuk, N and Khamphakdee, N 2018 A text preprocessing framework for text mining on big data infrastructure. In: 2018 2nd International Conference on Imaging, Signal Processing and Communication (ICISPC), Kuala Lumpur, Malaysia 2018, pp. 169–173. DOI: 10.1109/ICISPC44900.2018.9006718
  38. Subasi, A 2020 Chapter 2—Data preprocessing. In: Subasi, A (ed.), Practical Machine Learning for Data Analysis Using Python. Academic Press. pp. 27–89. DOI: 10.1016/B978-0-12-821379-7.00002-3
  39. Tan, C M, Wang, Y F and Lee, C D 2002 The use of bigrams to enhance text categorization. Information Processing & Management, 38(4): 529–546. DOI: 10.1016/S0306-4573(01)00045-0
  40. Tao, P, Sun, Z and Sun, Z 2018 An improved intrusion detection algorithm based on GA and SVM. IEEE Access, 6: 13624–13631. DOI: 10.1109/ACCESS.2018.2810198
  41. Wan, C, Wang, Y, Liu, Y, Ji, J and Feng, G 2019 Composite feature extraction and selection for text classification. IEEE Access, 7: 35208–35219. DOI: 10.1109/ACCESS.2019.2904602
  42. Wang, Y, Liu, S, Afzal, N, Rastegar-Mojarad, M, Wang, L, Shen, F, Kingsbury, P and Liu, H 2018 A comparison of word embeddings for the biomedical natural language processing. Journal of Biomedical Informatics, 87: 12–20. DOI: 10.1016/j.jbi.2018.09.008
  43. Wang, T, Lu, K, Chow, K P and Zhu, Q 2020 COVID-19 Sensing: Negative sentiment analysis on social media in China via BERT model. IEEE Access, 8: 138162–138169. DOI: 10.1109/TNNLS.2014.2382123
  44. Wu, J and Yang, H 2015 Linear regression-based efficient SVM learning for large-scale classification. IEEE Transactions on Neural Networks and Learning Systems, 26(10): 2357–2369. DOI: 10.1109/TNNLS.2014.2382123
  45. Xiong, A, Liu, D, Tian, H, Liu, Z, Yu, P and Kadoch, M 2021 News keyword extraction algorithm based on semantic clustering and word graph model. Tsinghua Science and Technology, 26(6): 886–893. DOI: 10.26599/TST.2020.9010051
  46. Yu, L, Gan, S, Chen, Y and He, M 2020 Correlation-Based Weight Adjusted Naive Bayes. IEEE Access, 8: 51377–51387. DOI: 10.1109/ACCESS.2020.2973331
  47. Zhao, Z, Zheng, P, Xu, S and Wu, X 2019 Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30(11): 3212–3232. DOI: 10.1109/TNNLS.2018.2876865
  48. Zinovyeva, E, Härdle, W K and Lessmann, S 2020 Antisocial online behavior detection using deep learning. Decision Support Systems, 138: 113362. DOI: 10.1016/j.dss.2020.113362
Language: English
Submitted on: Jun 8, 2023
Accepted on: May 6, 2024
Published on: May 23, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Yudi Setiawan, Nur Ulfa Maulidevi, Kridanto Surendro, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.