Abstract
Adversarial attacks in the text domain pose significant challenges to the integrity of Natural Language Processing (NLP) systems. To address this, our study introduces "TextGuard," a technique that applies the Local Outlier Factor (LOF) algorithm to detect adversarial examples in NLP. The study empirically validates the effectiveness of TextGuard on several real-world datasets and evaluates its performance across standard NLP classifiers such as Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNN), and transformer-based models. TextGuard demonstrates superior detection capability, with detection F1 scores reaching up to 94.8%, outperforming recent state-of-the-art methods such as Discriminative Perturbations (DISP) and Frequency-Guided Word Substitutions (FGWS). To our knowledge, this is the first application of the LOF technique to adversarial example detection in the text domain, setting a new benchmark in the field.
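To make the core idea of LOF-based detection concrete, the following is a minimal sketch, not the authors' TextGuard implementation: it assumes input texts have already been mapped to fixed-size embeddings (here simulated with random vectors) and uses scikit-learn's LocalOutlierFactor in novelty mode to flag inputs whose local density deviates from the clean reference set.

```python
# Illustrative sketch of LOF-based adversarial detection (not the authors' code).
# Assumption: texts are represented as fixed-size embeddings; random vectors
# stand in for encoder outputs (e.g., LSTM, CNN, or transformer hidden states).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Reference set of clean-input embeddings.
clean_embeddings = rng.normal(loc=0.0, scale=1.0, size=(500, 64))

# Test set: some clean-looking inputs plus some shifted (outlying) inputs
# that play the role of adversarially perturbed examples in this toy setup.
test_embeddings = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(20, 64)),
    rng.normal(loc=3.0, scale=1.0, size=(20, 64)),
])

# novelty=True allows scoring unseen points against the fitted clean set.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(clean_embeddings)

# predict() returns +1 for inliers (treated as clean) and -1 for outliers
# (flagged as candidate adversarial examples).
flags = lof.predict(test_embeddings)
print("flagged as adversarial:", int((flags == -1).sum()), "of", len(flags))
```

The choice of embedding model, n_neighbors, and the decision threshold are all assumptions here; in the paper these would be determined by the evaluated datasets and classifiers.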