
Position-Encoding Convolutional Network to Solving Connected Text Captcha

By: Ke Qing and Rong Zhang
Open Access | Feb 2022

Abstract

Text-based CAPTCHAs are a convenient and effective security mechanism widely deployed across websites. Efficient end-to-end scene text recognition models, consisting of a CNN and an attention-based RNN, show limited performance when solving text-based CAPTCHAs. In contrast with street view images and documents, the character sequence in a CAPTCHA is non-semantic: the RNN loses its ability to learn semantic context and only implicitly encodes the relative position of the extracted features. Meanwhile, the security features that prevent characters from being segmented and recognized greatly increase the complexity of CAPTCHAs, making the performance of such models sensitive to different CAPTCHA schemes. In this paper, we analyze the properties of text-based CAPTCHAs and accordingly treat solving them as a highly position-dependent character sequence recognition task. We propose a network named PosConv that leverages the position information in the character sequence without an RNN. PosConv uses a novel padding strategy and a modified convolution to explicitly encode relative position into the local features of characters, making the features extracted from CAPTCHAs more informative and robust. We validate PosConv on six text-based CAPTCHA schemes; it achieves state-of-the-art or competitive recognition accuracy with significantly fewer parameters and faster convergence.
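The abstract names a padding strategy and a modified convolution but does not detail them. As a minimal sketch, one common way to explicitly encode relative position into convolutional features is a CoordConv-style position channel; the snippet below stands in for the paper's mechanism and all identifiers in it are hypothetical, not taken from the paper:

```python
# Illustrative sketch only: assumes a CoordConv-style position channel,
# not PosConv's actual padding strategy or modified convolution.
import torch
import torch.nn as nn

class PositionEncodingConv(nn.Module):
    """Conv layer whose input is augmented with a normalized x-position map,
    so local character features carry their relative position explicitly."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        # One extra input channel holds the position map
        self.conv = nn.Conv2d(in_channels + 1, out_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, _, h, w = x.shape
        # Normalized horizontal coordinate in [0, 1], broadcast over batch/height
        pos = torch.linspace(0.0, 1.0, w, device=x.device)
        pos = pos.view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, pos], dim=1))

# Usage: a batch of two 3-channel 48x160 CAPTCHA images
feats = PositionEncodingConv(3, 64)(torch.randn(2, 3, 48, 160))
print(feats.shape)  # torch.Size([2, 64, 48, 160])
```

Because the position map is concatenated before the convolution, each local feature is a function of both appearance and horizontal location, which is one way a purely convolutional model can replace the implicit position tracking an RNN would otherwise provide.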

Language: English
Page range: 121–133
Submitted on: Oct 6, 2021
Accepted on: Oct 12, 2021
Published on: Feb 23, 2022
Published by: SAN University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2022 Ke Qing, Rong Zhang, published by SAN University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.