AG-HybridNet: An Attention-Guided Hybrid CNN-Transformer Network for 3D Gaze Estimation

By: Yue Li and Changyuan Wang
Open Access | Dec 2025

Language: English
Page range: 82 - 93
Published on: Dec 31, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Yue Li, Changyuan Wang, published by Xi’an Technological University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.