
AG-HybridNet: An Attention-Guided Hybrid CNN-Transformer Network for 3D Gaze Estimation

By: Yue Li and Changyuan Wang
Open Access | Dec 2025

Abstract

To address the challenge of accurate gaze estimation in unconstrained environments, where many interfering factors are present, this paper proposes AG-HybridNet, an end-to-end gaze estimation model built on a dual-branch architecture that combines CNN and Transformer components. The model employs a Swin Transformer backbone for global feature extraction and an enhanced CNN branch dedicated to capturing local features. We introduce the TDConv-Block, which replaces standard convolution with partial convolution combined with a reparameterization technique, significantly reducing computation and memory access while forming a T-shaped receptive field focused on central facial regions. We further design an Efficient Additive Attention (ED-Attention) module that alleviates the Transformer's computational bottleneck on long sequences by restructuring the attention computation. Comprehensive experiments on the MPIIFaceGaze and Gaze360 datasets validate the model's effectiveness: AG-HybridNet achieves mean angular errors of 3.72° and 10.82° on the two datasets, respectively. Comparisons with mainstream 3D gaze estimation methods confirm that the model estimates 3D gaze directions accurately while reducing computational complexity.
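
The TDConv-Block described above rests on partial convolution: a regular convolution is applied to only a fraction of the input channels while the rest pass through untouched, and a following pointwise convolution mixes all channels, giving the combined operator the T-shaped receptive field the abstract mentions. The sketch below is a minimal PyTorch illustration of this partial-plus-pointwise pattern (in the style of FasterNet's PConv), not the authors' exact TDConv-Block; the channel ratio, layer choices, and all names are assumptions, and the extra branches that reparameterization would fuse at inference are omitted.

import torch
import torch.nn as nn

class PartialConvBlock(nn.Module):
    """Sketch: partial convolution over a channel slice + pointwise mixing.

    Only the first dim // ratio channels pass through the 3x3 convolution;
    the remaining channels are forwarded unchanged, which cuts FLOPs and
    memory access. The 1x1 pointwise convolution then mixes all channels,
    so the combined effective receptive field is T-shaped, concentrated on
    the spatial center. A hypothetical stand-in, not the paper's TDConv-Block.
    """

    def __init__(self, dim: int, ratio: int = 4, kernel_size: int = 3):
        super().__init__()
        self.dim_conv = dim // ratio            # channels that get convolved
        self.dim_pass = dim - self.dim_conv     # channels passed through as-is
        self.spatial = nn.Conv2d(self.dim_conv, self.dim_conv, kernel_size,
                                 padding=kernel_size // 2, bias=False)
        self.pointwise = nn.Conv2d(dim, dim, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_conv, x_pass = torch.split(x, [self.dim_conv, self.dim_pass], dim=1)
        x = torch.cat([self.spatial(x_conv), x_pass], dim=1)
        return self.pointwise(x)

# Shape check: a 64-channel feature map goes in and comes out the same size.
# y = PartialConvBlock(dim=64)(torch.randn(1, 64, 56, 56))  # -> (1, 64, 56, 56)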
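
ED-Attention is only described at a high level here; the sketch below shows the general efficient additive attention pattern such modules follow (e.g., SwiftFormer-style designs): rather than forming an N x N query-key matrix, each query token is scored against a single learned vector, the scores are pooled into one global query, and that query is broadcast over the keys, so the cost grows linearly with sequence length. Module and parameter names are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientAdditiveAttention(nn.Module):
    """Sketch of efficient additive attention, linear in sequence length.

    Each query token receives one scalar score against a learned vector
    w_g; a softmax over tokens turns the scores into weights; the weighted
    sum is a single global query that modulates every key. Total cost is
    O(N * d) instead of the O(N^2 * d) of standard self-attention.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_query = nn.Linear(dim, dim)
        self.to_key = nn.Linear(dim, dim)
        self.w_g = nn.Parameter(torch.randn(dim, 1))  # learned scoring vector
        self.scale = dim ** -0.5
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        q = F.normalize(self.to_query(x), dim=-1)
        k = F.normalize(self.to_key(x), dim=-1)
        scores = (q @ self.w_g) * self.scale             # (B, N, 1), one score per token
        alpha = scores.softmax(dim=1)                    # attention weights over tokens
        global_q = (alpha * q).sum(dim=1, keepdim=True)  # (B, 1, dim) pooled global query
        return self.proj(global_q * k) + q               # broadcast interaction + residual

# Shape check: EfficientAdditiveAttention(128)(torch.randn(2, 196, 128))  # -> (2, 196, 128)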

Language: English
Page range: 82–93
Published on: Dec 31, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Yue Li, Changyuan Wang, published by Xi’an Technological University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.