Enhancing human activity recognition with multi-head self-attention and stacked autoencoders
Abstract
Human activity recognition (HAR) is a critical task in healthcare and behavioral monitoring systems, and it depends on accurate interpretation of time-series sensor data. This paper proposes a novel deep learning (DL) framework, the multi-head self-attention enhanced stacked autoencoder (MHSA-SAE), designed to improve the representation and classification of human activities, and evaluates it on the mobile health (mHealth) HAR dataset. The architecture integrates stacked autoencoders (SAE) for unsupervised feature extraction with a multi-head self-attention (MHSA) mechanism that captures long-range temporal dependencies across multiple sensor modalities. In experimental evaluations, the proposed model outperforms the existing approaches it is compared against, achieving 97.82% accuracy, a 96.67% F1-score, and a 98.10% AUC. Detailed class-wise metrics highlight the model's robustness across diverse activity types such as standing, walking, running, and climbing stairs. Comparative analysis with state-of-the-art methods and ablation studies further confirm the effectiveness of each architectural component. The MHSA-SAE framework presents a promising direction for real-time, high-accuracy activity recognition in mobile and healthcare applications.
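As a rough illustration of the data flow summarized in the abstract, the following NumPy sketch passes one sensor window through the encoder half of a stacked autoencoder, applies multi-head self-attention across time steps, and pools the result for classification. It is not the authors' implementation: the window length, channel count, layer widths, head count, number of classes, and all function names are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def stacked_encoder(x, weights):
    # Encoder half of a stacked autoencoder, applied to every time step.
    # x: (T, C) sensor window; weights: list of (W, b) pairs, one per layer.
    h = x
    for W, b in weights:
        h = np.tanh(h @ W + b)
    return h

def multi_head_self_attention(h, Wq, Wk, Wv, Wo, num_heads):
    # Scaled dot-product self-attention over the time axis.
    # h: (T, d) encoded features; returns (T, d) output and (heads, T, T) weights.
    T, d = h.shape
    d_head = d // num_heads

    def split(m):
        # (T, d) -> (heads, T, d_head)
        return m.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(h @ Wq), split(h @ Wk), split(h @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    ctx = (attn @ v).transpose(1, 0, 2).reshape(T, d)  # merge heads
    return ctx @ Wo, attn

# Hypothetical sizes (assumptions, not taken from the paper):
# T time steps, C sensor channels, H1/H2 encoder widths, K activity classes.
T, C, H1, H2, HEADS, K = 128, 23, 64, 32, 4, 12

window = rng.standard_normal((T, C))  # stand-in for one sensor window
enc_weights = [
    (rng.standard_normal((C, H1)) * 0.1, np.zeros(H1)),
    (rng.standard_normal((H1, H2)) * 0.1, np.zeros(H2)),
]
Wq, Wk, Wv, Wo = (rng.standard_normal((H2, H2)) * 0.1 for _ in range(4))
W_cls = rng.standard_normal((H2, K)) * 0.1

features = stacked_encoder(window, enc_weights)       # (T, H2)
attended, attn = multi_head_self_attention(
    features, Wq, Wk, Wv, Wo, HEADS)                  # (T, H2), (HEADS, T, T)
probs = softmax(attended.mean(axis=0) @ W_cls)        # (K,) class probabilities
```

In a trained model, each `attn[h]` matrix could be inspected to see which time steps a given head weights most heavily, which is one way to examine the long-range temporal dependencies the abstract refers to.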
© 2026 S. Anandanarayanan, S. Thirumaran, published by Macquarie University, Australia
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.