A Baseline for Violence Behavior Detection in Complex Surveillance Scenarios

Yingying Long; Zongxin Wang; Hanzhu Wei; Xiaojun Bai

doi:10.2478/ijanmc-2024-0036



International Journal of Advanced Network, Monitoring and Controls

Volume 9 (2024): Issue 4 (December 2024)


Open Access


Abstract

Violence detection can improve the ability to respond to emergencies, but there is still no dataset designed specifically for violence detection. In this work, we propose VioData, a dataset specialized for violence detection in complex surveillance scenarios. To assess the dataset more accurately, we also propose a violence detection model based on target detection and 3D convolution. The model consists of two key modules: a spatio-temporal feature extraction module and a spatio-temporal feature fusion module. The feature extraction module comprises a spatial feature module, which extracts features from key frames using ordinary convolutional networks, and a temporal feature extraction module, which builds temporal features using 3D convolution. The fusion module, Channel Fusion and Attention Mechanism (CFAM), fuses the temporal and spatial features. Experimental results indicate that the precision of the proposed model on the UCF101-24 and JHMDB behavior detection datasets, and on our violence detection dataset, VioData, improves on that of other violence detection models. This both verifies the validity of the dataset and provides a baseline for subsequent research and improvement in this area.
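The abstract names the fusion module but does not say how CFAM computes its attention. As an illustration only, the sketch below assumes a Gram-matrix channel attention: the spatial and temporal feature maps are concatenated along the channel axis, and each channel is then re-weighted by its similarity to every other channel. The function names, the `alpha` scale, and the plain-list data layout are hypothetical choices made to keep the example dependency-free. They are not taken from the paper, and a real model would use learned convolutions and a tensor library.

```python
import math


def concat_channels(spatial, temporal):
    """Channel fusion: stack spatial and temporal feature maps channel-wise.

    Each argument is a list of channels; each channel is a flat list of
    N values (an H*W feature map flattened). Layout is an assumption.
    """
    if len(spatial[0]) != len(temporal[0]):
        raise ValueError("feature maps must share the same spatial size")
    return spatial + temporal


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(features, alpha=1.0):
    """Assumed attention form: out = alpha * softmax(F F^T) F + F."""
    # Gram matrix: inner product between every pair of channels.
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in features]
            for ci in features]
    weights = [softmax(row) for row in gram]
    n = len(features[0])
    # Each output channel is a weighted mix of all input channels.
    attended = [[sum(w * ch[k] for w, ch in zip(w_row, features))
                 for k in range(n)]
                for w_row in weights]
    # Residual connection keeps the original fused features.
    return [[alpha * a + f for a, f in zip(a_row, f_row)]
            for a_row, f_row in zip(attended, features)]


def fuse(spatial, temporal, alpha=1.0):
    """Concatenate the two feature streams, then apply channel attention."""
    return channel_attention(concat_channels(spatial, temporal), alpha)


# Toy usage: one spatial channel and one temporal channel, two positions each.
fused = fuse([[1.0, 0.0]], [[0.0, 1.0]], alpha=1.0)
```

With `alpha=0.0` the attention branch is switched off and `fuse` returns the plain channel concatenation, which makes it easy to compare the fused features with and without attention.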



DOI: https://doi.org/10.2478/ijanmc-2024-0036 | Journal eISSN: 2470-8038


Language: English

Page range: 48 - 58

Published on: Dec 31, 2024

Published by: Xi’an Technological University

In partnership with: Paradigm Publishing Services

Publication frequency: 4 issues per year

Keywords: Violent Behavior Detection, Datasets, Spatio-temporal Feature, Target Detection, Feature Fusion

Related subjects: Computer sciences; Computer sciences, other

© 2024 Yingying Long, Zongxin Wang, Hanzhu Wei, Xiaojun Bai, published by Xi’an Technological University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
