
A Baseline for Violence Behavior Detection in Complex Surveillance Scenarios

Open Access | Dec 2024

Figures & Tables

Figure 1.

Illustration of a sample of labeled acts of violence

Figure 2.

Framework of the violence detection algorithm. The framework consists mainly of a spatio-temporal feature extraction model and a spatio-temporal feature fusion module. The spatio-temporal feature extraction model comprises a temporal feature extraction module and a spatial feature extraction module. The temporal feature extraction module uses the I3D network as its backbone, as illustrated in (a); (b) and (c) show the 3D-CBAM attention mechanism and the 3D Inception (3D Inc) module, respectively. The spatial feature extraction module uses the CSPDarkNet-Tiny network as its backbone, with an Atrous Spatial Pyramid Pooling (ASPP) module appended at its end, as shown in (d), where "rate" denotes the dilation rate of the atrous convolution. ASPP has five branches: one ordinary convolution branch, three atrous convolution branches, and one global average pooling branch. (e) shows the overall structure of the Channel Fusion and Attention Mechanism (CFAM); D is the final output feature map of CFAM, and C1 and C2 are the numbers of output channels of the feature maps from the I3D network and the ASPP module, respectively.
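For reference, the following is a minimal PyTorch sketch of an ASPP module with the five-branch layout described in the caption (one ordinary convolution branch, three atrous convolution branches, and one global average pooling branch). The channel widths and dilation rates are illustrative assumptions, not the settings used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Sketch of a five-branch ASPP module as described in Fig. 2(d).
    Dilation rates and channel sizes are assumptions for illustration."""

    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        # Branch 1: ordinary 1x1 convolution
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # Branches 2-4: atrous (dilated) 3x3 convolutions with different rates
        self.atrous = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates])
        # Branch 5: global average pooling
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # Fuse the concatenated branches back to out_ch channels
        self.project = nn.Sequential(
            nn.Conv2d(5 * out_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [self.branch1(x)] + [b(x) for b in self.atrous]
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode='bilinear', align_corners=False)
        feats.append(pooled)
        return self.project(torch.cat(feats, dim=1))
```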

Figure 3.

CSPDarkNet-Tiny Network Overall Structure

Figure 4.

Violence detection results

Detection results of the 3D-CBAM attention module embedded at different locations

Network | Embedding position             | UCF101-24 (mAP) | JHMDB (mAP) | VioData (mAP)
I3D     | -                              | 84.4%           | 80.4%       | 86.5%
I3D     | 3D Inc_1                       | 86.1%           | 83.7%       | 89.0%
I3D     | 3D Inc_2                       | 86.7%           | 83.3%       | 88.3%
I3D     | 3D Inc_3                       | 85.9%           | 84.2%       | 89.6%
I3D     | 3D Inc_1 + 3D Inc_2            | 88.2%           | 87.5%       | 90.7%
I3D     | 3D Inc_1 + 3D Inc_3            | 89.8%           | 88.6%       | 91.8%
I3D     | 3D Inc_2 + 3D Inc_3            | 88.0%           | 88.0%       | 91.4%
I3D     | 3D Inc_1 + 3D Inc_2 + 3D Inc_3 | 90.0%           | 88.7%       | 92.0%

Parameter settings in network training

Parameter             | Setting
Initial Learning Rate | 0.001
Epoch                 | 230
ReSize                | (416, 416)
Weight Decay          | 0.0005
Optimizer             | Adam
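The settings above map onto a standard PyTorch training loop roughly as follows; `model` and `train_loader` are hypothetical placeholders, and the loss computation is only assumed, not taken from the paper.

```python
import torch

# Minimal sketch of the training setup in the table above (assumed PyTorch).
# `model` and `train_loader` are placeholders defined elsewhere; input clips
# are assumed to be resized to 416x416 before batching.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)

for epoch in range(230):
    for clips, targets in train_loader:
        optimizer.zero_grad()
        loss = model(clips, targets)  # assumed to return the detection loss
        loss.backward()
        optimizer.step()
```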

Results of violence detection accuracy of different models

Method  | UCF101-24 (mAP) | JHMDB (mAP) | VioData (mAP)
MPS     | 82.4%           | -           | 85.3%
P3D-CTN | -               | 84.0%       | 84.9%
STEP    | 83.1%           | -           | 86.4%
YOWO    | 82.5%           | 85.7%       | 88.0%
Ours    | 89.8%           | 88.6%       | 91.8%

Detection results with the embedded ASPP module and the introduction of spatio-temporal depthwise separable convolution

Network                                        | UCF101-24 (mAP) | JHMDB (mAP) | VioData (mAP)
Baseline                                       | 78.5%           | 75.3%       | 78.9%
CSPDarkNet-Tiny + ASPP                         | 80.7%           | 76.6%       | 82.0%
CSPDarkNet-Tiny + ASPP + I3D (Improved 3D Inc) | 84.8%           | 80.4%       | 86.5%
Language: English
Page range: 48 - 58
Published on: Dec 31, 2024
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2024 Yingying Long, Zongxin Wang, Hanzhu Wei, Xiaojun Bai, published by Xi’an Technological University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.