Figure 1.

Figure 2.

Figure 3.

Figure 4.

Detection results of 3D-CBAM attention model embedded at different locations

| Network | Embedding position | UCF101-24 (mAP) | JHMDB (mAP) | VioData (mAP) |
|---|---|---|---|---|
| I3D | - | 84.4% | 80.4% | 86.5% |
| I3D | 3D Inc_1 | 86.1% | 83.7% | 89.0% |
| I3D | 3D Inc_2 | 86.7% | 83.3% | 88.3% |
| I3D | 3D Inc_3 | 85.9% | 84.2% | 89.6% |
| I3D | 3D Inc_1+3D Inc_2 | 88.2% | 87.5% | 90.7% |
| I3D | 3D Inc_1+3D Inc_3 | 89.8% | 88.6% | 91.8% |
| I3D | 3D Inc_2+3D Inc_3 | 88.0% | 88.0% | 91.4% |
| I3D | 3D Inc_1+3D Inc_2+3D Inc_3 | 90.0% | 88.7% | 92.0% |

Parameter settings in network training

| Parameter | Setting |
|---|---|
| Initial Learning Rate | 0.001 |
| Epoch | 230 |
| ReSize | (416,416) |
| Weight Decay | 0.0005 |
| Optimizer | Adam |
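
The settings in the table above could be collected into a single configuration object. The following is a minimal sketch: the values are taken from the table, while the class name `TrainConfig`, its field names, and the helper `optimizer_kwargs` are hypothetical and do not come from the original work.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Training settings from the parameter table (field names are hypothetical)."""

    initial_lr: float = 0.001       # Initial Learning Rate
    epochs: int = 230               # Epoch
    input_size: tuple = (416, 416)  # ReSize
    weight_decay: float = 0.0005    # Weight Decay
    optimizer: str = "Adam"         # Optimizer

    def optimizer_kwargs(self) -> dict:
        # Keyword arguments in the form an Adam implementation typically accepts.
        return {"lr": self.initial_lr, "weight_decay": self.weight_decay}


cfg = TrainConfig()
print(asdict(cfg))
print(cfg.optimizer_kwargs())
```

Assuming a PyTorch training setup (an assumption, since the framework is not stated here), the dictionary returned by `optimizer_kwargs` could be passed as `torch.optim.Adam(model.parameters(), **cfg.optimizer_kwargs())`.
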

Violence detection accuracy of different models

| Method | UCF101-24 (mAP) | JHMDB (mAP) | VioData (mAP) |
|---|---|---|---|
| MPS | 82.4% | - | 85.3% |
| P3D-CTN | - | 84.0% | 84.9% |
| STEP | 83.1% | - | 86.4% |
| YOWO | 82.5% | 85.7% | 88.0% |
| ours | 89.8% | 88.6% | 91.8% |
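
To read the comparison above as margins rather than absolute scores, the differences can be computed directly from the table. This is a small illustrative sketch: the numbers are copied from the table (with the MPS VioData entry read as a percentage), missing entries are kept as `None`, and the function name `gain_over` is hypothetical.

```python
DATASETS = ("UCF101-24", "JHMDB", "VioData")

# mAP values (in %) copied from the comparison table; None marks a missing entry.
results = {
    "MPS": (82.4, None, 85.3),
    "P3D-CTN": (None, 84.0, 84.9),
    "STEP": (83.1, None, 86.4),
    "YOWO": (82.5, 85.7, 88.0),
    "ours": (89.8, 88.6, 91.8),
}


def gain_over(baseline: str, ours: str = "ours") -> dict:
    """Return the mAP gain of `ours` over `baseline` in percentage points."""
    gains = {}
    for name, base, new in zip(DATASETS, results[baseline], results[ours]):
        gains[name] = None if base is None else round(new - base, 1)
    return gains


for method in ("MPS", "P3D-CTN", "STEP", "YOWO"):
    print(method, gain_over(method))
```

For example, against YOWO the reported gains are 7.3, 2.9, and 3.8 percentage points on UCF101-24, JHMDB, and VioData respectively.
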

Detection results with the embedded ASPP module and spatio-temporal depthwise separable convolution

| Network | UCF101-24 (mAP) | JHMDB (mAP) | VioData (mAP) |
|---|---|---|---|
| Baseline | 78.5% | 75.3% | 78.9% |
| CSPDarkNet-Tiny+ASPP | 80.7% | 76.6% | 82.0% |
| CSPDarkNet-Tiny+ASPP+I3D (Improved 3D Inc) | 84.8% | 80.4% | 86.5% |