
Deep Learning Sequence Network for Identifying and Analyzing Archery Shooting Patterns

By: Jihoon Park and Hyongjun Choi
Open Access | May 2026


Introduction

Archery is a highly precise sport in which accuracy and consistency determine the outcome, and both are critically influenced by athletes' technical skills and physical control. Each archer's shooting motion has unique characteristics that are closely related to their performance level (Hemara et al., 2021). However, traditional coaching methods largely rely on coaches' experience and intuition, limiting objective and quantitative analysis. Therefore, it is necessary to develop an objective method capable of accurately identifying and analyzing each athlete's shooting motion (Brian, 2009; Zhang & Lin, 2024).

Following the rapid advancement of computer vision technology, new possibilities are emerging in sports for precisely analyzing athletes' movements. Particularly, 3D coordinates of body joints obtained through computer vision can be helpful in quantifying and standardizing athletes' movements (Chung et al., 2022; Ino et al., 2024; Kitamura et al., 2022; Wu et al., 2021).

Applying this technology to archery training allows coaches and athletes to move beyond subjective evaluations and obtain more objective and accurate feedback (Ji et al., 2025). Ultimately, this can lead to improved techniques and enhanced performance for athletes. Motion analysis using computer vision not only enables the development of personalized training programs tailored to individual athletes, but also helps detect and prevent potential injury risks beforehand (Pham et al., 2022).

Archers repeat the same motion for many hours each day. For world-class archers, motion accuracy is already well established, so the main goal of repetitive training is to maintain consistency, defined here as the degree to which limb positions and trajectories remain similar across repeated shots, excluding the effects of psychological or environmental factors. Accordingly, it is crucial to objectively assess and improve the kinematic consistency of the motion (Ji et al., 2024; S. Lee et al., 2024). Computer vision technology can address this need, allowing coaches to provide guidance based on objective data and ultimately contributing to the enhancement of archers' performance.

Previous studies on computer vision-based sports motion analysis examine areas such as tracking taekwondo athletes' positions during matches (dos Santos Banks et al., 2024), quantitatively analyzing basketball shooting, goalkeeper movements during penalty kicks, diving motions, and golf swings (Purnama et al., 2024; Liu et al., 2024; P. Ghadekar et al., 2024; Park, 2023), and classifying behaviors in volleyball and badminton through posture estimation (Purnama et al., 2024; Liu et al., 2024).

Existing motion recognition studies often require large amounts of labeled data and involve high costs. They also struggle to effectively utilize joint information or face technical limitations in modeling joint features that represent the same motion (Cronin et al., 2024; Shah et al., 2022).

Deep learning-based methods have significantly improved performance in the field of human action recognition. However, the need for large-scale data remains a major challenge. Therefore, a new deep learning-based approach that reduces computational costs and performs well even with small datasets is needed. This study aims to develop a deep learning model that can accurately recognize and analyze archers' shooting motions by enhancing the use of joint information and improving data efficiency.

The goal is to develop a system that can precisely identify and analyze individual archers' shooting motions using deep learning-based computer vision and sequence networks. This system is expected to quantify each athlete's unique motion patterns and technical traits, enabling the design of customized training programs based on objective data. This study employs machine learning as a means to analyze and quantify the consistency and variability of archery shooting motions. By first demonstrating that the model can accurately identify individual archers solely from joint keypoint data, we establish that the system effectively learns distinctive motion patterns. This capability enables objective measurement of deviations from an athlete's own baseline, allowing for visualized feedback during training.

Methods
2.1
Data Collection

This study developed a method to evaluate archery shooting patterns and provide feedback by applying a deep learning sequence network to landmark joint coordinates tracked via pose estimation. The data used in this study were collected over three months, from February to May 2024, by selecting four national-level male archers in Korea and recording their training sessions. The dataset comprises full-body joint coordinates extracted from full HD (1920×1080 pixels), 60 fps videos in which the athlete's entire body is fully captured within the frame. Each sample includes the full sequence from the athlete's ready posture through drawing, holding, aiming, and releasing, and was labeled with the athlete's identity. Figure 1 illustrates an example of the dataset extraction process used in this study; only shooting samples under 900 frames per shot (less than 15 seconds) were included. The original shooting dataset contained 6,231 samples; after filtering, 5,984 samples remained in the final dataset. The original data used in this study are available at https://github.com/analysispark/deep-archery-sequence-data.git. All participants were fully informed of the study's purpose and procedures prior to data collection, and each provided written consent to participate.

Figure 1.

Experimental Setup and Recording Environment Used for Video Analysis

2.2
Pose Estimation for Analyzing Archery Shooting Motions

This study employs the real-time multi-person one-stage (RTMO) framework to analyze the shooting motions of archers. RTMO is a state-of-the-art framework for real-time multi-person pose estimation. It is based on the you only look once (YOLO) architecture and can estimate poses directly without the need for a separate person detector. It achieves an average precision (AP) 1.1% higher than existing one-stage pose estimators on the common objects in context (COCO) dataset. It reaches 74.8% AP on COCO val2017, demonstrating excellent real-time performance with a processing speed of 141 FPS on a single V100 GPU (Chen et al., 2019; Lu et al., 2024).

The joint points extracted by the RTMO framework follow the COCO dataset's keypoint definitions, providing 17 joint points. The list of joint points and their corresponding body parts is shown below (Table 1, Figure 2).

Table 1.

RTMO Joint Extraction Key Points

Index  Joint           Index  Joint
1      Nose            10     Left wrist
2      Left eye        11     Right wrist
3      Right eye       12     Left hip
4      Left ear        13     Right hip
5      Right ear       14     Left knee
6      Left shoulder   15     Right knee
7      Right shoulder  16     Left ankle
8      Left elbow      17     Right ankle
9      Right elbow
Figure 2.

Example of Joint Extraction in Archery Shooting

Figure 3.

Shooting Process from Actual Video Clips Used

2.3
Preprocessing Procedure

To analyze only actual shooting motions, this study trimmed the original footage to exclude unnecessary segments before and after the shooting motion, using only clips from the ready position to release. Joint points were extracted frame by frame and stored in JSON format. Each JSON file for a shooting instance contains metadata, including athlete information and shooting date and time, together with an array of joint point data of shape [n][17][3], where n denotes the number of frames and each of the 17 joint points is represented by an x-coordinate, a y-coordinate, and a confidence score.
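
The per-shot JSON layout described above can be sketched as follows; the field names (`athlete`, `keypoints`, `recorded_at`) are illustrative assumptions, as the actual schema is defined by the released dataset.

```python
import json

# Minimal sketch of one shooting-instance record. The keypoints array has
# shape [n][17][3]: n frames x 17 joints x (x, y, confidence).
sample = {
    "athlete": "A",                      # assumed metadata field
    "recorded_at": "2024-02-15T10:30",   # assumed metadata field
    "keypoints": [
        [[0.51, 0.12, 0.98]] * 17,       # frame 0: 17 joints
        [[0.52, 0.13, 0.97]] * 17,       # frame 1
    ],
}

raw = json.dumps(sample)   # stands in for reading the file from disk
data = json.loads(raw)

n_frames = len(data["keypoints"])
assert all(len(frame) == 17 for frame in data["keypoints"])
x, y, conf = data["keypoints"][0][0]     # first frame, first joint (nose)
print(n_frames, x, y, conf)              # 2 0.51 0.12 0.98
```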

To suit the sequence model, the keypoint coordinates from each frame were normalized using MinMaxScaler from the scikit-learn library. This normalization compensates for differences in camera angles across recording dates and variations in the distance between the athlete and the camera during each shoot, and helps prevent performance degradation caused by scale differences in the data (Deepa & Ramesh, 2022). For points with low confidence scores, the coordinates of the surrounding −5 to +5 frames were averaged as an interpolation step. Furthermore, 11 keypoints (nose, left and right eyes, ears, hips, knees, and ankles) were excluded to reduce training cost and enable real-time processing, as these points remain fixed during the shooting motion (Ji et al., 2024; Lee et al., 2024; Zhang & Lin, 2024).
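
A minimal sketch of these preprocessing steps, assuming a 0.5 confidence cutoff, zero-based COCO indices 5-10 for the six retained joints (shoulders, elbows, wrists), and simple clamping of the averaging window at sequence boundaries; these details are assumptions, not the authors' exact implementation:

```python
import numpy as np

KEEP = [5, 6, 7, 8, 9, 10]   # shoulders, elbows, wrists -> 6 joints, 12 features
CONF_THRESH = 0.5            # assumed confidence cutoff

def preprocess(kpts: np.ndarray) -> np.ndarray:
    """kpts: (n_frames, 17, 3) array of x, y, confidence per joint."""
    xy = kpts[:, :, :2].copy()
    conf = kpts[:, :, 2]
    n = len(xy)
    # Replace low-confidence points with the mean of frames t-5..t+5.
    for t, j in zip(*np.where(conf < CONF_THRESH)):
        lo, hi = max(0, t - 5), min(n, t + 6)
        xy[t, j] = kpts[lo:hi, j, :2].mean(axis=0)
    # Min-max normalize each coordinate axis to [0, 1] within the shot.
    mn, mx = xy.min(axis=(0, 1)), xy.max(axis=(0, 1))
    xy = (xy - mn) / np.where(mx - mn == 0, 1, mx - mn)
    # Drop the 11 near-static keypoints and flatten to (n_frames, 12).
    return xy[:, KEEP, :].reshape(n, -1)

demo = np.random.default_rng(0).uniform(0.2, 0.9, size=(30, 17, 3))
features = preprocess(demo)
print(features.shape)  # (30, 12)
```

The resulting (n_frames, 12) array matches the per-frame feature size of the model's input shape described in the next section.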

2.4
Archery Shooting Pattern Model Design

To analyze shooting motions and estimate individual patterns of archers, this study designed deep learning sequence models. Specifically, unidirectional sequence (RNN, LSTM, GRU) and bidirectional models (Bi-LSTM, Bi-GRU) were designed and compared. At this stage, the model was designed to determine the distinctive shooting motion patterns for each archer, forming the basis for subsequent analysis of motion consistency and variability. All models included a Global Average Pooling 1D layer to reduce the time dimension and extract average features from the entire sequence. Hyperparameter tuning of the neural networks was performed using the Optuna library (Akiba et al., 2019) to identify optimal parameters; Table 2 presents the specific model structures.

Table 2.

Architecture of Sequence Models for Archery Motion Analysis.

Layer           Attribute          Description
Input Layer 1   Unit               128
                Return sequences   True
                Activation         Tanh
                Input shape        (None, 900, 12)
Dropout         Rate               0.3
Input Layer 2   Unit               32
                Return sequences   True
                Activation         Tanh
Dropout         Rate               0.3
Pooling Layer   Type               Global Average Pooling 1D
Dense Layer 1   Unit               16
                Activation         ReLU
Dense Layer 2   Unit               y_train.shape[1]
                Activation         Softmax

Model training used categorical crossentropy as the loss function and the Adam optimizer with a learning rate of 0.001. Considering the video characteristics at 60 fps, the batch size was set to 64. Dropout layers randomly deactivated 30% of neurons to prevent overfitting, and early stopping terminated training if the validation loss did not improve for 10 consecutive epochs.
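
A minimal Keras sketch of the Bi-GRU variant, assuming the unit sizes and activations listed in Table 2, a `Bidirectional` wrapper around each recurrent layer, and sequences padded to 900 frames; details beyond Table 2 are assumptions, not the authors' exact implementation:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_classes = 4  # four athletes (stands in for y_train.shape[1])

model = models.Sequential([
    layers.Input(shape=(900, 12)),   # 900 frames x 12 joint features
    layers.Bidirectional(layers.GRU(128, return_sequences=True, activation="tanh")),
    layers.Dropout(0.3),
    layers.Bidirectional(layers.GRU(32, return_sequences=True, activation="tanh")),
    layers.Dropout(0.3),
    layers.GlobalAveragePooling1D(),  # average features over the time axis
    layers.Dense(16, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Early stopping on validation loss with a patience of 10 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
# model.fit(X_train, y_train, batch_size=64, validation_split=0.2, callbacks=[early_stop])

out = model(np.zeros((1, 900, 12), dtype="float32"))
print(out.shape)  # (1, 4)
```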

Results

The accuracy of each sequence model in estimating archers' shooting motion patterns from joint positions obtained via pose estimation was evaluated, along with visualizations applying the proposed method to provide feedback on motion consistency. A total of 5,984 clips, each containing fewer than 900 frames and covering the full shooting process, from setup posture to pull-up, anchoring, aiming, and release, were used in the experiment. The experiment was conducted using a program implemented with Python 3.10.12 on a system with an Intel Core i9-14900K processor, 64GB RAM, an NVIDIA GeForce RTX 4080 SUPER GPU, and Ubuntu 22.04 LTS. To ensure the reliability of the results, third-party verification was requested from a certification agency, and a test report was issued (Report Number: 2024-224-VSW-R).

3.1
Comparison of Shooting Motion Pattern Classification using Pose Estimation-based Deep Learning Sequence Algorithms

Deep learning sequence models were used to learn the shooting motion patterns of archers. The training results for each model are as follows.

The RNN model showed a training accuracy of 89.7%. The GRU algorithm achieved a training accuracy of 98.5%, while the bidirectional GRU (Bi-GRU) achieved 98.7%, the best performance. The LSTM model recorded the lowest performance at 55.9%, while the bidirectional LSTM (Bi-LSTM) achieved 96.6% (Figure 4).

Figure 4.

Sequence Model Training Accuracy

Table 3 presents the performance evaluation metrics for each model. Bi-GRU achieved the highest precision, recall, and F1-score. For LSTM, precision was higher than accuracy, but the F1-score was lower. In contrast to the unidirectional LSTM, the Bi-LSTM model demonstrated markedly higher accuracy.

Table 3.

Performance Evaluation Metrics of Sequence Models

Model    Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
RNN      89.7          90.1           89.7        89.8
GRU      98.5          98.5           98.5        98.5
Bi-GRU   98.7          98.7           98.7        98.7
LSTM     55.9          60.5           55.9        49.3
Bi-LSTM  96.6          96.6           96.6        96.6

The Bi-GRU model, which showed the best performance metrics, achieved a training accuracy of 99.9% and a training loss of 0.0013 over 64 epochs. The accuracy on the test dataset, which was not used during training, was 98.7%, with a loss of 0.0635.

3.2
Visualization of Athletes' Shooting Patterns

This section visually analyzes the predictions made by the pose estimation-based deep learning model to compare the shooting motion patterns of athletes. The results of visualizing athletes' shooting patterns are as follows.

Figure 5 visualizes joint point movements of each athlete across all shooting data, starting from the top-left. For example, the upper graph for athlete A shows changes in x-coordinates of joints during the shooting motion, while the lower graph shows changes in y-coordinates. The x-axis in each athlete's graph represents frames (time), and the y-axis shows normalized coordinate values. A narrower variance range indicates that the athlete performs consistent motions, and the figure clearly shows distinct differences in shooting patterns among the four athletes.

Figure 5.

Visualization of Athletes' Shooting Patterns.

Note: Normalized x- and y-coordinate trajectories for four athletes across complete shooting sequences. Narrow variance indicates higher motion consistency.
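
The consistency notion illustrated in Figure 5 can be approximated as the per-frame variance of normalized trajectories across repeated shots; the synthetic data and the scalar summary below are illustrative assumptions, not the authors' exact metric:

```python
import numpy as np

rng = np.random.default_rng(1)
base = np.sin(np.linspace(0, np.pi, 900))[:, None]       # shared motion template
shots = base + rng.normal(0, 0.02, size=(20, 900, 12))   # 20 shots x 900 frames x 12 features

per_frame_var = shots.var(axis=0)         # (900, 12): spread across repeated shots
consistency_score = per_frame_var.mean()  # lower value = more consistent motion

# A noisier athlete should score worse (higher mean variance).
noisier = base + rng.normal(0, 0.08, size=(20, 900, 12))
assert consistency_score < noisier.var(axis=0).mean()
print(round(float(consistency_score), 5))
```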

Discussion

This study classifies the shooting motions of archers using pose estimation technology and a deep learning sequence network-based model, offering the following implications and significance. Through in-depth discussion of the results and methodological choices, the study presents directions for future research and practical applications.

4.1
Model Performance and Methodological Validity

Machine learning enables automated, large-scale, and quantitative analysis of motion consistency and variability, which would be challenging to achieve solely through manual pose estimation methods. Recent advancements in AI technologies have brought revolutionary changes to sports science, enabling quantitative analysis of motion and the development of personalized training programs. In sports like archery, which demand high precision, deep learning-based sequence models have shown excellent capability in capturing kinematic chain reactions. Archery involves shooting arrows at a target from a fixed distance, with scores assigned based on arrow accuracy. The shooting motion in archery can be divided into six stages: (1) grip, (2) draw, (3) hold, (4) aim, (5) release, and (6) full draw (World Archery, 2020). Each stage is closely linked to the previous one, and the quality of the preceding motion affects the next, making it essential to maintain proper form at each step (Vendrame et al., 2024).

Owing to these characteristics, archery shooting motions can be considered sequential data over time, indicating the methodological validity of using sequence models. Sequence models can effectively model temporal dependencies and quantitatively analyze how earlier motions affect later stages. Particularly, RNN-based models like GRU perform well in capturing long-term dependencies, making them suitable for analyzing continuous and complex motion patterns such as archery (Gavilanes, 2023; Purnama et al., 2024; Sunal et al., 2021).

Experimental results showed that the Bi-GRU model achieved the highest performance with 98.7% accuracy. Figure 6 shows the confusion matrix for classifying athletes' shooting patterns, confirming the model's robustness for real-world application. In contrast, the low performance of LSTM (55.9%) is presumed to stem from overfitting or insufficient hyperparameter tuning, while the high accuracy of Bi-LSTM (96.6%) suggests that the bidirectional structure partially overcame these limitations (Yang et al., 2020).

Figure 6.

Confusion Matrix of the Bi-GRU Model.

Note: X-axis = predicted classes (A-D), Y-axis = actual classes. Overall accuracy reached 98%.

4.2
Practical Applicability and Contribution to Sports Science

The proposed system enables coaches to provide more concrete and practical feedback based on motion analysis results, significantly contributing to athletes' technical improvement. By visualizing the variance in each athlete's shooting trajectory, the system quantifies subtle patterns that are not visually detectable, identifying points of motion inconsistency. This allows coaches to guide athletes toward consistent motion, which is critical in archery. Figure 5 visualizes joint point movements of each athlete, clearly showing distinct differences in shooting patterns. Figure 7 further illustrates three shooting instances of Athlete A, enabling precise feedback for consistent motion. When visualized on actual shooting videos, the error rates of joint points can be presented as shown in Figure 8, allowing athletes to analyze their motion more effectively while watching their practice footage.

Figure 7.

Joint Variation Graphs for Three Shooting Instances of Athlete A.

Note: Left panel shows x-coordinate trajectories; right panel shows y-coordinate trajectories for key joints across three shooting instances.

Figure 8.

Field Application Example of Feedback for Consistent Shooting Motion

Note: Error rates for each joint are overlaid on actual shooting video frames. Joints exceeding a standard error threshold of 0.4 are highlighted for corrective training.
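
The Figure 8 style feedback can be sketched as flagging joints whose mean deviation from the athlete's own baseline trajectory exceeds the 0.4 threshold; the joint names and the Euclidean error definition are illustrative assumptions:

```python
import numpy as np

JOINTS = ["L-shoulder", "R-shoulder", "L-elbow", "R-elbow", "L-wrist", "R-wrist"]
THRESHOLD = 0.4  # standard error threshold from Figure 8

def flag_joints(shot: np.ndarray, baseline: np.ndarray) -> list[str]:
    """shot, baseline: (n_frames, 6, 2) normalized x/y trajectories."""
    # Mean Euclidean deviation per joint over the whole sequence.
    err = np.linalg.norm(shot - baseline, axis=2).mean(axis=0)   # (6,)
    return [name for name, e in zip(JOINTS, err) if e > THRESHOLD]

baseline = np.zeros((900, 6, 2))   # athlete's mean trajectory (illustrative)
shot = baseline.copy()
shot[:, 4] += 0.5                  # push the left wrist off-baseline throughout

print(flag_joints(shot, baseline))  # ['L-wrist']
```

Only the flagged joints would be highlighted on the overlaid video frames for corrective training.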

Through such applications, the deep learning sequence network-based archery shooting motion analysis system developed in this study is expected to significantly contribute to technical improvement and performance enhancement among archers through consistent motion training.

Conclusion

This study developed a system capable of objectively identifying and analyzing individual archers’ shooting motions using a deep learning-based sequence network. To achieve this, a bidirectional GRU network model was proposed, effectively integrating joint and temporal information, and was trained and evaluated using real athlete data.

The proposed model effectively captured temporal dependencies in motion and achieved an accuracy of 98%. Particularly, the bidirectional GRU model demonstrated superior performance. This enabled the quantification of each athlete's unique motion patterns and technical characteristics, making it possible to design customized training programs based on objective data.

Furthermore, coaches were able to provide more specific and practical feedback based on motion analysis results, significantly aiding athletes’ technical improvement. The approach proposed in this study can be extended beyond archery to other precision sports, contributing to advancements in sports science.

The deep learning sequence network-based shooting motion analysis system developed in this study is expected to significantly enhance technical skills and performance in archers. However, further research is necessary to advance the technology in the following directions.

First, new network architectures should be explored to more effectively integrate joint and temporal information. For example, attention-based modules that directly incorporate temporal information into joint data, or fusion modules that combine both, could be considered.

Second, expanding and augmenting the dataset is necessary to improve the model's generalization capability. The current dataset lacks diversity in certain motion patterns. Therefore, a dataset including a wider range of athletes’ shooting styles or applying data augmentation techniques is needed.

Third, adding real-time feedback and injury prevention features would increase the system's practical utility. Functions such as real-time motion analysis with error pattern detection and early warning of potential injury risks could be implemented.

This study marks the first attempt to classify archery motion patterns by integrating pose estimation and sequence network models, significantly improving practicality and accuracy compared with previous research. Future multidisciplinary approaches are expected to maximize synergies between sports science and AI.

Language: English
Page range: 131 - 142
Published on: May 3, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2026 Jihoon Park, Hyongjun Choi, published by International Association of Computer Science in Sport
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.