
Deep Learning Sequence Network for Identifying and Analyzing Archery Shooting Patterns

By: Jihoon Park and Hyongjun Choi
Open Access | May 2026


Introduction

Archery is a highly precise sport in which accuracy and consistency determine the outcome, and both are critically influenced by athletes' technical skills and physical control. Each archer's shooting motion has unique characteristics that are closely related to their performance level (Hemara et al., 2021). However, traditional coaching methods largely rely on coaches' experience and intuition, limiting objective and quantitative analysis. Therefore, it is necessary to develop an objective method capable of accurately identifying and analyzing each athlete's shooting motion (Brian, 2009; Zhang & Lin, 2024).

Following the rapid advancement of computer vision technology, new possibilities are emerging in sports for precisely analyzing athletes' movements. Particularly, 3D coordinates of body joints obtained through computer vision can be helpful in quantifying and standardizing athletes' movements (Chung et al., 2022; Ino et al., 2024; Kitamura et al., 2022; Wu et al., 2021).

Applying this technology to archery training allows coaches and athletes to move beyond subjective evaluations and obtain more objective and accurate feedback (Ji et al., 2025). Ultimately, this can lead to improved techniques and enhanced performance for athletes. Motion analysis using computer vision not only enables the development of personalized training programs tailored to individual athletes, but also helps detect and prevent potential injury risks beforehand (Pham et al., 2022).

Archers repeat the same motion for many hours each day. For world-class archers, motion accuracy is already well established, so the main goal of repetitive training is to maintain consistency, defined here as the degree to which limb positions and trajectories remain similar across repeated shots, excluding the effects of psychological or environmental factors. Accordingly, it is crucial to objectively assess and improve the kinematic consistency of the motion (Ji et al., 2024; S. Lee et al., 2024). Computer vision technology can address this need, allowing coaches to provide guidance based on objective data and ultimately contributing to the enhancement of archers' performance.

Previous studies on computer vision-based sports motion analysis examine areas such as tracking taekwondo athletes' positions during matches (dos Santos Banks et al., 2024), quantitatively analyzing basketball shooting, goalkeeper movements during penalty kicks, diving motions, and golf swings (Purnama et al., 2024; Liu et al., 2024; P. Ghadekar et al., 2024; Park, 2023), and classifying behaviors in volleyball and badminton through posture estimation (Purnama et al., 2024; Liu et al., 2024).

Existing motion recognition studies often require large amounts of labeled data and involve high costs. They also struggle to effectively utilize joint information or face technical limitations in modeling joint features that represent the same motion (Cronin et al., 2024; Shah et al., 2022).

Deep learning-based methods have significantly improved performance in the field of human action recognition. However, the need for large-scale data remains a major challenge. Therefore, a new deep learning-based approach that reduces computational costs and performs well even with small datasets is needed. This study aims to develop a deep learning model that can accurately recognize and analyze archers' shooting motions by enhancing the use of joint information and improving data efficiency.

The goal is to develop a system that can precisely identify and analyze individual archers' shooting motions using deep learning-based computer vision and sequence networks. This system is expected to quantify each athlete's unique motion patterns and technical traits, enabling the design of customized training programs based on objective data. This study employs machine learning as a means to analyze and quantify the consistency and variability of archery shooting motions. By first demonstrating that the model can accurately identify individual archers solely from joint keypoint data, we establish that the system effectively learns distinctive motion patterns. This capability enables objective measurement of deviations from an athlete's own baseline, allowing for visualized feedback during training.

Methods
2.1
Data Collection

This study developed a method to evaluate archery shooting patterns and provide feedback by applying a deep learning sequence network to landmark joint coordinates tracked via pose estimation. The data used in this study were collected over three months, from February to May 2024, by selecting four national-level male archers in Korea and recording their training sessions. The dataset comprises full-body joint coordinates extracted from full HD (1920×1080 pixels), 60 fps videos in which the athlete's entire body is fully captured within the frame. Each sample includes the full sequence from the athlete's ready posture through drawing, holding, aiming, and releasing, and was labeled with the athlete's identity. Figure 1 illustrates an example of the dataset extraction process used in this study; only shooting samples under 900 frames per shot (less than 15 seconds) were included. The original shooting dataset contained 6,231 samples; after filtering, 5,984 samples remained in the final dataset. The original data used in this study are available at https://github.com/analysispark/deep-archery-sequence-data.git. All participants were fully informed of the study's purpose and procedures prior to data collection, and each provided written consent to participate.

Figure 1.

Experimental Setup and Recording Environment Used for Video Analysis

2.2
Pose Estimation for Analyzing Archery Shooting Motions

This study employs the real-time multi-person one-stage (RTMO) framework to analyze the shooting motions of archers. RTMO is a state-of-the-art framework for real-time multi-person pose estimation. It is based on the you only look once (YOLO) architecture and can estimate poses directly without the need for a separate person detector. It achieves an average precision (AP) 1.1% higher than existing one-stage pose estimators on the common objects in context (COCO) dataset. It reaches 74.8% AP on COCO val2017, demonstrating excellent real-time performance with a processing speed of 141 FPS on a single V100 GPU (Chen et al., 2019; Lu et al., 2024).

The joint points extracted by the RTMO framework follow the COCO dataset's keypoint definitions, providing 17 joint points. The list of joint points and their corresponding body parts is shown below (Table 1, Figure 2).

Table 1.

RTMO Joint Extraction Key Points

Index  Joint           Index  Joint
1      Nose            10     Left wrist
2      Left eye        11     Right wrist
3      Right eye       12     Left hip
4      Left ear        13     Right hip
5      Right ear       14     Left knee
6      Left shoulder   15     Right knee
7      Right shoulder  16     Left ankle
8      Left elbow      17     Right ankle
9      Right elbow
Figure 2.

Example of Joint Extraction in Archery Shooting

Figure 3.

Shooting Process from Actual Video Clips Used

2.3
Preprocessing Procedure

To analyze only actual shooting motions, this study trimmed the original footage to exclude unnecessary segments before and after the shooting motion, using only clips from the ready position to release. Joint points were extracted frame by frame and stored in JSON format. Each JSON file for a shooting instance contains metadata, including athlete information and shooting date and time, together with an array of joint point data of shape [n][17][3], where n denotes the number of frames and each of the 17 joint points is represented by an x-coordinate, a y-coordinate, and a confidence score.
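
The per-shot JSON layout described above can be sketched as follows; the field names (`athlete`, `keypoints`, `recorded_at`) are illustrative assumptions, as the actual schema is defined by the released dataset.

```python
import json

# Minimal sketch of one shooting-instance record. The keypoints array has
# shape [n][17][3]: n frames x 17 joints x (x, y, confidence).
sample = {
    "athlete": "A",                      # assumed metadata field
    "recorded_at": "2024-02-15T10:30",   # assumed metadata field
    "keypoints": [
        [[0.51, 0.12, 0.98]] * 17,       # frame 0: 17 joints
        [[0.52, 0.13, 0.97]] * 17,       # frame 1
    ],
}

raw = json.dumps(sample)   # stands in for reading the file from disk
data = json.loads(raw)

n_frames = len(data["keypoints"])
assert all(len(frame) == 17 for frame in data["keypoints"])
x, y, conf = data["keypoints"][0][0]     # first frame, first joint (nose)
print(n_frames, x, y, conf)              # 2 0.51 0.12 0.98
```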

To suit the sequence model, the keypoint coordinates from each frame were normalized using MinMaxScaler from the scikit-learn library. This normalization compensates for differences in camera angles across recording dates and variations in the distance between the athlete and the camera during each shoot, and helps prevent performance degradation caused by scale differences in the data (Deepa & Ramesh, 2022). For points with low confidence scores, the coordinates of the surrounding −5 to +5 frames were averaged as an interpolation step. Furthermore, 11 keypoints (nose, left and right eyes, ears, hips, knees, and ankles) were excluded to reduce training cost and enable real-time processing, as these points remain fixed during the shooting motion (Ji et al., 2024; Lee et al., 2024; Zhang & Lin, 2024).
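
A minimal sketch of these preprocessing steps, assuming a 0.5 confidence cutoff, zero-based COCO indices 5-10 for the six retained joints (shoulders, elbows, wrists), and simple clamping of the averaging window at sequence boundaries; these details are assumptions, not the authors' exact implementation:

```python
import numpy as np

KEEP = [5, 6, 7, 8, 9, 10]   # shoulders, elbows, wrists -> 6 joints, 12 features
CONF_THRESH = 0.5            # assumed confidence cutoff

def preprocess(kpts: np.ndarray) -> np.ndarray:
    """kpts: (n_frames, 17, 3) array of x, y, confidence per joint."""
    xy = kpts[:, :, :2].copy()
    conf = kpts[:, :, 2]
    n = len(xy)
    # Replace low-confidence points with the mean of frames t-5..t+5.
    for t, j in zip(*np.where(conf < CONF_THRESH)):
        lo, hi = max(0, t - 5), min(n, t + 6)
        xy[t, j] = kpts[lo:hi, j, :2].mean(axis=0)
    # Min-max normalize each coordinate axis to [0, 1] within the shot.
    mn, mx = xy.min(axis=(0, 1)), xy.max(axis=(0, 1))
    xy = (xy - mn) / np.where(mx - mn == 0, 1, mx - mn)
    # Drop the 11 near-static keypoints and flatten to (n_frames, 12).
    return xy[:, KEEP, :].reshape(n, -1)

demo = np.random.default_rng(0).uniform(0.2, 0.9, size=(30, 17, 3))
features = preprocess(demo)
print(features.shape)  # (30, 12)
```

The resulting (n_frames, 12) array matches the per-frame feature size of the model's input shape described in the next section.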

2.4
Archery Shooting Pattern Model Design

To analyze shooting motions and estimate individual patterns of archers, this study designed deep learning sequence models. Specifically, unidirectional sequence (RNN, LSTM, GRU) and bidirectional models (Bi-LSTM, Bi-GRU) were designed and compared. At this stage, the model was designed to determine the distinctive shooting motion patterns for each archer, forming the basis for subsequent analysis of motion consistency and variability. All models included a Global Average Pooling 1D layer to reduce the time dimension and extract average features from the entire sequence. Hyperparameter tuning of the neural networks was performed using the Optuna library (Akiba et al., 2019) to identify optimal parameters; Table 2 presents the specific model structures.

Table 2.

Architecture of Sequence Models for Archery Motion Analysis.

Layer           Attribute          Description
Input Layer 1   Unit               128
                Return sequences   True
                Activation         Tanh
                Input shape        (None, 900, 12)
Dropout         Rate               0.3
Input Layer 2   Unit               32
                Return sequences   True
                Activation         Tanh
Dropout         Rate               0.3
Pooling Layer   Type               Global Average Pooling 1D
Dense Layer 1   Unit               16
                Activation         ReLU
Dense Layer 2   Unit               y_train.shape[1]
                Activation         Softmax

Model training used categorical crossentropy as the loss function and the Adam optimizer with a learning rate of 0.001. Considering the video characteristics at 60 fps, the batch size was set to 64. Dropout layers randomly deactivated 30% of neurons to prevent overfitting, and early stopping terminated training if the validation loss did not improve for 10 consecutive epochs.
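
A minimal Keras sketch of the Bi-GRU variant, assuming the unit sizes and activations listed in Table 2, a `Bidirectional` wrapper around each recurrent layer, and sequences padded to 900 frames; details beyond Table 2 are assumptions, not the authors' exact implementation:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_classes = 4  # four athletes (stands in for y_train.shape[1])

model = models.Sequential([
    layers.Input(shape=(900, 12)),   # 900 frames x 12 joint features
    layers.Bidirectional(layers.GRU(128, return_sequences=True, activation="tanh")),
    layers.Dropout(0.3),
    layers.Bidirectional(layers.GRU(32, return_sequences=True, activation="tanh")),
    layers.Dropout(0.3),
    layers.GlobalAveragePooling1D(),  # average features over the time axis
    layers.Dense(16, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Early stopping on validation loss with a patience of 10 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
# model.fit(X_train, y_train, batch_size=64, validation_split=0.2, callbacks=[early_stop])

out = model(np.zeros((1, 900, 12), dtype="float32"))
print(out.shape)  # (1, 4)
```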

Results

The accuracy of each sequence model in estimating archers' shooting motion patterns from joint positions obtained via pose estimation was evaluated, along with visualizations applying the proposed method to provide feedback on motion consistency. A total of 5,984 clips, each containing fewer than 900 frames and covering the full shooting process, from setup posture to pull-up, anchoring, aiming, and release, were used in the experiment. The experiment was conducted using a program implemented with Python 3.10.12 on a system with an Intel Core i9-14900K processor, 64GB RAM, an NVIDIA GeForce RTX 4080 SUPER GPU, and Ubuntu 22.04 LTS. To ensure the reliability of the results, third-party verification was requested from a certification agency, and a test report was issued (Report Number: 2024-224-VSW-R).

3.1
Comparison of Shooting Motion Pattern Classification using Pose Estimation-based Deep Learning Sequence Algorithms

Deep learning sequence models were used to learn the shooting motion patterns of archers. The training results for each model are as follows.

The RNN model showed a training accuracy of 89.7%. The GRU algorithm achieved a training accuracy of 98.5%, while the bidirectional GRU (Bi-GRU) achieved 98.7%, the best performance. The LSTM model recorded the lowest performance at 55.9%, while the bidirectional LSTM (Bi-LSTM) achieved 96.6% (Figure 4).

Figure 4.

Sequence Model Training Accuracy

Table 3 presents the performance evaluation metrics for each model. Bi-GRU achieved the highest precision, recall, and F1-score. For LSTM, precision was higher than accuracy, but the F1-score was lower. In contrast to the unidirectional LSTM, the Bi-LSTM model demonstrated markedly higher accuracy.

Table 3.

Performance Evaluation Metrics of Sequence Models

Model    Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
RNN      89.7          90.1           89.7        89.8
GRU      98.5          98.5           98.5        98.5
Bi-GRU   98.7          98.7           98.7        98.7
LSTM     55.9          60.5           55.9        49.3
Bi-LSTM  96.6          96.6           96.6        96.6

The Bi-GRU model, which showed the best performance metrics, achieved a training accuracy of 99.9% and a training loss of 0.0013 over 64 epochs. The accuracy on the test dataset, which was not used during training, was 98.7%, with a loss of 0.0635.

3.2
Visualization of Athletes' Shooting Patterns

This section visually analyzes the predictions made by the pose estimation-based deep learning model to compare the shooting motion patterns of athletes. The results of visualizing athletes' shooting patterns are as follows.

Figure 5 visualizes joint point movements of each athlete across all shooting data, starting from the top-left. For example, the upper graph for athlete A shows changes in x-coordinates of joints during the shooting motion, while the lower graph shows changes in y-coordinates. The x-axis in each athlete's graph represents frames (time), and the y-axis shows normalized coordinate values. A narrower variance range indicates that the athlete performs consistent motions, and the figure clearly shows distinct differences in shooting patterns among the four athletes.

Figure 5.

Visualization of Athletes' Shooting Patterns.

Note: Normalized x- and y-coordinate trajectories for four athletes across complete shooting sequences. Narrow variance indicates higher motion consistency.
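
The consistency notion illustrated in Figure 5 can be approximated as the per-frame variance of normalized trajectories across repeated shots; the synthetic data and the scalar summary below are illustrative assumptions, not the authors' exact metric:

```python
import numpy as np

rng = np.random.default_rng(1)
base = np.sin(np.linspace(0, np.pi, 900))[:, None]       # shared motion template
shots = base + rng.normal(0, 0.02, size=(20, 900, 12))   # 20 shots x 900 frames x 12 features

per_frame_var = shots.var(axis=0)         # (900, 12): spread across repeated shots
consistency_score = per_frame_var.mean()  # lower value = more consistent motion

# A noisier athlete should score worse (higher mean variance).
noisier = base + rng.normal(0, 0.08, size=(20, 900, 12))
assert consistency_score < noisier.var(axis=0).mean()
print(round(float(consistency_score), 5))
```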

Discussion

This study classifies the shooting motions of archers using pose estimation technology and a deep learning sequence network-based model, offering the following implications and significance. Through in-depth discussion of the results and methodological choices, the study presents directions for future research and practical applications.

4.1
Model Performance and Methodological Validity

Machine learning enables automated, large-scale, and quantitative analysis of motion consistency and variability, which would be challenging to achieve solely through manual pose estimation methods. Recent advancements in AI technologies have brought revolutionary changes to sports science, enabling quantitative analysis of motion and the development of personalized training programs. In sports like archery, which demand high precision, deep learning-based sequence models have shown excellent capability in capturing kinematic chain reactions. Archery involves shooting arrows at a target from a fixed distance, with scores assigned based on arrow accuracy. The shooting motion in archery can be divided into six stages: (1) grip, (2) draw, (3) hold, (4) aim, (5) release, and (6) full draw (World Archery, 2020). Each stage is closely linked to the previous one, and the quality of the preceding motion affects the next, making it essential to maintain proper form at each step (Vendrame et al., 2024).

Owing to these characteristics, archery shooting motions can be considered sequential data over time, indicating the methodological validity of using sequence models. Sequence models can effectively model temporal dependencies and quantitatively analyze how earlier motions affect later stages. Particularly, RNN-based models like GRU perform well in capturing long-term dependencies, making them suitable for analyzing continuous and complex motion patterns such as archery (Gavilanes, 2023; Purnama et al., 2024; Sunal et al., 2021).

Experimental results showed that the Bi-GRU model achieved the highest performance with 98.7% accuracy. Figure 6 shows the confusion matrix for classifying athletes' shooting patterns, confirming the model's robustness for real-world application. In contrast, the low performance of LSTM (55.9%) is presumed to stem from overfitting or insufficient hyperparameter tuning, while the high accuracy of Bi-LSTM (96.6%) suggests that the bidirectional structure partially overcame these limitations (Yang et al., 2020).

Figure 6.

Confusion Matrix of the Bi-GRU Model.

Note: X-axis = predicted classes (A-D), Y-axis = actual classes. Overall accuracy reached 98%.

4.2
Practical Applicability and Contribution to Sports Science

The proposed system enables coaches to provide more concrete and practical feedback based on motion analysis results, significantly contributing to athletes' technical improvement. By visualizing the variance in each athlete's shooting trajectory, the system quantifies subtle patterns that are not visually detectable, identifying points of motion inconsistency. This allows coaches to guide athletes toward consistent motion, which is critical in archery. Figure 5 visualizes joint point movements of each athlete, clearly showing distinct differences in shooting patterns. Figure 7 further illustrates three shooting instances of Athlete A, enabling precise feedback for consistent motion. When visualized on actual shooting videos, the error rates of joint points can be presented as shown in Figure 8, allowing athletes to analyze their motion more effectively while watching their practice footage.

Figure 7.

Joint Variation Graphs for Three Shooting Instances of Athlete A.

Note: Left panel shows x-coordinate trajectories; right panel shows y-coordinate trajectories for key joints across three shooting instances.

Figure 8.

Field Application Example of Feedback for Consistent Shooting Motion

Note: Error rates for each joint are overlaid on actual shooting video frames. Joints exceeding a standard error threshold of 0.4 are highlighted for corrective training.
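
The Figure 8 style feedback can be sketched as flagging joints whose mean deviation from the athlete's own baseline trajectory exceeds the 0.4 threshold; the joint names and the Euclidean error definition are illustrative assumptions:

```python
import numpy as np

JOINTS = ["L-shoulder", "R-shoulder", "L-elbow", "R-elbow", "L-wrist", "R-wrist"]
THRESHOLD = 0.4  # standard error threshold from Figure 8

def flag_joints(shot: np.ndarray, baseline: np.ndarray) -> list[str]:
    """shot, baseline: (n_frames, 6, 2) normalized x/y trajectories."""
    # Mean Euclidean deviation per joint over the whole sequence.
    err = np.linalg.norm(shot - baseline, axis=2).mean(axis=0)   # (6,)
    return [name for name, e in zip(JOINTS, err) if e > THRESHOLD]

baseline = np.zeros((900, 6, 2))   # athlete's mean trajectory (illustrative)
shot = baseline.copy()
shot[:, 4] += 0.5                  # push the left wrist off-baseline throughout

print(flag_joints(shot, baseline))  # ['L-wrist']
```

Only the flagged joints would be highlighted on the overlaid video frames for corrective training.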

Through such applications, the deep learning sequence network-based archery shooting motion analysis system developed in this study is expected to significantly contribute to technical improvement and performance enhancement among archers through consistent motion training.

Conclusion

This study developed a system capable of objectively identifying and analyzing individual archers’ shooting motions using a deep learning-based sequence network. To achieve this, a bidirectional GRU network model was proposed, effectively integrating joint and temporal information, and was trained and evaluated using real athlete data.

The proposed model effectively captured temporal dependencies in motion and achieved an accuracy of 98%. Particularly, the bidirectional GRU model demonstrated superior performance. This enabled the quantification of each athlete's unique motion patterns and technical characteristics, making it possible to design customized training programs based on objective data.

Furthermore, coaches were able to provide more specific and practical feedback based on motion analysis results, significantly aiding athletes’ technical improvement. The approach proposed in this study can be extended beyond archery to other precision sports, contributing to advancements in sports science.

The deep learning sequence network-based shooting motion analysis system developed in this study is expected to significantly enhance technical skills and performance in archers. However, further research is necessary to advance the technology in the following directions.

First, new network architectures should be explored to more effectively integrate joint and temporal information. For example, attention-based modules that directly incorporate temporal information into joint data, or fusion modules that combine both, could be considered.

Second, expanding and augmenting the dataset is necessary to improve the model's generalization capability. The current dataset lacks diversity in certain motion patterns. Therefore, a dataset including a wider range of athletes’ shooting styles or applying data augmentation techniques is needed.

Third, adding real-time feedback and injury prevention features would increase the system's practical utility. Functions such as real-time motion analysis with error pattern detection and early warning of potential injury risks could be implemented.

This study marks the first attempt to classify archery motion patterns by integrating pose estimation and sequence network models, significantly improving practicality and accuracy compared with previous research. Future multidisciplinary approaches are expected to maximize synergies between sports science and AI.

Language: English
Page range: 131 - 142
Published on: May 3, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2026 Jihoon Park, Hyongjun Choi, published by International Association of Computer Science in Sport
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.