
Figure 1
REMUS-100 sidescan sonar survey. Aircraft wreck debris labeled as a feature class in ArcGIS Pro.
Table 1
Training and validation data. The dataset includes 600 and 1200 kHz data for each aircraft.
| DATA LOCATION | NUMBER OF AIRCRAFT | NUMBER OF ACW LABELED |
|---|---|---|
| Chuuk, Micronesia | 9 | 158 |
| Croatia | 6 | 128 |
| Maloelap, Marshall Islands | 1 | 1 |
| Denmark | 1 | 1 |
| Alaska | 1 | 1 |
| Palau | 1 | 1 |
| Total | 19 | 290 |

Figure 2
REMUS-100 sidescan sonar validation data, including heavily disarticulated ACW from Croatia and more intact, but still not easily recognizable as aircraft, ACW from Maloelap. For the Croatia ACW, the model’s validation score was based not just on the two larger pieces of debris, but also on all the small pieces composing the debris field. Odd- and even-numbered mosaic views are provided for Maloelap because the views look very different. Croatia views look more similar, and so only one is provided here.

Figure 3
Four batches composed of 16 images each.
Table 2
Formulas for calculating accuracy metrics used in this study.
| METRIC | FORMULA |
|---|---|
| Recall | TP/(TP+FN) |
| Precision | TP/(TP+FP) |
| F1 | (2*Recall*Precision)/(Recall+Precision) |
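
As a consistency note on the formulas above, substituting the definitions of recall and precision into the F1 expression gives an equivalent count-based form. This is a standard identity, written out here as a sketch rather than taken from the study; it assumes TP > 0 and shows that F1 does not depend on the true-negative count.

```latex
\begin{align*}
F_1 &= \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}
     = \frac{2 \cdot \frac{TP}{TP+FN} \cdot \frac{TP}{TP+FP}}{\frac{TP}{TP+FN} + \frac{TP}{TP+FP}} \\
    &= \frac{2\,TP^{2}}{TP\,(TP+FP) + TP\,(TP+FN)}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```
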
Table 3
Some of the tested model configurations used to determine the optimal model. All of these models had a batch size of 16, an image size of 640 × 640 pixels, and a spatial resolution of 10 cm. Train and Val refer to the number of training and validation samples, respectively. TNs is the number of true negatives. F1 is the F1 accuracy score. The best-performing model in each table, selected by F1 score, is gray-highlighted.
| TRAIN | VAL | EPOCHS | TNs | F1 |
|---|---|---|---|---|
| YOLOv7 | ||||
| 509 | 74 | 50 | 0 | .57 |
| 674 | 52 | 50 | 0 | .66 |
| 810 | 52 | 50 | 0 | .68 |
| 810 | 52 | 25 | 10 | .73 |
| 810 | 52 | 25 | 0 | .74 |
| YOLOv8 | ||||
| 810 | 52 | 25 | 128 | .65 |
| 810 | 52 | 25 | 10 | .70 |
| 810 | 52 | 25 | 0 | .72 |
Table 4
Accuracy metrics for the validation dataset. Total score is out of 1.
| METRIC | YOLOv7 SCORE | YOLOv8x SCORE |
|---|---|---|
| Recall | .75 | .68 |
| Precision | .73 | .77 |
| F1 | .74 | .72 |
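
The F1 row can be cross-checked against the recall and precision rows using the F1 formula from Table 2. The snippet below is a minimal sketch of that check; the function name and the zero-denominator convention are illustrative choices, not part of the study.

```python
def f1_from_recall_precision(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (the F1 formula in Table 2)."""
    if recall + precision == 0:
        return 0.0  # convention chosen for this sketch when both are zero
    return 2 * recall * precision / (recall + precision)


# Recall and precision as reported in Table 4.
reported = {
    "YOLOv7": (0.75, 0.73),
    "YOLOv8x": (0.68, 0.77),
}

for model, (recall, precision) in reported.items():
    f1 = f1_from_recall_precision(recall, precision)
    print(f"{model}: F1 = {f1:.2f}")
```

Rounded to two decimals, the recomputed values agree with the F1 row of the table.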

Figure 4
F1 curve for the highest-performing model, YOLOv7.
Table 5
Model parameters and hyperparameters for the highest-performing YOLOv7 and YOLOv8 models (same parameters for both).
| PARAMETER | VALUE |
|---|---|
| Training dataset size | 810 image tiles |
| Validation dataset size | 52 image tiles |
| True Negative tiles for training/validation | 0 image tiles |
| Tile pixels | 640 × 640 pixels |
| Batch size | 16 |
| Epochs | 25 |
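
To make the scale of these settings concrete, the sketch below derives a few quantities from the table: batches per epoch, total batches over training, and the ground footprint of one tile. The class and method names are hypothetical, the 10 cm resolution comes from the Table 3 caption, and keeping the final partial batch in each epoch is an assumption of this sketch rather than a documented detail of the training runs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Values copied from Table 5; names are illustrative only."""
    train_tiles: int = 810
    val_tiles: int = 52
    tile_px: int = 640
    batch_size: int = 16
    epochs: int = 25

    def batches_per_epoch(self) -> int:
        # Assumption: the last, partially filled batch is kept.
        return math.ceil(self.train_tiles / self.batch_size)

    def total_batches(self) -> int:
        return self.batches_per_epoch() * self.epochs

    def tile_footprint_m(self, resolution_cm: float) -> float:
        # Side length on the seabed covered by one square tile, in meters.
        return self.tile_px * resolution_cm / 100.0


setup = TrainingSetup()
print(setup.batches_per_epoch())   # 51
print(setup.total_batches())       # 1275
print(setup.tile_footprint_m(10))  # 64.0
```

Under these assumptions, a 640 × 640 pixel tile at 10 cm resolution covers a 64 m × 64 m patch of seabed.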

Figure 5
REMUS-100 sidescan sonar mosaics. ACW in newly collected data from the 2023 field season in Micronesia is shown inside white boxes. ACW in images a, b, and c was detected by the model, while ACW in image d was not.

Figure 6
REMUS-100 sidescan sonar data. ACW from Croatia used in minimum spatial resolution assessment.

Figure 7
REMUS-100 sidescan sonar data. Two different views of ACW from Maloelap, Marshall Islands used in minimum spatial resolution assessment.
Table 6
Overview of results of the minimum spatial resolution assessment. For the Maloelap table, A. and B. correspond to the images of the two different mosaics shown in Figure 2.
| KOMIŽA, CROATIA | ||||||||
|---|---|---|---|---|---|---|---|---|
| | 10 cm | 50 cm | 1 m | 3 m | | | | |
| TP predicted | 26 | 1 | 0 | 0 | ||||
| FP predicted | 8 | 1 | 0 | 0 | ||||
| FN predicted | 6 | 0 | 0 | 0 | ||||
| TN predicted | 395 | 18 | 0 | 0 | ||||
| Total number image tiles | 435 | 20 | 8 | 2 | ||||
| Total bounding boxes labeled by human | 20 | 20 | 20 | 20 | ||||
| MALOELAP, MARSHALL ISLANDS | ||||||||
| | A. 10 cm | B. 10 cm | A. 50 cm | B. 50 cm | A. 1 m | B. 1 m | A. 3 m | B. 3 m |
| TP predicted | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| FP predicted | 15 | 4 | 2 | 1 | 0 | 0 | 0 | 0 |
| FN predicted | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| TN predicted | 806 | 799 | 36 | 37 | 0 | 0 | 0 | 0 |
| Total number image tiles | 822 | 804 | 39 | 39 | 10 | 10 | 2 | 2 |
| Total bounding boxes labeled by human | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
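
The TP, FP, and FN counts in this table can be turned into per-resolution recall, precision, and F1 with the formulas from Table 2. The sketch below does this with a guard for zero denominators, which occur at 1 m and 3 m where no predictions were recorded. The helper names and the choice to report undefined metrics as "n/a" are assumptions of this sketch; the derived values are not figures reported in the study. F1 is computed in the count-based form 2TP/(2TP+FP+FN), which is algebraically equal to the Table 2 formula.

```python
from typing import NamedTuple, Optional, Tuple


class Counts(NamedTuple):
    tp: int
    fp: int
    fn: int


def safe_div(num: float, den: float) -> Optional[float]:
    """Return num/den, or None when the denominator is zero (metric undefined)."""
    return num / den if den else None


def scores(c: Counts) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    recall = safe_div(c.tp, c.tp + c.fn)
    precision = safe_div(c.tp, c.tp + c.fp)
    f1 = safe_div(2 * c.tp, 2 * c.tp + c.fp + c.fn)
    return recall, precision, f1


# TP, FP, FN counts copied from Table 6.
table6 = {
    "Croatia, 10 cm": Counts(26, 8, 6),
    "Croatia, 50 cm": Counts(1, 1, 0),
    "Croatia, 1 m": Counts(0, 0, 0),
    "Croatia, 3 m": Counts(0, 0, 0),
    "Maloelap A, 10 cm": Counts(0, 15, 1),
    "Maloelap B, 10 cm": Counts(1, 4, 0),
    "Maloelap A, 50 cm": Counts(0, 2, 1),
    "Maloelap B, 50 cm": Counts(0, 1, 1),
    "Maloelap A, 1 m": Counts(0, 0, 0),
    "Maloelap B, 1 m": Counts(0, 0, 0),
    "Maloelap A, 3 m": Counts(0, 0, 0),
    "Maloelap B, 3 m": Counts(0, 0, 0),
}


def fmt(value: Optional[float]) -> str:
    return "n/a" if value is None else f"{value:.2f}"


for label, counts in table6.items():
    recall, precision, f1 = scores(counts)
    print(f"{label}: recall={fmt(recall)} precision={fmt(precision)} F1={fmt(f1)}")
```

Reporting "n/a" instead of 0 keeps the coarse-resolution cases, where the model produced no detections at all, distinct from cases where it produced detections that were all wrong.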

Figure 8
Line graph used to address the overfitting issue for the final, highest-performing model.

Figure 9
The upper row shows a sample of aircraft from the mostly synthetically generated Seabed Objects KLSG dataset, and the lower row shows a sample of aircraft from the dataset presented in this paper. The two datasets look very different. In particular, much of our dataset consists of small, heavily fragmented ACW, as shown in the last two images of the bottom row.

Figure 10
a. Side-by-side comparison of sidescan sonar and the corresponding magnetometer data. b. The two overlaid using GIS.
