
Figure 1
Evaluation of Snapshot Safari classifier performance on images in the test dataset for every possible species class from the nine sites that provided training data. The model was evaluated for accuracy across all images and separately for images classified with high confidence scores (>95%).

Figure 2
Exemplar images, descriptions, and comparison tools that are available to volunteers on the Snapshot Safari survey menu. Selecting Wildebeest in the species menu brings up the exemplar images and description of the species. If the volunteer is unsure about a classification, they can click on a species in the “Sometimes confused with” category, which brings up that species’ images and description for comparison. Volunteers can also use field guides, tutorials, and filters to assign a label. They are then asked to count the individuals and annotate behaviors and basic demographics, including counting horns in dimorphic species.

Figure 3
Differences in species-specific error rates between AI (black circles) and HITL (orange circles) classification methods. The blue bar represents the size of the error reduction from adding HITL to supervise AI classifications of unseen CT images. In all except two cases (aardwolf and Thomson’s gazelle), HITL decreased the error rate. Species are arrayed from most to least common within the training dataset with the number of training images in parentheses.

Figure 4
Comparison images of species commonly confused by AI (a) and HITL (b, c). a) AI consistently labeled pictures of caracals (left) as lions (right). b) HITL frequently labeled images of aardwolves (left) as striped hyenas (right). c) HITL reduced error rates for impala (left) and Grant’s gazelle (center) but performed worse than AI on Thomson’s gazelle (right).
