Introduction
Citizen (or community) science holds immense potential as a dynamic, collaborative approach to modern research (Bonney et al. 2014). By mobilizing volunteers, citizen science enables data collection on an unprecedented scale, addressing complex issues such as climate change, biodiversity conservation, and public health (Haklay 2013; Vohland et al. 2021). It promotes public understanding of science, encourages active participation in research, and builds scientific literacy (Bonney et al. 2016). Arguably, citizen science enhances the democratization of knowledge, making science more inclusive (Heigl et al. 2019), which promotes social responsibility, encouraging sustainable practices and informed decision-making (Vohland et al. 2021). Citizen science encompasses diverse levels of participation (Haklay 2013; Land-Zandstra, Agnello and Gültekin 2021), each offering a different perspective on the role of the public in scientific research. Varying participation (e.g., crowdsourcing, distributed intelligence, co-created and extreme citizen science, listed according to increasing participatory involvement) offers great flexibility to involve a range of communities in scientific research, bridging the gap between professional researchers and the public while advancing scientific knowledge.
Despite its significant potential, citizen science faces certain challenges. For example, varied project aims and differences with respect to the field of research may limit how useful citizen science projects can be (Heigl et al. 2019). The issues include ensuring data quality and reliability, coordinating research between professionals and participants, communicating effectively, and maintaining participant motivation (Frigerio et al. 2021; Pelacho et al. 2021; Tiago et al. 2017b). Also, disputes over how inclusive participatory citizen science should be may depend on the field of research and the working culture (Haklay et al. 2021). Despite these challenges, citizen science presents compelling opportunities. It can generate vast data volumes from diverse geographic regions, enabling large-scale research such as environmental monitoring (Frigerio et al. 2021; Lehikoinen et al. 2023; Tiago et al. 2017a). Additionally, citizen science can be a very cost-effective research method, enabling questions and sampling efforts that are beyond the reach of classic research (Di Cecco et al. 2021; Stuber et al. 2022; Thompson et al. 2023). To realize its full potential, addressing challenges through clear project design, effective communication, and research support (e.g., technological infrastructure) is crucial, maximizing its contributions to both research and public engagement (Rüfenacht et al. 2021).
Modern mobile phones make it feasible to obtain hi-tech information collected by people on site and link such data to occurrence events, which allows more detailed research questions to be asked. Given that mobile application–based citizen science products are particularly fruitful for supporting ecological research, we outline some examples with a specific emphasis on automated bird sound classifiers. These systems typically share a similar workflow: 1) users input data by recording sounds with the application or using pre-recorded audio files; 2) AI analyses the recordings; and 3) the application returns the most likely identification, along with information about the species. Bird sound classifiers can be invaluable as they engage people in citizen science, support systematic wildlife monitoring, and can be used as an educational resource.
There are many applications that classify birds from sounds (e.g., BirdNET, Merlin Bird ID, Song Sleuth, BirdGenie, and many more). For example, BirdNET and Merlin Bird ID are free and available for different operating systems through the application stores (e.g., Google Play for Android or App Store for iOS), making them accessible to a wide audience. Current identification systems are typically based on AI methods, more specifically neural networks (Kahl et al. 2021; Lauha et al. 2022; Xie et al. 2023). BirdNET (by Cornell Lab of Ornithology and Chemnitz University of Technology) (Kahl et al. 2021) covers around 3,000 of the world’s most common species, making it a powerful tool focused exclusively on sound identification. Merlin Bird ID covers 1,382 species of birds, offers both sound and photo identification, and integrates with eBird for birding data, which allows users to record and share their observations. The user can see a sonogram on the screen from which the app prompts bird species identifications as the audio stream is automatically analysed in real time. Song Sleuth (by Wildlife Acoustics) includes only 203 species of North American birds but is more focused on educational content. BirdGenie (by Princeton University Press) is restricted to North American species and the Android platform. In comparison, eBird (managed by the Cornell Lab of Ornithology) does not directly identify bird sounds (it integrates with Merlin Bird ID for this function) but offers, in turn, a rich database of observations and distribution maps (Sullivan et al. 2014). There are also several other citizen science platforms that collect species observations of birds and other taxa both globally (e.g., iNaturalist) and nationally (e.g., tiira.fi and laji.fi in Finland). Ultimately, the choice between the applications depends on needs, such as the educational resources provided, species coverage, or integration with other platforms.
Here, we showcase a new bird sound classification product (in Finnish: Muuttolintujen kevät; In English: Spring of Migratory Birds), an application for enhancing digital citizen science data collection on birds in Finland. In contrast to other popular identification apps, this app not only provides information on time, place, and probability of species identification but also stores the raw audio files in a centralized repository. This enables rigorous validation and re-analysis of the data for future scientific use. To reach maximum visibility and public interest on birds as bioindicators, the application was launched in collaboration with the Finnish broadcasting company (YLE). With the application, any user or citizen scientist can record bird vocalizations with a mobile phone and ask the AI-powered machine learning algorithm to classify bird identity. The application returns an answer with probability estimates of the species occurring in the recording (see Methods: Mobile phone application). Users were also allowed to validate the identified observations (i.e., true or false), thus providing information with which the algorithm is trained further.
We set out to address the following questions (Table 1): First, without a specific hypothesis, we report descriptive information on how citizen scientists welcomed the application as indicated by the number of users and observations. Then, we asked how spatial and temporal data of observations were distributed over a single season in Finland. We hypothesized that, spatially, observations would be more aggregated in urban rather than rural areas because people are assumed to make recordings close to their homes, and urban areas are more densely populated (Amano, Lamming, and Sutherland 2016; Balázs et al. 2021; Tiago et al. 2017a). We hypothesized that, temporally, most observations would have been recorded in early summer, because most birds are vocally more active upon arrival at their breeding territories than later in the summer, and that more recordings would be made on days off than on working days. We also compared the recordings made by users (i.e., application recordings) to data collected with passive acoustic monitoring (PAM) recordings. Application and PAM recordings are complementary approaches that can be compared to probe whether active observations or user activity corresponds with bird activity. Then, we explored user behaviour and the reliability of user-based annotations. We investigated the distribution of user profiles in terms of whether they focused on actively seeking new species or displayed a more random recording scheme. We asked how reliable the information provided by the citizen scientists was by rigorously validating the user-based annotations of the identifications.
We hypothesized that there would be more true-positives (i.e., the species suggested by the application was correct and correctly annotated) and true-negatives (i.e., the species suggested by the application was wrong and correctly annotated), which would indicate high data reliability, over false-positive (i.e., the species suggested by the application was wrong but annotated as correct) and false-negative (i.e., the species suggested by the application was correct but annotated as wrong) annotations.
Table 1
Summary of the questions and hypotheses outlined in this work. The listed research hypotheses are holistic alternative hypotheses for working null hypotheses (i.e., obtained data distribution deviates from an “all other things being equal” scenario). Thus, here they serve as methodological predictions rather than actual ecological hypotheses.
| QUESTION | HYPOTHESIS |
|---|---|
| How did people welcome the application? | Descriptive information on user and observation numbers: no specific hypothesis (H1) |
| How were the data distributed in space? | More data from urban than rural areas because more people live in the former (H2) |
| How were the data distributed in time? | More recordings from early summer, when birds establish their breeding territories (H3); more recordings during days off than working days (H4) |
| How well did recordings match bird activity? | User recordings are not assumed to match bird activity as people are often active later (H5) |
| How reliable were user annotations? | More correct (true-positives, -negatives) than incorrect annotations (false-positives, -negatives) support annotation quality (H6) |
Methods
As a brief methodological summary, we first introduce the mobile phone application and the bird sound recognition model. Then, we describe the bird sound data collected through the mobile application and additional PAM, and lastly, we present the methods used for analysing these data.
Mobile phone application
The mobile phone application allows users to record bird sounds anonymously with observation location information, and to get species identifications for the recordings. There are two different recording schemes: opportunistic and interval recording. Opportunistic recordings are at most 5-minute-long recordings, with which users can record any vocalizing bird or soundscape they want to identify. Interval recordings allow users to produce sound data through a more congruent sampling scheme. The interval recording period can be set to one, three, six, or twelve hours, during which the application records one-minute clips followed by nine-minute pauses. The application monitors the phone’s battery level during interval recordings; if the charge drops too low, the recording is interrupted and the data collected so far are saved. When sending the recording, the user is asked for permission to use the recording for scientific purposes (consent), and identification results containing the identified species and confidence level are returned to the user in real time. Users can evaluate the identification result in the application by indicating whether the identification is correct or not and whether they have seen or heard the bird.
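The interval-recording scheme described above amounts to a simple schedule: a one-minute clip at the start of every ten-minute cycle for the selected period. The following sketch illustrates that schedule; the exact in-app timing logic is not published, so this is illustrative rather than the production implementation.

```python
# Illustrative sketch of the interval-recording schedule: a one-minute
# clip is recorded at the start of every ten-minute cycle (1 min
# recording + 9 min pause) for the selected period.

def interval_clip_starts(period_hours: int) -> list:
    """Return clip start times in minutes from the session start."""
    if period_hours not in (1, 3, 6, 12):
        raise ValueError("period must be 1, 3, 6, or 12 hours")
    return list(range(0, period_hours * 60, 10))

# A one-hour session yields six one-minute clips; a twelve-hour
# session yields 72.
print(len(interval_clip_starts(1)))   # 6
print(len(interval_clip_starts(12)))  # 72
```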
The user interface of the application is shown in Figure 1. In addition to making new recordings (panel a), users can view their previous recordings (panel b), see a list of all species they have observed (panel c), and inspect observations made by all users on a map, which shows every bird classified with at least 90% confidence or confirmed by the user who made the recording (panel d). The application is available for free for both iPhone and Android users in Finland. Importantly, the user interface of the application was designed to be accessible for all kinds of users (for example, in terms of visual layout for people with poor eyesight).

Figure 1
The user interface contained four tabs. (a) Record: the main tab where recordings are made. (b) Observations: shows the list of all recordings made by the user, including the species identifications. (c) Identifications: shows a list of all species found from the user’s recordings. (d) Map: all identifications made by all users. The list can be filtered based on species and date. The original interface was in Finnish; here, the depiction contains English translations. Information on sensitive species is not shown to all users, but users can always see their own recordings.
When an initiative relies on technological developments to succeed, participation of specialists with suitable engineering skills is crucial. Managing large datasets requires research infrastructure and staff, which in turn takes time, effort, resources, and coordination to capitalise on the potential for public engagement. The audio data and model outputs were thus stored centrally on CSC (Finnish IT Center for Science) servers. The users remain completely anonymous in the data collection process; however, they can be contacted through the phone application by their anonymous user ID.
Advertising the application
The application was published in collaboration with the national public broadcasting company (i.e., the Finnish national broadcasting company, YLE), and thus it received significant attention in the media. The application was first publicly mentioned and used on April 12th 2023 on Metsäradio (Forest Radio), a show that discusses forestry, nature, and outdoor life. The application was also mentioned in the Luontoilta (Nature Evening) radio broadcast on May 4th. On May 10th, the application was advertised in YLE’s special Muuttolintujen kevät, a television broadcast featuring migratory birds, and on the main news television broadcast on YLE. It is noteworthy that the average audience for YLE News on television is 751,000, which represents roughly 14% of the Finnish population. YLE’s social media channels also promoted the Spring of Migratory Birds campaign to citizens throughout the spring, which included providing general information about migratory birds and how people could use the application to follow their arrival. In addition, the application was mentioned in several local and national journals during the spring as well as in various bird-related channels on social media. Gamification was also incorporated into the application. Based on the number of species observations, the user was given virtual badges (three levels), which could be shared through the application on Instagram, Facebook, and X/Twitter.
Bird sound recognition model
The identification model is a convolutional neural network (CNN), which was constructed by using the BirdNET recognition model (Kahl et al. 2021) as a convolutional base and fine-tuning the network with vocalizations of Finnish bird species roughly following the pipeline presented previously (Lauha et al. 2022). The vocalizations of Finnish birds were obtained from the bird sound library Xeno-canto (Xeno-canto Foundation 2005) and from Finnish field recordings collected with autonomous recording units within the ongoing Lifeplan project (ERC-synergy project No. 856506) and annotated in the Bird Sounds Global portal (https://bsg.laji.fi), as well as from a previous pilot project (Lehikoinen et al. 2023) and targeted field recordings by Harry J. Lehto. The audio data were split into 3-second segments and converted to spectrogram images, which were used as training data for the CNN. Similarly, new recordings are analysed by splitting them into 3-second segments with 1-second overlap and predicting for each segment separately.
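The segmentation step can be illustrated as follows. The sample rate and the exact windowing code are assumptions, but the 3-second window with 1-second overlap (i.e., a 2-second hop between consecutive window starts) follows the inference pipeline described above.

```python
def segment_audio(audio, sr, win_s=3.0, overlap_s=1.0):
    """Split a waveform into fixed-length windows with overlap.

    A 3 s window with 1 s overlap corresponds to a 2 s hop between
    consecutive window starts, as in the inference step above."""
    win = int(win_s * sr)                 # samples per window
    hop = int((win_s - overlap_s) * sr)   # samples between window starts
    return [audio[start:start + win]
            for start in range(0, len(audio) - win + 1, hop)]

# Ten seconds of audio at an assumed 16 kHz sample rate gives four
# full 3 s windows, starting at 0, 2, 4, and 6 s.
audio = [0.0] * (10 * 16000)
print(len(segment_audio(audio, 16000)))  # 4
```

Each returned window would then be converted to a spectrogram image before being passed to the CNN, as described above.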
The identification model was originally trained for 150 Finnish bird species, and during the summer of 2023, the list was supplemented with ten more species. In the mobile phone application, the predictions of the model are adjusted by decreasing the predictions for species that are unlikely to occur in the recording based on latitude and time of the year. The application shows all species that are detected with at least 60% confidence.
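The latitude- and season-based adjustment could, for example, take the form of a multiplicative down-weighting of unlikely species. The multiplicative form and the occurrence weights below are assumptions for illustration; the 60% display threshold follows the text.

```python
def adjusted_predictions(preds, occurrence_weight, threshold=0.6):
    """Down-weight species that are unlikely at the given latitude and
    date, then keep only species above the display threshold.

    preds: {species: raw model confidence}
    occurrence_weight: {species: weight in [0, 1]} derived from
    latitude and time of year (hypothetical weighting scheme)."""
    adjusted = {sp: p * occurrence_weight.get(sp, 1.0)
                for sp, p in preds.items()}
    return {sp: p for sp, p in adjusted.items() if p >= threshold}

# An unlikely species (weight 0.5) drops below the 60% threshold.
print(adjusted_predictions({"cuckoo": 0.9, "bee-eater": 0.8},
                           {"bee-eater": 0.5}))  # {'cuckoo': 0.9}
```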
Audio data
We analysed the audio data collected through the application between April 1st, 2023 and July 31st, 2023. The descriptive numbers of user volume and recordings are introduced in the results. We compared the mobile phone recordings with biomonitoring data collected with autonomous passive acoustic recorders from six sampling sites across Finland. The sampling is conducted as a part of the international Lifeplan project (ERC-synergy project No. 856506). The sampling scheme on Lifeplan sites corresponds to continuous interval recording in the application (i.e., audio samples of one minute are recorded once every ten minutes). We randomly selected one recording per hour from each site from the beginning of April until the end of July, which led to a data set of 15,900 recordings containing 21,200 species identifications with at least 60% certainty (1.3 species per recording).
Analysis methods
We compared the recordings made by users (i.e., application recordings) with PAM recordings by visualizing the spatial and temporal distributions of recordings and bird observations in both data sets. To analyse the locality of users, we defined a centre location (φ̃u, λ̃u) for each user u by selecting the median latitude and median longitude of all recordings by that user. We then calculated the distances between all of a user’s recording locations (φj,u, λj,u) and the centre location (φ̃u, λ̃u). We quantified the temporal distribution of recordings and bird observations by calculating the total number of recordings and the total number of species identifications on different confidence levels for each day. To analyse the activity of birds and application users around the clock, we calculated the total number of bird observations with at least 60% confidence for each hour of the day and divided it by the total number of recordings for both application and PAM recordings.
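The centre-location computation can be sketched as below. The paper does not name the distance metric, so the great-circle (haversine) distance is an assumption; the median-based centre follows the definition above.

```python
import math
import statistics

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres
    (haversine formula; metric choice is an assumption)."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def centre_and_distances(lats, lons):
    """Median-based user centre location and the distance of each
    recording to it, per the definition in the text."""
    c_lat, c_lon = statistics.median(lats), statistics.median(lons)
    dists = [haversine_km(la, lo, c_lat, c_lon)
             for la, lo in zip(lats, lons)]
    return (c_lat, c_lon), dists
```

Summary statistics of the per-user distance lists (minimum, quartiles, maximum, standard deviation) are what Figure 3 visualizes.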
We analysed user retention by dividing users into three groups based on the total number of recordings made (setting the thresholds at 50 and 500 recordings) and calculating the temporal distribution of recordings for each group. For each user u, we calculated the proportion of recordings made on each day d by dividing the daily number of recordings rd,u by the total number of recordings Ru made by the user during the sampling period: pd,u = rd,u/Ru. We then calculated the daily mean recording activity p̄d for each group by averaging pd,u over all users in the group who made their first recording at least d days before the end of the sampling period, and we visualized these average recording activities as a function of days since first recording.
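The per-user proportions and their daily means can be sketched as follows (a simplified version of the computation above; the group-membership filtering by first-recording date is omitted for brevity).

```python
from collections import defaultdict

def daily_recording_proportions(recordings):
    """recordings: iterable of (user_id, day) pairs, where day counts
    days since that user's first recording. Returns, for each user,
    the proportion of the user's recordings made on each day
    (p_d,u = r_d,u / R_u)."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for user, day in recordings:
        counts[user][day] += 1
        totals[user] += 1
    return {u: {d: n / totals[u] for d, n in days.items()}
            for u, days in counts.items()}

def mean_daily_activity(props, day):
    """Mean proportion of recordings on a given day across users."""
    vals = [p.get(day, 0.0) for p in props.values()]
    return sum(vals) / len(vals)

props = daily_recording_proportions(
    [("a", 0), ("a", 0), ("a", 1), ("b", 0)])
print(round(mean_daily_activity(props, 0), 4))  # (2/3 + 1)/2 = 0.8333
```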
The recording behaviour of active users was assessed by calculating species accumulation curves for all users with at least 200 recordings and all six PAM sites. The species accumulation curves were formed by calculating the cumulative number of unique species identified with at least 90% confidence as a function of the number of recordings made by the user.
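The species accumulation curve described above can be sketched as a running count of unique species over a user's recording sequence; the input format is an assumption.

```python
def species_accumulation(recordings, threshold=0.9):
    """recordings: list of {species: confidence} dicts, one per
    recording in chronological order. Returns the cumulative number
    of unique species identified with at least `threshold`
    confidence after each recording."""
    seen = set()
    curve = []
    for rec in recordings:
        seen.update(sp for sp, conf in rec.items() if conf >= threshold)
        curve.append(len(seen))
    return curve

recs = [{"great tit": 0.95},
        {"great tit": 0.97, "blackbird": 0.92},
        {"chaffinch": 0.50}]  # below the 90% threshold, not counted
print(species_accumulation(recs))  # [1, 2, 2]
```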
We evaluated the quality of user annotations by selecting a sample of users and collecting a set of 50 annotations per user. These annotations were checked by an experienced bird specialist. Because user-annotated data can be used for improving the identification models (Kahl et al. 2021; Lauha et al. 2022), and instances where the model prediction is far from the true label are most useful for model improvement, we steered the data selection towards users who have generated the most annotations and data points in which the disagreement between the user and model prediction is greatest. To select the validation set, we calculated the number of positive annotations nu+ and the number of negative annotations nu− made by each user u and ranked the users by the geometric mean of nu+ and nu−: gu = √(nu+ · nu−). The geometric mean was used to ensure that the data set contains both positive and negative annotations from each user. We then selected the 50 users with the highest gu. For each user, we sorted the annotations according to the disagreement between the user annotation and the model prediction: di,j = |ai,j − pi,j|, where ai,j is the user annotation (true = 1 or false = 0) and pi,j the model prediction for species i in recording j. Finally, for each user, we selected 50 annotations containing an equal amount of positive and negative annotations (if possible) from as many different species as possible, starting from the highest values of di,j.
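The user-ranking step can be sketched as follows; the input format is an assumption, but the geometric-mean ranking and the disagreement score follow the definitions above.

```python
import math
from collections import defaultdict

def rank_users_by_geometric_mean(annotations):
    """annotations: iterable of (user, a) pairs with a = 1 for a
    positive and 0 for a negative annotation. Ranks users by
    g_u = sqrt(n+_u * n-_u), highest first, so the selected set
    contains both annotation types from each user."""
    pos, neg = defaultdict(int), defaultdict(int)
    for user, a in annotations:
        (pos if a == 1 else neg)[user] += 1
    users = set(pos) | set(neg)
    return sorted(users,
                  key=lambda u: math.sqrt(pos[u] * neg[u]),
                  reverse=True)

def disagreement(a, p):
    """d_i,j = |a_i,j - p_i,j| between user annotation and model
    prediction."""
    return abs(a - p)
```

A user with 5 positive and 5 negative annotations (g = 5) ranks above one with 9 positive and 1 negative (g = 3), even though both made 10 annotations.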
The user annotations are recording-specific, whereas model outputs are produced separately for several 3-second (s) clips within the recording. For each annotation, we selected the 3 s clip with the highest prediction for the target species including additional 0.5 s buffers before and after the clip to help the bird specialist to identify the correct species. The specialist listened through all 50 × 50 clips and noted if the target species was present in the recording.
Results
How people welcomed the application
During a single season (April–July 2023), the application was downloaded more than 220,000 times, and more than 140,000 users submitted close to three million recordings (on average 21.3 recordings per user) of Finnish birds. The median user submitted 8 recordings (2.5–97.5% quantiles: 1–122), while the most active user submitted more than 5,000 recordings, and 7.2% of users were responsible for 50% of the recordings. The recordings contain more than 5.8 million species identifications with at least 60% confidence (2.0 species per recording). Also, 1.2 million identifications (20.6%) were annotated by the users to be either true or false. A vast majority of the recordings were opportunistic, and only 10,300 recordings (0.35%) were interval recordings. The median recording duration was 26.5 s for opportunistic recordings (2.5–97.5% quantiles: 4.1–299.8 s) and 373.5 s for interval recordings (341.5–3881.5 s). It is noteworthy that even though the application was designed to classify Finnish birds and is available only for Finnish users, a small proportion (ca. 1%) of recordings were from outside of Finland. There were recordings from every continent except Antarctica.
Spatio-temporal distribution of recordings
The recording locations strongly follow the population density in Finland, indicating that people tend to go birdwatching close to their homes (Figure 2). Indeed, most of the recordings were made within a ca. 10 km radius of each user’s centre location (Figure 3c). The maximum recording distance was typically a few hundred kilometres from the individual centre location, and almost 25% of active users (i.e., people who made more than 50 recordings) made recordings further than 300 km from their home locality (Figure 3e). A handful of users recorded audio more than 1,100 km away from their centre location, which is close to the longest possible distance between two locations within Finland and thus corresponds to a person recording both on the coast of Southern Finland and in Northern Lapland. The geographical coverage of recording locations is comprehensive, especially in relatively densely inhabited Southern Finland. In remote locations, the spatial coverage is predictably much lower. The sparse road network of Northern Finland is evident in the heatmap of recordings (Figure 2).

Figure 2
Comparison between heatmap of recordings and population density in Finland. (a) The coloured dots show the number of recordings made by users and greater blue dots show the locations of six passive acoustic monitoring (PAM) sites. (b) The map of Finland coloured by the population density (Statistics Finland 2022).

Figure 3
Distributions of summary statistics of distance to user-specific centre location for users with more than 50 recordings. The panels show the (a) minimum, (b) lower quartile, (c) median, (d) upper quartile, and (e) maximum distances to the user-specific centre locations and (f) the distribution of standard deviations of distance to the centre location.
The recording behaviour of the users shows clear temporal patterns. Both recording activity and the number of species observed peaked during the two weekends following May 10th, 2023, when YLE launched a special event on migratory birds and the application was featured on national television. Throughout the summer, the recording activity was notably higher on weekends than weekdays (Figure 4a). Excluding the activity peaks in mid-May and on weekends, the recording activity remained at a similar level from late May to early July, after which it diminished (Figure 4b). Figure 5, too, shows that the recording activity of users decreases over time.

Figure 4
Number of species observations per day for application (a) and passive acoustic monitoring (PAM) recordings (b). The dashed blue lines show the number of recordings made for each day. In panel a, the vertical grey lines show dates when the application was featured on national radio (12.4.2023) and television (10.5.2023). Dashed vertical grey lines indicate Sundays.

Figure 5
User retention. The temporal distribution of recordings starting from the day of a user’s first recording grouped by the total number of recordings made.
Even though the recording activity of users depends on external factors such as the day of the week and the visibility of the application in the media, there is still a high and statistically very significant correlation (r = 0.749, p = 3.1 × 10⁻²³) between the daily number of user recordings and the activity of birds (measured as daily species observations in PAM data), which indicates that people tend to produce more recordings when the birds are most active. Within a day, however, this does not apply: there is no statistically significant correlation between the hourly number of user recordings and the vocalization activity of birds (r = 0.220, p = 0.302). Although bird activity peaked daily at approximately 5 a.m., people prefer to record at more convenient times, and the hourly distribution of recordings peaked at approximately 10 a.m. and just before 8 p.m. (Figure 6). In the evening, people observe more birds, and in the morning fewer birds, than expected under a prediction formed by calculating the product of bird vocalization activity (based on PAM species observations) and user recording activity. Application recordings also contain more bird observations than PAM recordings (Figure 6).
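The daily correlation reported above is a standard Pearson correlation between two time series; a minimal sketch (the paper does not state the implementation used, so this is illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length daily series,
    e.g. daily user recording counts versus daily PAM species
    observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly linearly related series give r = 1.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```

In practice, a library routine such as `scipy.stats.pearsonr` would also return the p-value reported in the text.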

Figure 6
The recording activity during the day. Dashed curves show the hourly relative recording effort by passive acoustic monitoring and application (App) users. Solid curves show numbers of observed species divided by the total recording effort. Dotted brown curves show the prediction for application observations, if the number of observed birds would be equal to the product of bird singing activity (deduced from passive acoustic monitoring [PAM] species observations) and user recording activity. The prediction is normalized in two alternative ways: by setting the total number of observations to match with (a) application observations (higher curve) and (b) PAM observations (lower curve).
User behaviour and reliability of user-based annotations
With a great number of users comes a great number of unique recording patterns. The species accumulation curves of active users show that while some users seem to focus their efforts on maximising new species entries into their lists and obtain a new species on almost every third recording, others seem to be more interested in producing a large collection of recordings from their surroundings without actively targeting new species (Figure 7). However, all users seem to be motivated to record new species, and even the most slowly increasing species accumulation curves of users exceed those of PAM recorders. Unsurprisingly, most users made opportunistic recordings (i.e., to “hunt” for more species), which were most likely targeted at one or more vocalizing birds, whereas a large proportion of PAM recordings contains silent periods.

Figure 7
Species accumulation curves for users with at least 200 recordings. Red curves highlight three example users and blue curves show similar accumulation curves for passive acoustic monitoring (PAM) recordings. Every blue curve represents one sampling site.
To evaluate how reliable the citizen scientist–annotated bird-sound data is, we rigorously validated the bird application identifications as a proof-of-concept. The results show that there is marked variation in the reliability of users (Figure 8). In our sample of 50 users, the average user-specific percentage of correct answers was 73%, while the lowest was 46% and the highest 98%. Out of the 2,500 user-validated cases, the majority were true negatives (38.2%), followed by true positives (35.7%), false positives (17.6%), and false negatives (8.5%).

Figure 8
Accuracy of 2,500 user annotations. (a) y-axis shows the proportion of negative annotations that did not contain the target species, and x-axis the proportion of positive annotations that contained the target species for the selected 50 users. (b) A confusion matrix for all annotations of the selected users. Top left corner shows true positives (the species suggested by the application was correct and correctly annotated), bottom right shows true negatives (the species suggested by the application was wrong and correctly annotated), bottom left shows false-positives (the species suggested by the application was wrong but annotated as correct), and top right shows false-negatives (the species suggested by the application was correct but annotated as wrong).
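The four categories of the confusion matrix can be tallied directly from (user annotation, specialist verdict) pairs; a minimal sketch using the definitions given in this work (input format is an assumption):

```python
def confusion_counts(pairs):
    """pairs: iterable of (annotation, present) with annotation = 1
    if the user marked the identification as true, and present = 1
    if the specialist found the species in the clip. Categories
    follow the definitions used in this work."""
    tp = sum(1 for a, t in pairs if a == 1 and t == 1)  # true positive
    tn = sum(1 for a, t in pairs if a == 0 and t == 0)  # true negative
    fp = sum(1 for a, t in pairs if a == 1 and t == 0)  # false positive
    fn = sum(1 for a, t in pairs if a == 0 and t == 1)  # false negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

pairs = [(1, 1), (1, 1), (0, 0), (1, 0), (0, 1)]
print(confusion_counts(pairs))  # {'TP': 2, 'TN': 1, 'FP': 1, 'FN': 1}
```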
Discussion
We have demonstrated a successful digital citizen science initiative using a sound classifier application to enhance data collection of identified bird species. Our findings can be summarised as follows: First, citizen science can be a powerful method to harness public engagement to obtain large quantities of scientifically meaningful data (Bonney et al. 2014; Vohland et al. 2021). Second, spatial and temporal distribution of citizen science recordings mirrors human activity (Haklay 2013; Knape et al. 2022), which may diverge from that of birds, but still offers an extensive sampling network in comparison to classic means of sampling. Third, user engagement behaviour affects the accumulation of new species, and reliability of user-based annotations are varied (Lehikoinen et al. 2022; Meschini et al. 2021). We discuss these in turn.
How people welcomed the application
From participants’ point of view, mobile applications can be very engaging and are accessible to many; from researchers’ point of view, the application interface can be harnessed to support interoperability across platforms and to address many research questions. Our initiative sparked significant public engagement as reported in other previous citizen science studies (Haklay 2013; Lehikoinen et al. 2023; MacLeod and Scott 2021; Santangeli et al. 2023; Sullivan et al. 2014). The bird-sound application attracted more than 140,000 users (ca. 2.5% of the Finnish population of 5.5 million) during a single season, through which they made close to three million recordings from which almost six million birds were identified by an AI-powered classification algorithm (H1; see Table 1). For the same period (April 1st–July 30th, 2023), the bird observation database of BirdLife Finland (tiira.fi) contained 859,687 observations (retrieved December 4th, 2023). These observations covered significantly more individuals (ca. 241 million) because counts are often greater when large flocks can be counted visually. Thus, when effectively advertised, citizen scientists can become engaged in large-scale scientific research and accumulate data with remarkable intensity (Frigerio et al. 2021; Haklay 2013; Land-Zandstra, Agnello, and Gültekin 2021; Sullivan et al. 2014). The role of media and effective communication is crucial for the success of citizen science; high visibility attracts more people, which translates to more data. However, to provide scientifically robust data, the instructions must be as clear as possible to make the task accessible to the public.
Spatio-temporal distribution of recordings
The spatial sampling of user observations was widespread. However, recordings were concentrated close to where people normally go outdoors (H2), which causes sampling to correlate with higher human population densities (Amano, Lamming, and Sutherland 2016; Balázs et al. 2021; Tiago et al. 2017a). Most observations were recorded within 10 km of the centre point of all recordings made by the user. Nevertheless, these recordings still accumulate into a network of data points and thus serve as a significant contribution in those locations. A better design for the systematic point count approach and expansion of the point count network into less populated areas would be important in mitigating potential sampling bias. As it is, the application encouraged people to record a high number of species by granting “awards” when a user’s species count reached certain levels. Gamification could be one solution to mitigate sampling biases by rewarding users who record data from underexplored areas across the seasons. Since regular recordings at spatially and temporally dense intervals are the most useful data for scientific research on birds, the application should guide and encourage people towards producing more standardised (e.g., in terms of location, time, and duration) natural sciences data (Dickinson, Zuckerberg, and Bonter 2010; Frigerio et al. 2021; Lehikoinen et al. 2023). We also note that many species of the Finnish bird fauna occur elsewhere in Europe. Thus, expanding sampling to cover the entire continent could be achievable given existing training libraries of bird sounds, should the logistical requirements of data storage be addressed first.
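The per-user spatial concentration described above (the share of a user’s recordings within 10 km of the centre point of all their recordings) could be computed along the following lines. This is a minimal sketch, not the study’s actual analysis code: the function names and the (latitude, longitude) input format are illustrative assumptions.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    R = 6371.0  # mean Earth radius, km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

def share_within_radius(points, radius_km=10.0):
    """Share of a user's recordings within radius_km of their centre point.

    `points` is a list of (lat, lon) pairs; the centre point is approximated
    as the mean of the coordinates, which is adequate at this spatial scale.
    """
    lat_c = sum(p[0] for p in points) / len(points)
    lon_c = sum(p[1] for p in points) / len(points)
    near = sum(1 for lat, lon in points
               if haversine_km(lat, lon, lat_c, lon_c) <= radius_km)
    return near / len(points)
```

Applied per user, such a summary directly quantifies how concentrated each observer’s sampling is around their usual recording locality.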
The temporal sampling of recordings followed the progression of summer and the daily activity of birds (Moran et al. 2019; Robbins 1981). The recording activity over the season was highest in early summer (H3), April through May, after which it levelled out with the progression of summer in June and diminished in July. Throughout the summer, the recording activity was higher on weekends than on weekdays (H4). Similar behaviour has also been observed in other citizen science projects (Courter et al. 2013; Di Cecco et al. 2021). This seasonality poses a challenge for the application. On the one hand, the application can identify species, for example, from flight calls outside the breeding period (when the birds are less vocal); on the other hand, it may require extra effort to motivate users to record sounds during quieter seasons (Thompson et al. 2023). Data from the following seasons will show whether the decrease in recording activity was due to the lower vocalization activity of birds towards the end of summer or to user fatigue. Also, as some people use the application to learn to recognize bird species, those users may cease to use the application once they can identify common species without it. The temporal recording behaviour of the users within the course of a day did not match the daily peaks of bird singing activity (H5). The bird singing activity peaked at sunrise at approximately 5 a.m., but recordings peaked later in the morning at approximately 10 a.m. A more formal time-instructed sampling design (i.e., less opportunistic for gaining new species based on user behaviour and more focused on long-term recording) would help to overcome the challenges in temporal as well as spatial sampling (Frigerio et al. 2021; Knape et al. 2022; Tiago et al. 2017a).
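The hour-of-day and weekend-versus-weekday summaries discussed above can be sketched as follows. This is a simplified illustration assuming a plain list of recording timestamps; it is not the study’s actual analysis code.

```python
from collections import Counter
from datetime import datetime

def activity_profile(timestamps):
    """Summarise recording activity by peak hour of day and weekend share.

    `timestamps` is an iterable of datetime objects, one per recording.
    """
    ts = list(timestamps)
    by_hour = Counter(t.hour for t in ts)
    peak_hour = max(by_hour, key=by_hour.get)          # most active hour
    weekend = sum(1 for t in ts if t.weekday() >= 5)   # Saturday=5, Sunday=6
    return {"peak_hour": peak_hour, "weekend_share": weekend / len(ts)}

# Illustrative example: three Saturday-morning recordings and one early
# Monday recording
profile = activity_profile([
    datetime(2023, 4, 15, 10, 5),
    datetime(2023, 4, 15, 10, 40),
    datetime(2023, 4, 15, 11, 15),
    datetime(2023, 4, 17, 5, 30),
])
```

Aggregating such profiles over all users is one simple way to compare the recording peak against the known sunrise peak of bird song.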
User behaviour and reliability of user-based annotations
The recording behaviour of the users varied markedly. Some observers focussed their efforts on maximising new species entries in their lists and can be characterised as species collectors. At the other end of the scale are local observers, who are very committed to producing a high number of recordings in their own locality. These users actively make new recordings even when no new species are available, which makes the sampling temporally more balanced. As our user data are anonymized, we cannot evaluate potential demographic differences among user groups (Tiago et al. 2017b). However, regular sampling executed by a high number of observers provides valuable data on the occurrence of birds (Sullivan et al. 2014). These recordings could thus be used to predict the progression of migration, especially when the recordings are made through a congruent interval sampling scheme, and the efforts of species collectors advance data collection on infrequent or rarely heard species.
Overall, our results highlight that citizen science–driven recording efforts complement sampling through PAM. In passive monitoring, the sampling effort is easy to balance temporally, and data on both the presence and absence of birds are produced without bias caused by targeting opportunistic recordings at certain bird individuals or species. However, a sampling network comprising thousands of citizen scientists offers spatial density and coverage far beyond passive monitoring scenarios. Recordings made by users also contain, on average, more birds than passive recordings, which might be useful, for example, when collecting observations of rare species.
In this work, the accuracy of user annotations varied notably among the predicted categories as follows (H6): True positives (36.7%) represented the ideal way the application should yield data. True negatives were the most common category in our annotated cases (38.2%). These data are useful for improving the classification performance of the AI model. The obvious reason for true negatives is the presence of species that are vocally similar to the suggested species (such as common versus tree sparrow or common versus arctic tern); these problems may be tackled by extensive training sets to mitigate the similarity challenge. As the AI-powered classifier extracts a great number of parameters from a single vocalisation, its ability to separate difficult species pairs depends on the extent of the training material. True negatives also included low-frequency anthropogenic sounds such as human speech and mobile phone or machinery noise, which automatic identification models tend to confuse with low-frequency bird vocalizations. False positives (17.6%) could be due to human misidentifications; notably, many users want to learn more about bird sounds but are not yet expert at identifying birds. It is also possible that the species was present in the original recording, but the highest model prediction pointed at the wrong part of the recording, so the target species was not detectable in the 4 s sample. Another user-based error could be that the bird was seen, but not heard, and was annotated based on the wrong evidence. Deliberately incorrect annotations also occurred (e.g., sometimes the recording was evidently not from a bird, but was still annotated as a bird sound). False negatives (8.5%) could emerge, for example, from birds that were flying over but were missed by the user (such as high-pitched siskins).
Some users have also reported annotating certain identifications as false in an attempt to teach the AI model about another, more prominent bird voice in the soundscape.
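The four annotation outcome categories above amount to a cross-tabulation of the user’s decision against an expert’s validation, which could be tallied along the following lines. The record format, labels, and category definitions in this sketch are illustrative assumptions for demonstration, not the study’s actual data schema.

```python
from collections import Counter

def annotation_categories(records):
    """Tally user annotations against expert validation.

    Each record is a (user_confirmed, expert_present) pair of booleans:
    whether the user accepted the AI suggestion, and whether an expert
    found the suggested species in the clip.
    """
    labels = {
        (True, True): "true_positive",    # user accepted, species present
        (False, False): "true_negative",  # user rejected, species absent
        (True, False): "false_positive",  # user accepted, species absent
        (False, True): "false_negative",  # user rejected, species present
    }
    counts = Counter(labels[r] for r in records)
    total = sum(counts.values())
    return {name: counts.get(name, 0) / total for name in labels.values()}
```

Running such a tally over a validated subsample yields the category proportions reported in the text.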
Ensuring data quality is often considered a potential weakness of citizen science studies (e.g., Balázs et al. 2021). For example, due to the varied quality of the annotations, obtaining a reliable data set may require manual filtering and validation of the observations identified by users. Based on our results, we do not recommend that user-identified data be utilized without any validation procedure. However, the massive number of recordings enables a targeted search for those that are valuable, which can then be manually validated with reasonable effort. Our results encourage the use of automatic species identification models in the collection of species occurrence data. To circumvent the data quality problems that characterize many citizen science projects (i.e., due to variable identification skills among citizen scientists), our approach stores the raw audio in a centralized repository, enabling rigorous validation and re-analysis as the identification algorithm improves. This enables re-analysis of past data and opens the potential to observe changes in distribution and in the timing of migration, which is not within the reach of other common sound-classification applications. After the first year, the identification model was re-trained using some of the data collected during summer 2023. The species list of the new model has been extended to cover 263 species (all species that are regular annual visitors in Finland), and model accuracy and calibration have been improved. We caution, however, that many species that are naturally “silent,” such as many shore- and waterbirds, are beyond the reach of detection by sound through our application. Alternative observation methods may be required to detect these species’ presence.
Conclusions
The next step is to refine the method to gain better quality data, allowing deeper ecological questions to be asked. The insights derived could include bird species distribution, habitat use, temporal activity, migration, and the use of citizen science data as a predictive method. More generally, citizen science is rewarding, but it may become very demanding to manage, depending on the scale of the project. Public engagement requires effective communication and feedback about the project goals at frequent intervals. Joint efforts of professional experts and citizen scientists are called for to harness the full potential of citizen science.
Data Accessibility Statement
Anonymized data of recordings and bird identifications, and the code to execute the analysis and produce the results and figures for this manuscript, are available at https://doi.org/10.5281/zenodo.13326225.
Acknowledgements
We thank executive producer Ville Alijoki (Yle Science, Environment and History) for fruitful collaboration: the phone application quickly gained a broad user community, largely thanks to the co-operation with Yle Nature and its promotion of the application on TV and radio and in news articles and social media. We thank CSC for providing the computing resources to run the application back-end services and to store and process the data generated by the application. We especially thank Jemal Tahir, Álvaro Gonzalez, and Tristan Perard from CSC for developing and maintaining the back-end services supporting the application, as well as Jesse Harrison for overseeing and supporting the collaboration. We thank Bess Hardwick from the University of Helsinki for managing the organization of Lifeplan data. We thank Jarno Tossavainen from Innofactor for contributions to the application development, as well as Mikko Heikkinen from Luomus for advice on Finnish birds and the list of sensitive species. Finally, we thank all people who have contributed to collecting the Finnish Lifeplan audio data, including Teppo Salmirinne from Oulanka Research Station; John Loehr, Esa-Pekka Tuominen, Juho Kökkö and Joni Uusitalo from Lammi Biological Station; summer students at Hyytiälä forest station; technical staff of Värriö research station; Emma Vatka from the University of Oulu; as well as research assistants and trainees of the Archipelago Research Institute.
Funding Information
Pauliina Schiestl-Aalto was funded by Academy of Finland Flagship Program (grant no. 337549). Otso Ovaskainen was funded by Academy of Finland (grant no. 336212 and 345110), and the European Union: The European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 856506: ERC-synergy project LIFEPLAN; and grant agreement No 101123091: ERC-PoC project Breaking the wall between professional science and citizen science by hyperautomation) and the HORIZON-INFRA-2021-TECH-01 project 101057437 (Biodiversity Digital Twin for Advanced Modelling, Simulation and Prediction Capabilities). The project also received internal funding from the University of Jyväskylä.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
ON and PL share first authorship and wrote the first draft of the manuscript. OO and AL conceptualized the idea for this project. PL conducted data mining, and PL and ON contributed to analysis and data visualization. PL and PS contributed to development of the automatic identification model, and AK, JT, AV, and AL contributed to development of the mobile phone application, data management, and the technological infrastructure facilitating data collection. SA, JH, JI, HJL, MM, RP, PSA, JS, and MV contributed to collection of audio data, including PAM recordings and training data for the identification model. All authors contributed to conceptualizing the findings and editing the manuscript, and have approved the final version.
