Abstract
Participatory science platforms such as iNaturalist and eBird support large, engaged communities of observers who produce substantial amounts of biodiversity data. Despite similarities across platforms, their participants differ markedly in profile (e.g., hobbyists versus enthusiasts) and contribute data in different ways, at different frequencies, and in different volumes. These differences raise questions about platform-specific biases and about whether combining species-level data across platforms is advisable. This study establishes a methodology for assessing the mergeability of observation records from iNaturalist and eBird using relative temporal distributions. Specifically, we employed circular statistical methods to compare seasonality patterns in records of 254 bird species in Northern California and Nevada from 2019 and 2022. We developed a test based on circular optimal transport to assess the statistical equivalence of each species' frequency distributions across platforms and years. Using eBird 2022 as the baseline, we found that over 97% of species were mergeable with the eBird 2019 and iNaturalist 2022 datasets, and over 88% were mergeable with the iNaturalist 2019 records. A subsequent comparison of the combined data revealed archetypal seasonality patterns that either matched known migratory behaviors or showed discrepancies explainable by expert knowledge of observer and bird behavior. Our findings provide quantitative evidence suggesting that only a small minority of species in our study differ enough across platforms to prevent their data from being reasonably integrated, and our approach highlights the importance of multidisciplinary collaboration in analyzing participatory science data. These results indicate that participatory science projects, small and large, have transformative potential to contribute to broad-scale analyses by organizing and pooling data across projects.
