Data quality issues, including problematic responses from bots, malicious respondents, and careless respondents, can threaten the validity of online survey-based network data. In this study, we describe how we identified and handled potentially problematic responses in the ego network module of the Civic Health and Institutions Project, a 50 States Survey (CHIP50-NET). Specifically, we identified and excluded 4,282 respondents (17% of the sample) who named non-agent alters or invalid alters, or who provided inconsistent responses. Excluding these potentially problematic respondents from the CHIP50-NET sample had only modest effects on sample demographics. However, the exclusions brought the prevalence estimate for an uncommon demographic group (adults who do not want children) closer to a recent external benchmark estimate, which provides evidence for the validity of our data cleaning efforts. We conclude with implications and recommendations for using the CHIP50-NET dataset specifically, and for cleaning data from large-scale online network surveys more generally.
© 2025 Zachary P. Neal, Jennifer Watling Neal, published by International Network for Social Network Analysis (INSNA)
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.