Abstract
Data quality issues, including problematic responses from bots, malicious respondents, and careless respondents, can threaten the validity of online survey-based network data. In this study, we describe how we identified and handled potentially problematic responses in the ego network module of the Civic Health and Institutions Project, a 50 States Survey (CHIP50-NET). Specifically, we identified and excluded 4,282 respondents (17% of the sample) who named non-agent alters, named invalid alters, or provided inconsistent responses. Excluding these potentially problematic respondents from the CHIP50-NET sample had only modest effects on sample demographics. However, these exclusions yielded a prevalence estimate for an uncommon demographic group (adults who do not want children) that was closer to a recent external benchmark estimate, providing evidence for the validity of our data cleaning efforts. We conclude with implications and recommendations for using the CHIP50-NET dataset specifically and for cleaning data when conducting large-scale online network surveys more generally.