Data Management in a Community-Based Birth Cohort: What the SEMILLA Study Teaches Us

Open Access | Feb 2026

Figures & Tables

Figure 1

Organic structure of data management in the SEMILLA research study (2018–2024).

Figure 2

Workflow of data management in the SEMILLA research study (2018–2024).

Table 1

Possible combinations of Permanence and compliance with Assessments.

| Permanence | Assessment: Yes | Assessment: No |
|---|---|---|
| Yes | Participant continued in the study and complied with study activities | Participant continued in the study but did not comply with the activities; the reason is detailed in Observation |
| No | Participant did not continue in the study; the reason is detailed in Observation | Participant did not continue in the study; the reason is detailed in Observation |
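
The combinations in Table 1 lend themselves to an automated check. The following is a minimal sketch, not the SEMILLA implementation: the record layout, the field names (`participant_id`, `permanence`, `assessment`, `observation`), the status labels, and the sample data are all assumptions made for illustration. It flags records where a reason would be expected in Observation but none was entered.

```python
from dataclasses import dataclass


@dataclass
class TrackingRecord:
    """One scheduled activity for one participant (hypothetical field names)."""
    participant_id: str
    permanence: bool        # True if the participant continued in the study
    assessment: bool        # True if the participant complied with the activity
    observation: str = ""   # free-text reason, expected in some combinations


def classify(rec: TrackingRecord) -> str:
    """Map a record to one of the combinations described in Table 1."""
    if not rec.permanence:
        return "did_not_continue"
    return "continued_complied" if rec.assessment else "continued_not_complied"


def missing_reason(rec: TrackingRecord) -> bool:
    """True when a reason is expected in Observation but none was entered."""
    needs_reason = classify(rec) != "continued_complied"
    return needs_reason and not rec.observation.strip()


# Invented sample data, for illustration only.
records = [
    TrackingRecord("P001", permanence=True, assessment=True),
    TrackingRecord("P002", permanence=True, assessment=False, observation="Travelling"),
    TrackingRecord("P003", permanence=False, assessment=False),
]
flagged = [r.participant_id for r in records if missing_reason(r)]
print(flagged)  # ['P003']
```

Running a check of this kind at each monitoring cycle would surface unexplained non-compliance or withdrawal while the reason can still be recovered from the field team.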
Table 2

Key lessons for future cohort studies, based on challenges identified in SEMILLA.

| Component | Challenges identified | Lesson |
|---|---|---|
| Planning | The absence of a data management protocol in the early stages of data collection required redesigning instruments and training interviewers while fieldwork was already in progress. This led to initial data entry errors and made it difficult to validate entries promptly due to the initial choice of ODK as the capture system. | Develop a data management protocol to define resources, timelines, and effective procedures for data generation. Involve the data manager from the instrument design stage to anticipate critical requirements for data collection, such as the appropriate software based on instrument complexity and the workflow needed to guarantee data quality. |
| Instrument construction and refinement | Long questionnaires caused participant fatigue; some participants memorized the questions and responded mechanically. Additionally, certain concepts were misunderstood (e.g., paid work, marital status, ‘household members’), which required rewording and additional field instructions. | Validate each instrument not only for content but also for length and usability, evaluating the degree of fatigue of both the participant and the interviewer. Avoid redundant items and adjust the language to the participants’ sociocultural context and the interviewers’ training level to ensure that each question yields high-quality responses. |
| Data collection procedures | Omissions of questionnaires, incomplete activities, and typographical errors in idmadre were observed. The Tracking Planner, implemented from the start, allowed weekly monitoring and required clear justifications. Later, migration to CSPro further strengthened this control by incorporating automatic validations and skip checks during data entry. | Implement a monitoring protocol with periodic data entry validations for each instrument to ensure timely correction of inconsistencies and improve data accuracy. |
| Staff and training | Some interviewers struggled to build rapport and to correctly apply skip patterns or specialized activities. | Develop a training manual and a checklist of best practices for interviewers, complemented by continuous feedback. In some cases, interviewers required additional support to establish rapport with respondents and to apply skip patterns or specialized activities accurately. |
| Instrument coding | At the beginning of the study, both the field and data management teams were still becoming familiar with the coding rules. This learning phase required ongoing supervision to ensure consistent application of the criteria, which initially resulted in some inconsistencies in variable naming and delays in data cleaning. Once the rules were fully standardized and consolidated, errors could be identified and resolved more efficiently. | Share coding rules with the field team to streamline data cleaning and, if necessary, to facilitate re-interviews. Automate double-entry procedures wherever possible and run checks for early detection of systematic errors. |
| Software programming | The initial use of ODK generated multiple technical limitations (e.g., handling complex skip patterns, ensuring longitudinal follow-up). Detecting these problems and migrating to CSPro was a key decision that improved data quality without affecting the fieldwork calendar. | Maintain flexibility in choosing the data collection platform; be open to system migration, even mid-operation, if technical limitations arise. In SEMILLA, transitioning from ODK to CSPro allowed us to resolve operational challenges without disrupting the fieldwork schedule. |
| Documentation | Technical documentation was prepared at the end of data collection, so problems such as recall bias in information about pesticide application or last pregnancies were not identified in time and could not be rectified retrospectively. | Prepare as many manuals, protocols, and field reports as possible before starting the data collection, as each record is an essential resource to reproduce the workflow, guarantee traceability, and facilitate the reuse of the data by other researchers. |
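
The lesson on automating double-entry checks can be illustrated with a short sketch. This is an assumption-laden example rather than the procedure used in SEMILLA: it supposes that each instrument is keyed in twice, that both passes are available as lists of records, and that `idmadre` is the record key. The other field names and all values are invented. The function reports every field where the two passes disagree, plus records present in only one pass.

```python
from typing import Any

Row = dict[str, Any]
# (record id, field name, value in first pass, value in second pass)
Discrepancy = tuple[str, str, Any, Any]


def compare_entries(first: list[Row], second: list[Row],
                    key: str = "idmadre") -> list[Discrepancy]:
    """Compare two independent data entry passes and list their differences."""
    by_id_first = {row[key]: row for row in first}
    by_id_second = {row[key]: row for row in second}
    issues: list[Discrepancy] = []
    for record_id in sorted(set(by_id_first) | set(by_id_second)):
        row_a = by_id_first.get(record_id)
        row_b = by_id_second.get(record_id)
        if row_a is None or row_b is None:
            # Record exists in only one pass; the flags say which pass has it.
            issues.append((record_id, "<missing record>",
                           row_a is not None, row_b is not None))
            continue
        for field in sorted(set(row_a) | set(row_b)):
            if row_a.get(field) != row_b.get(field):
                issues.append((record_id, field, row_a.get(field), row_b.get(field)))
    return issues


# Invented sample data: one transposed age and one mistyped identifier.
pass_one = [
    {"idmadre": "M001", "age": 27, "paid_work": "yes"},
    {"idmadre": "M002", "age": 31, "paid_work": "no"},
]
pass_two = [
    {"idmadre": "M001", "age": 72, "paid_work": "yes"},
    {"idmadre": "M020", "age": 31, "paid_work": "no"},
]
for issue in compare_entries(pass_one, pass_two):
    print(issue)
```

In this sketch a typographical error in the identifier shows up as a pair of unmatched records (`M002` and `M020`), which is one way such errors could be caught early instead of during final data cleaning.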
Language: English
Submitted on: Jul 1, 2025 | Accepted on: Jan 13, 2026 | Published on: Feb 6, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Nataly Cadena, Fadya Orozco, Stephanie Montenegro, Fabián Muñoz, Alexis J. Handal, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.