
Figure 1
Organic structure of data management in the SEMILLA research study (2018–2024).

Figure 2
Workflow of data management in the SEMILLA research study (2018–2024).

Table 1
Possible combinations of Permanence and compliance with Assessments.
| PERMANENCE | ASSESSMENT: Yes | ASSESSMENT: No |
|---|---|---|
| Yes | Participant continued in the study and complied with study activities | Participant continued in the study but did not comply with the activities; the reason is detailed in Observation |
| No | ⦰ | Participant did not continue in the study; the reason is detailed in Observation |
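
The combinations in Table 1 can be read as a validation rule for tracking records. The following is a minimal sketch of that rule, not the study's actual implementation: the record layout and the names `TrackingRecord`, `participant_id`, and `validate` are hypothetical, and it assumes that Permanence and Assessment are stored as yes/no values alongside a free-text Observation.

```python
from dataclasses import dataclass


@dataclass
class TrackingRecord:
    # Hypothetical record layout; field names are illustrative assumptions.
    participant_id: str
    permanence: bool       # True if the participant continued in the study
    assessment: bool       # True if the participant complied with the activities
    observation: str = ""  # free-text reason, expected when assessment is False


def validate(record: TrackingRecord) -> list[str]:
    """Return a list of problems found in one tracking record (empty if none)."""
    # Permanence = No with Assessment = Yes is the empty cell in Table 1.
    if not record.permanence and record.assessment:
        return ["assessment recorded for a participant who did not continue"]
    # Both Assessment = No cells require the reason to be detailed in Observation.
    if not record.assessment and not record.observation.strip():
        return ["Observation is empty although the assessment was not completed"]
    return []
```

Under these assumptions, a record with Permanence = Yes and Assessment = Yes passes without an Observation, while either Assessment = No case is flagged until a reason is entered.
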
Table 2
Key lessons for future cohort studies, based on challenges identified in SEMILLA.
| COMPONENT | CHALLENGES IDENTIFIED | LESSON |
|---|---|---|
| Planning | The absence of a data management protocol in the early stages of data collection required redesigning instruments and training interviewers while fieldwork was already in progress. This led to initial data entry errors and made it difficult to validate entries promptly due to the initial choice of ODK as the capture system. | -Develop a data management protocol to define resources, timelines, and effective procedures for data generation. -Involve the data manager from the instrument design stage to anticipate critical requirements for data collection, such as the appropriate software based on instrument complexity and the workflow needed to guarantee data quality. |
| Instrument construction and refinement | Long questionnaires caused participant fatigue; some participants memorized the questions and responded mechanically. Additionally, certain concepts were misunderstood (e.g., paid work, marital status, ‘household members’), which required rewording and additional field instructions. | -Validate each instrument not only for content but also for length and usability, evaluating the degree of fatigue of both the participant and the interviewer. -Avoid redundant items and adjust the language to the participants’ sociocultural context and the interviewers’ training level to ensure that each question yields high-quality responses. |
| Data collection procedures | Omissions of questionnaires, incomplete activities, and typographical errors in idmadre were observed. The Tracking Planner, implemented from the start, allowed weekly monitoring and required clear justifications. Later, migration to CSPro further strengthened this control by incorporating automatic validations and skip checks during data entry. | Implement a monitoring protocol with periodic data entry validations for each instrument to ensure timely correction of inconsistencies and improve data accuracy. |
| Staff and training | Some interviewers struggled to build rapport and to correctly apply skip patterns or specialized activities. | -Develop a training manual and a checklist of best practices for interviewers, complemented by continuous feedback. -Provide additional support to interviewers who need it to establish rapport with respondents and to apply skip patterns or specialized activities accurately. |
| Instrument coding | At the beginning of the study, both the field and data management teams were still becoming familiar with the coding rules. This learning phase required ongoing supervision to ensure consistent application of the criteria, which initially resulted in some inconsistencies in variable naming and delays in data cleaning. Once the rules were fully standardized and consolidated, errors could be identified and resolved more efficiently. | -Share coding rules with the field team to streamline data cleaning and, if necessary, to facilitate re-interviews. -Automate double-entry procedures wherever possible and run checks for early detection of systematic errors. |
| Software programming | The initial use of ODK generated multiple technical limitations (e.g., handling complex skip patterns, ensuring longitudinal follow-up). Detecting these problems and migrating to CSPro was a key decision that improved data quality without affecting the fieldwork calendar. | Maintain flexibility in choosing the data collection platform; be open to system migration, even mid-operation, if technical limitations arise. In SEMILLA, transitioning from ODK to CSPro allowed us to resolve operational challenges without disrupting the fieldwork schedule. |
| Documentation | Technical documentation was prepared only at the end of data collection, so problems such as recall bias in reports of pesticide application or last pregnancies were not identified in time and could not be rectified retrospectively. | Prepare as many manuals, protocols, and field reports as possible before starting data collection, as each record is an essential resource to reproduce the workflow, guarantee traceability, and facilitate the reuse of the data by other researchers. |
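
The automated double-entry check recommended under Instrument coding in Table 2 could take a form like the sketch below. This is an illustration only and does not describe the procedure used in SEMILLA: the function name `compare_double_entry` and the data layout (each entry pass as a dictionary mapping a record identifier to its field values) are assumptions made for the example.

```python
def compare_double_entry(first: dict, second: dict) -> list[tuple]:
    """Compare two independent entry passes of the same questionnaires.

    Each pass maps a record identifier to a dict of field -> value.
    Returns tuples of (record_id, field, first_value, second_value).
    """
    discrepancies = []
    for record_id in sorted(first.keys() | second.keys()):
        a = first.get(record_id)
        b = second.get(record_id)
        # A record keyed in only one pass is reported as a whole.
        if a is None or b is None:
            discrepancies.append((
                record_id,
                "<record>",
                "present" if a is not None else "missing",
                "present" if b is not None else "missing",
            ))
            continue
        # Otherwise report every field whose two values disagree.
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                discrepancies.append((record_id, field, a.get(field), b.get(field)))
    return discrepancies
```

Running a comparison of this kind after each entry batch, rather than at the end of fieldwork, is one way to surface systematic errors while re-interviews are still feasible.
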
