Introduction
Public health intelligence (PHI) is essential to early warning alert and response (EWAR) systems, enabling timely detection, assessment, and response to health threats. PHI systems now integrate diverse data streams, including laboratory‑based surveillance, community reports, social media, and open‑source information, to enhance public health surveillance and response [1, 2].
There are two important data types that help inform PHI: public health event data, which provide details on health incidents (e.g., deaths, hospitalizations, test positivity rates), and contextual data [3]. Contextual insights improve interpretation by including information such as community composition, travel patterns, environment, health system capacities, medical countermeasures (MCMs), and social factors such as public sentiments and health behaviours [4]. Both types of content are vital for EWAR and outbreak risk assessment [3, 5, 6].
Despite PHI advances, evaluations incorporating contextual insights remain limited. Many PHI systems focus on timeliness and sensitivity of event‑driven data while neglecting broader contextual factors that affect public health responses [7, 8]. Furthermore, analysis of large volumes of unstructured, often text‑based information is labour‑intensive if done manually. While artificial intelligence (AI) tools such as large language models are increasingly common, limitations to their accuracy pose concerns about ‘false alarms or failure to identify important epidemiological signals’ [9, 10]. This gap hinders a comprehensive understanding of health threats, risk assessment, and response strategies.
Standard guidance is needed for systematic contextual data extraction and analysis from event‑based surveillance (EBS) systems. The World Health Organization (WHO) leads the Epidemic Intelligence from Open Sources (EIOS) initiative in collaboration with the Joint Research Centre of the European Commission. Used daily by governments and other institutions globally, the EIOS system enhances the ability to perform PHI functions by integrating open‑source information from tens of thousands of sources [11]. The content is automatically analysed and enriched with metadata, including identification of topics (categorization), geotagging, named entity recognition, translation, and summarization. The system is continuously enhanced with new technology modules designed to streamline and improve surveillance.
This exploratory study quantifies and categorizes contextual insights from EIOS and recommends surveillance practices for contextual data extraction and qualitative analysis of unstructured text. The study aims to assess EIOS’s strengths and limitations and contribute to developing more effective, collaborative surveillance systems.
Methods
Use case selection
This evaluation retrospectively analysed outbreak‑related data from EIOS. Four use cases were selected through a literature review and consultation with the EIOS Core Team (ECT) at WHO. Each case represents a distinct geographic region and pathogen. Outbreak periods were defined using reference data from scientific or grey literature.
The first use case is an influenza outbreak in Brazil that started in the State of Rio de Janeiro due to the H3N2 Darwin type A influenza strain. The outbreak period began in November 2021 and ended in December 2021, with cases peaking in late November [12].
The second use case is a COVID‑19 surge in Kenya, driven by the introduction of the Omicron variant BA.1. Sources define this ‘fifth wave’ of COVID‑19 to be between mid‑November and mid‑December of 2021 [13, 14].
The third use case is a surge of measles in Afghanistan, one of many that have occurred in the past several years. While there were relatively low levels of transmission in 2019 and 2020, cases increased dramatically in 2021. Using a report from the WHO Eastern Mediterranean Regional Office as reference data, the outbreak period is defined from 1 January 2021 to 31 December 2022 [15].
The fourth use case pertains to a diphtheria outbreak in Nigeria. The outbreak period is based on reports from the Nigerian Centre for Disease Control (NCDC), which documented a surge of suspected and confirmed diphtheria cases between 9 May 2022 and 1 October 2023 [16–18].
EIOS system boards
EIOS boards are customized search filters for monitoring diseases, symptoms, health threats, or contextual factors in a specific geographic area and time frame. Board results consist of web‑based information (e.g., press releases, news articles, social media posts) mined from open sources. System users have the option to use ‘pinning’ and ‘flagging’ features of the EIOS system to mark items of interest among these search results. Boards were created for each use case, filtering items by geographic region, disease, and combinations of multi‑lingual keyword searches known as categories. Time filters included two weeks before and after outbreak periods to ensure detection of important events and contextual insights.
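The time filter described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the EIOS system: the function name `board_window` and its `padding_days` parameter are hypothetical, and the example dates are the Nigeria diphtheria outbreak period given earlier.

```python
from datetime import date, timedelta

def board_window(outbreak_start: date, outbreak_end: date, padding_days: int = 14):
    """Return the (start, end) search window: the outbreak period padded on both sides.

    Hypothetical helper; the 14-day default mirrors the two-week margin in the text.
    """
    pad = timedelta(days=padding_days)
    return outbreak_start - pad, outbreak_end + pad

# Nigeria diphtheria use case: 9 May 2022 to 1 October 2023
start, end = board_window(date(2022, 5, 9), date(2023, 10, 1))
print(start, end)
```

Padding both ends of the window is what allows items imported shortly before the first reported cases, or shortly after the last, to be captured by the board.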
Item triage
The items returned from each board were triaged, ‘which involves the sorting of data and information to discard duplicates and to exclude disinformation, irrelevant information, and false information to identify real events’ [19]. The triaging protocol was developed using a combination of documentation from various EIOS users’ published workflows and a standard operating procedure that was previously developed by the ECT for broad use. Non‑English text was translated using a web browser extension before review.
In a primary review, the analyst scanned items for contextual information using only article titles and summaries, which are automatically generated in the EIOS system using an abstractive summarization algorithm. The system’s flagging feature was used to mark and save items that appeared relevant to the outbreak use case. The analyst also took notes on common themes related to contextual insights that were observed in the summaries.
A secondary review involved reading the full text of the flagged items. These items were read in order of import into the EIOS system, starting with the oldest import date. Items were saved using the pinning feature if they were relevant to the outbreak and contained new information that was not captured in previously pinned items. The analyst also noted thematic patterns in the contextual data among pinned items.
Qualitative coding
Observation notes from triage informed a qualitative codebook for each use case. Pinned items were coded using an iterative process, refining themes through repeated analysis and comparison. Each outbreak use case had a distinct list of thematic codes, with some common across cases [20]. Supplementary File 1 contains the final code lists and assignment criteria.
Two primary types of codes were assigned to EIOS system items: public health events and contextual data. Event data included cases, suspected cases, deaths, and hospitalizations. Contextual data focused on those factors that impacted or informed outbreak response. It also included risk reduction factors such as public health and social measures (PHSM) and MCMs.
A single item could be coded more than once across the full text. For example, a single item might provide new event data for many states within a country in addition to sharing new information about local healthcare capacity and other contextual themes. Therefore, the cumulative frequency of codes may exceed the number of items.
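A small Python sketch illustrates why code frequencies can exceed the item count. The item identifiers and code assignments below are invented for illustration and are not study data; only the code labels follow the themes used in this study.

```python
from collections import Counter

# Hypothetical coded items: each pinned item carries a list of code applications.
# A code may repeat within one item (e.g. new case counts for several states).
coded_items = {
    "item-001": ["Event: cases", "Event: cases", "Public health infrastructure"],
    "item-002": ["Event: deaths", "Vulnerable groups"],
    "item-003": ["Mitigation measures: MCMs"],
}

# Tally every code application across all items.
code_counts = Counter(code for codes in coded_items.values() for code in codes)
total_codes = sum(code_counts.values())

print(len(coded_items), "items,", total_codes, "code applications")
```

Here three items yield six code applications, which is the same relationship seen between article counts and code counts in the results tables.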
Finally, contextual data codes were quantified by epidemiological week, using the EIOS import date, to track thematic trends during outbreaks. Qualitative descriptions of salient contextual themes also accompany the quantitative data.
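The weekly tally can be sketched as follows. This is a minimal sketch under stated assumptions: the week-numbering convention used in the study is not specified here, so the ISO 8601 week (Monday start) is used as a stand-in, and the import dates and code assignments are invented for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical (import_date, code) pairs; real values would come from pinned items.
coded = [
    (date(2021, 11, 15), "Public health infrastructure"),
    (date(2021, 11, 17), "Mitigation measures: MCMs"),
    (date(2021, 11, 24), "Mitigation measures: MCMs"),
    (date(2021, 11, 25), "Mitigation measures: PHSMs"),
]

def epi_week(d: date) -> tuple[int, int]:
    """Assumption: approximate the epidemiological week by the ISO 8601 (year, week)."""
    iso = d.isocalendar()
    return iso[0], iso[1]

# Count code applications per (week, code) pair.
weekly = Counter((epi_week(d), code) for d, code in coded)
for (year_week, code), n in sorted(weekly.items()):
    print(year_week, code, n)
```

Keying the tally on the import date rather than the publication date matches the approach described above, and it also means any lag between publication and import carries through to the weekly counts.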
Results
Outbreak Use Case 1: Influenza type A in Brazil (2021)
The board representing Use Case 1 resulted in 11,982 articles imported in EIOS between November and December in 2021. Search criteria included all influenza‑related categories and articles that mentioned any geographic region within Brazil. After triage, there were 1392 articles (11.6%) remaining that contained unique, outbreak‑relevant information. Among these articles, there were 751 outbreak event‑related codes and 861 codes applied to contextual insights (Table 1).
Table 1
Summary of contextual insights coded among triaged articles in Outbreak Use Case 1, across all epidemiological weeks.
| THEMATIC CODE | NUMBER CODED (N = 861) |
|---|---|
| Clinical presentation | 33 (3.8%) |
| Cluster detection | 12 (1.4%) |
| Environmental conditions | 13 (1.5%) |
| Health behaviours, knowledge, and attitudes | 67 (7.8%) |
| Mass gathering warning | 17 (2.0%) |
| Public health infrastructure | 198 (23.0%) |
| Relaxed restrictions | 11 (1.3%) |
| Vulnerable groups | 69 (8.0%) |
| Mitigation measures: PHSMs | 100 (11.6%) |
| Mitigation measures: MCMs | 341 (39.6%) |
The contextual data codes provide information about environmental, sociopolitical, and other themes at various stages of the H3N2 outbreak in Brazil (Figure 1). Public health infrastructure issues made up nearly one‑quarter of the codes, primarily related to healthcare capacity, including bed shortages and doctor shortages. Items covering data availability and accessibility observably increased as the influenza cases spiked; these items were local articles, press releases, and other updates on a hack that affected the national Ministry of Health and its data reporting systems [19]. Shortly after the initial confirmation of the hack, additional reports were released to explain the extent of the data infrastructure problem and the kinds of influenza surveillance barriers it posed as the Ministry of Health attempted to recover from the data breach [21, 22].

Figure 1
Contextual data identified from EIOS System during Brazil H3N2 influenza outbreak, 2021.
Other important data obtained from the EIOS system during this outbreak related to clinical presentation, notably the differences in disease severity between influenza variants. In addition, mass gathering warnings and cluster detection codes were applied before, during, and after large gatherings, such as Carnival. Officials in various Brazilian localities would release warnings and social distancing guidelines prior to such events, while media would report clusters of confirmed influenza cases after the gathering.
Health behaviours, knowledge, or attitudes were related to resistance to precautions to prevent transmission of both COVID‑19 and influenza. Some articles noted resistance among political authorities, resulting in relaxed precautions. Articles coded with this theme appeared at the very early stages of the influenza outbreak, as well as the later epidemiological weeks when cases drastically decreased. Vulnerable groups were also often identified, particularly among healthcare workers and teachers. This code was also applied alongside the environmental conditions code, which most often referred to disease vulnerability concerns in the aftermath of extreme flooding or heat in some areas.
There were relatively few PHSMs reported during the outbreak compared to MCMs, which mainly included vaccination campaigns and increasing hospital capacity. PHSMs were mainly cancellations of mass gatherings and masking mandates. These became more frequent during Epidemiological Week 14, which coincided with the peak of the outbreak according to the reference data.
Outbreak Use Case 2: COVID‑19 Omicron variant in Kenya (2021)
The board representing Use Case 2 resulted in 7314 articles in EIOS between November and December in 2021. Search criteria included all categories containing keywords related to COVID‑19, SARS‑CoV‑2, and its variants and articles that mentioned any geographic region within Kenya. After completing the pre‑defined triage process and qualitative coding, 427 articles (17.1%) containing unique, outbreak‑relevant information remained. After performing thematic coding on these triaged articles, there were a total of 343 event‑related codes. There were 165 codes applied to contextual insights, which are summarized and categorized in Table 2.
Table 2
Summary of contextual insights coded among triaged articles in Outbreak Use Case 2, across all epidemiological weeks.
| THEMATIC CODE | NUMBER CODED (N = 165) |
|---|---|
| Clinical presentation | 8 (4.8%) |
| Health behaviours, knowledge, and attitudes | 29 (17.6%) |
| Mass gathering warning | 7 (4.2%) |
| Public health infrastructure | 32 (19.4%) |
| Relaxed restrictions | 4 (2.4%) |
| Transmissibility | 3 (1.8%) |
| Vulnerable groups | 22 (13.3%) |
| Mitigation measures: PHSMs | 45 (27.3%) |
| Mitigation measures: MCMs | 15 (9.1%) |
Figure 2 exhibits the contextual data coded across EIOS system items over the outbreak period. Public health infrastructure was a salient theme in this use case, mainly covering healthcare capacity issues and vaccine shortages. Periodic reports mentioned reduced lab testing capacity, accounting for the unavailability of variant information.

Figure 2
Contextual data identified from EIOS System during Kenya COVID‑19 Omicron surge, 2021.
Another important contextual theme was health behaviour, attitudes, and knowledge. The ‘fifth wave’ of COVID‑19 in Kenya emerged as the country relaxed restrictions, leading to a reinstatement of certain measures such as curfews. Items reported resistance to continued curfews and a proposed vaccine mandate. Other articles mentioned vaccine hesitancy in faith‑based communities and increased travel post‑lockdown. Some articles in this theme included speculation about the impact of media disinformation on vaccine uptake.
Articles coded as mass gathering warnings focused on reported increases in hotel bookings and renewed tourism influxes. Vulnerable groups were also reported, particularly healthcare workers, truck drivers, school‑aged children, and the rural poor. Clinical presentation‑coded items were less frequent in this use case because COVID‑19 symptoms and virological mechanisms were already common knowledge among public health professionals. However, speculation about the high transmissibility of Omicron based on patient observations in Kenya was coded. These codes also applied to unique outcomes and indications of COVID‑19 disease severity if they were observed in Kenyan health facilities during the fifth wave.
Most mitigation measures detected were county and province‑level vaccine campaigns, including vaccine donations from foreign agencies. The results also showed an increase in PHSMs reported after Epidemiological Week 8, corresponding with the peak case volume indicated by the WHO reference data. PHSMs included increased border screenings, testing mandates, and travel restrictions.
Outbreak Use Case 3: Measles in Afghanistan (2021–2022)
The board representing Use Case 3 resulted in 1745 total items from January 2021 to December 2022. Search criteria included all measles‑related categories and articles that mentioned any geographic region within Afghanistan. After two rounds of reviewing, coding, and de‑duplicating, 109 articles (6.2%) containing unique, outbreak‑relevant information remained. After performing thematic coding on these triaged articles, there were a total of 128 codes applied to text that provided event data. There were 71 codes applied to contextual insights, which are summarized and categorized in Table 3.
Table 3
Summary of contextual insights coded among triaged articles in Outbreak Use Case 3, across all epidemiological weeks.
| THEMATIC CODE | NUMBER CODED (N = 71) |
|---|---|
| Clinical presentation | 12 (16.9%) |
| Environmental conditions | 6 (8.5%) |
| Health behaviours, knowledge, and attitudes | 3 (4.2%) |
| Public health infrastructure | 13 (18.3%) |
| Vulnerable groups | 7 (9.9%) |
| Mitigation measures: PHSMs | 12 (16.9%) |
| Mitigation measures: MCMs | 18 (25.4%) |
Figure 3 illustrates the contextual data obtained from the EIOS system during the outbreak period. Key public health infrastructure context coded among the EIOS system items related to medical personnel shortages and barriers to providing care and vaccination in remote areas. The potential issue of vaccine resistance was raised early in the outbreak because of distrust of vaccination campaigns in the region [23]. Although most of the symptoms reported were based on previously known clinical evidence, items were coded as clinical presentation when measles severity was linked to local conditions. For example, local health officials observed that symptoms worsened among children experiencing malnutrition. In addition, earthquakes were coded as environmental conditions that impaired access to care in certain provinces and therefore increased vulnerability to many diseases, measles included. The main contextual insights that preceded the peak of the outbreak were about unvaccinated populations and disease risk among socially vulnerable groups, including children, pregnant women, and those living in remote or inaccessible villages.

Figure 3
Contextual data identified from EIOS System during Afghanistan measles outbreak, 2021‑2022.
MCMs in various regions of Afghanistan were reported throughout the outbreak. PHSMs reported in media increased about halfway through the outbreak during Epidemiological Week 20. Frequently mentioned mitigation measures included vaccination campaigns, donations and volunteers increasing healthcare service capacity, and expanding surveillance in remote areas.
Outbreak Use Case 4: Diphtheria in Nigeria (2022–2023)
The board representing Use Case 4 resulted in 1309 articles from May 2022 to October 2023. Search criteria included all diphtheria‑related categories and articles that mentioned any geographic region within Nigeria. After completing the pre‑defined triage process and coding the remaining articles, 158 articles (8.3%) remained. After performing thematic coding on these triaged articles, there were a total of 406 codes applied to text that provided event data. There were 129 codes applied to contextual insights, which are summarized and categorized in Table 4.
Table 4
Summary of contextual insights coded among triaged articles in Outbreak Use Case 4, across all epidemiological weeks.
| THEMATIC CODE | NUMBER CODED (N = 129) |
|---|---|
| Clinical presentation | 5 (3.9%) |
| Environmental conditions | 2 (1.6%) |
| Health behaviours, knowledge, and attitudes | 9 (7.0%) |
| Public health infrastructure | 20 (15.5%) |
| Relaxed restrictions | 2 (1.6%) |
| Vulnerable groups | 10 (7.8%) |
| Mitigation measures: PHSMs | 33 (25.6%) |
| Mitigation measures: MCMs | 48 (37.2%) |
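The percentage column can be reproduced from the counts alone. The sketch below copies the counts from Table 4 and divides each by the table total; the variable names are illustrative and the rounding to one decimal place is an assumption about how the shares were reported.

```python
# Counts copied from Table 4 (Use Case 4, N = 129)
counts = {
    "Clinical presentation": 5,
    "Environmental conditions": 2,
    "Health behaviours, knowledge, and attitudes": 9,
    "Public health infrastructure": 20,
    "Relaxed restrictions": 2,
    "Vulnerable groups": 10,
    "Mitigation measures: PHSMs": 33,
    "Mitigation measures: MCMs": 48,
}

total = sum(counts.values())

# Share of each thematic code among all contextual codes, to one decimal place.
shares = {code: round(100 * n / total, 1) for code, n in counts.items()}

for code, pct in shares.items():
    print(f"{code}: {counts[code]} ({pct}%)")
```

Because a single article can receive several codes, the denominator is the number of contextual codes applied rather than the number of triaged articles.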
The most salient contextual themes included low vaccine uptake and barriers to providing care in remote areas (Figure 4). Articles coded with these themes described low vaccine turnout, though only a minority attributed this to health behaviours, knowledge, and attitudes such as vaccine hesitancy. Rather, reporters wrote about the inaccessibility of vaccines in remote locations. Economic challenges leading to reduced testing capacity and doctor shortages were a major theme related to public health infrastructure. This theme also included vaccine and medicine shortages, but these supply challenges were less common. Ongoing limits on lab testing capacity were reported as a cause of data availability issues throughout the outbreak period.

Figure 4
Contextual data identified from EIOS System during Nigeria diphtheria outbreak, 2022‑2023.
Most articles provided information about known symptoms of diphtheria as a warning for parents of children and providers; however, some media reports specifically outlined the clinical presentation of patients observed in Nigerian facilities, and these were coded when a new symptom or description of disease severity was provided. Vulnerable groups were also coded the first time a group in Nigeria was reported as being at‑risk; these included the unvaccinated, healthcare workers, and asthmatic individuals.
Most diphtheria mitigation efforts during this outbreak were pharmacological (vaccination and treatment distributions) and conducted at the national level. PHSMs were also reported throughout the outbreak, but these items in the EIOS system increased in Epidemiological Week 38. This corresponds to approximately five weeks after the peak of the outbreak, based on the reference data. A common form of preventive measure reported was community education to encourage vaccination.
Discussion
The findings from these use cases demonstrate how the EIOS system can augment PHI capacity and enrich communicable disease event data from indicator‑based surveillance systems through its contextual insights. Just as case counts and other event data inform epidemiological assessments, contextual data can enhance situational awareness and response to public health risks. Although each use case in this study involved distinct pathogens and geographic regions, common themes emerged from the web‑based sources pulled from all EIOS system searches. For instance, public health infrastructure was a dominant issue in multiple outbreak scenarios. In Brazil and Kenya, shortages of hospital beds and healthcare workers, as well as vaccine shortages, were frequently discussed in news items. Similarly, Nigerian and Afghan news media highlighted the lack of accessible healthcare in remote areas in the context of their respective diphtheria and measles outbreaks. These findings can support calls for investments in healthcare capacity‑building and rapid deployment of resources to prevent disease spread.
Other commonly assigned themes related to behavioural factors, vulnerable populations, and the physical environment. Vaccine hesitancy, resistance to public health mandates, and misinformation were evident in the use cases, and many EIOS system items linked them to relaxed policies and low vaccination rates. The identification of vulnerable groups, including healthcare workers, children, and individuals with underlying conditions, was an important finding for equity considerations. In addition, environmental conditions, such as flooding, earthquakes, and unseasonal temperature rises, compounded public health challenges in several of the outbreaks by limiting access to care and exacerbating disease vulnerability. Finally, mitigation measures were also key contextual themes found in the board results, especially as the countries’ outbreak responses relied on vaccination campaigns. An increase in PHSM codes after the outbreak surges, namely in the Brazil and Kenya use cases, may suggest a need for earlier implementation of preventive measures to mitigate infectious disease outbreak severity.
Limitations
It is important to note that the data returned by each EIOS system search depend on the web‑based sources being mined. Because sources in the EIOS system are continuously updated and added, more recent use cases may have additional web‑based sources to inform outbreak response and detection. Therefore, the retrospective analysis is not necessarily reflective of the current system. There may also be discrepancies between the EIOS item import date and time and the time at which the item was published to the web; however, previous EIOS system evaluations found event detection to be generally timely [24–26].
In addition, there are fewer sources in the EIOS system from certain states, as exhibited in the Afghanistan use case, resulting in fewer triaged items and limiting the generalizability of the results. This presents an opportunity for additional collaboration with these states on source selection and integration into the EIOS system.
The usefulness of contextual data may be subjective for ministries of health and other authorities, depending on the specific themes that are found. Some themes may reflect well‑known issues, for instance, that do not need an EBS tool to uncover in an outbreak scenario. Finally, while manual thematic coding carries the advantage of flexibility and accuracy when compared to most automated technologies available, incorporating the methods employed in this study in routine surveillance would require substantial human resources.
Public health impact
These qualitative data can be potential warnings for increased disease impact in the early stages of an outbreak. In addition, thematic trends and patterns across outbreak periods might help identify future risks and areas of need for improving outbreak response. These example results illustrate how analysis of text‑based PHI system data can help inform authorities on thematic patterns in public health discourse to potentially shape communication priorities and campaigns as outbreaks progress.
This evaluation may serve as a basis for manual or AI‑driven qualitative coding practices in routine disease surveillance. EIOS system users who monitor diseases at the national and local levels may predetermine themes of importance based on these results and other existing evidence about contextual factors that affect disease spread, outbreak response, and detection. The findings can also support broader, more comprehensive pre‑built categories in the EIOS system, which currently tend to include keywords that are limited to specific diseases and pathogens. Examples can include local attitudes, geopolitical concerns, or environmental conditions that may increase or mitigate disease risk. Health officials may establish a system for maintaining items that are manually categorized with contextual themes of interest, so that thematic codes may continue to be refined with different users’ interpretations. Improving on best practices for monitoring pathogens using reflexive coding schemes may highlight how to use contextual insights to parse risks across the population and potentially distinguish contextual patterns surrounding emerging diseases, seasonal spikes, and off‑season surges.
Future research efforts to improve upon the qualitative approach of this evaluation include exploring advanced methods, such as topic modelling or network analyses, to understand the relationships between contextual factors, case surges, and other public health events. This may be triangulated with additional ‘gold standard’ indicator data or qualitative information from users in the EIOS system’s community of practice. These methods of ‘contextualised intelligence’, among other qualitative methods, may not only strengthen PHI but also improve international and cross‑sector collaborations [27]. Therefore, the inclusion of contextual insights in future evaluations and standard operating procedures has the potential to enhance the predictive power and decision‑making capacity of these event‑based surveillance systems through collaborative and interdisciplinary health surveillance.
Ethical Statement
This study did not involve human subjects, patient data, or personally identifiable information. It solely utilized publicly available, open‑source data to evaluate a public health surveillance system. As such, ethical approval was not required.
Funding Statement
The authors acknowledge the funding support from the United States Centers for Disease Control and Prevention (Grant #GH191967).
Competing Interests
The authors have no competing interests to declare.
Authors’ Contributions
FK led study design, data analysis, and manuscript preparation. RMC, CB, JF, and YKL supported study design and contributed to manuscript preparation. JS contributed to manuscript preparation. PA contributed to manuscript preparation and oversaw all study activities.
Additional File
The additional file for this article can be found as follows:
Supplementary File 1
Qualitative Codebook used for thematic content analysis. DOI: https://doi.org/10.5334/aogh.4832.s1
