ReviewAid: An Open-Source Tool for Efficient PICO-Based Screening and Data Extraction in Systematic Reviews

Vihaan Sahu; Mohith Balakrishnan

doi:10.5334/jors.672

Figures & Tables

Multi-tier screening and confidence assignment framework.

Table 1

Confidence Score Interpretation.

CONFIDENCE SCORE	CLASSIFICATION	DESCRIPTION	IMPLICATION
1.0 (100%)	Definitive Match	Rule-based classification/No ambiguity.	Fully automated decision.
0.8–1.0	Very High	AI strongly validates the decision using explicit evidence.	Safe to accept.
0.6–0.79	High	Criteria appear satisfied based on standard academic content.	Review is optional.
0.4–0.59	Moderate	Ambiguous context or loosely met criteria.	Manual verification recommended.
0.1–0.39	Low	Based mainly on keyword estimation.	High risk of error.
<0.1	Unreliable	Derived from failed extraction methods.	Mandatory manual review.
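The bands in Table 1 can be read as a simple lookup from score to classification. The sketch below is illustrative only and is not taken from the ReviewAid source: the function name `classify_confidence` is hypothetical, and it assumes half-open intervals (for example, 0.6 up to but not including 0.8 for "High"), with exactly 1.0 reserved for the rule-based "Definitive Match" row, since the table does not say how boundary values are resolved.

```python
# Hypothetical helper: maps a confidence score to the Table 1 classification.
# Assumption: bands are half-open [lower, next_lower); exactly 1.0 is treated
# as the rule-based "Definitive Match" row rather than "Very High".
_BANDS = [
    (0.8, "Very High", "Safe to accept."),
    (0.6, "High", "Review is optional."),
    (0.4, "Moderate", "Manual verification recommended."),
    (0.1, "Low", "High risk of error."),
]


def classify_confidence(score: float) -> tuple[str, str]:
    """Return (classification, implication) for a score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score out of range: {score}")
    if score == 1.0:
        return ("Definitive Match", "Fully automated decision.")
    for lower, label, implication in _BANDS:
        if score >= lower:
            return (label, implication)
    # Anything below 0.1 falls through to the last row of Table 1.
    return ("Unreliable", "Mandatory manual review.")


if __name__ == "__main__":
    for s in (1.0, 0.95, 0.79, 0.5, 0.1, 0.05):
        print(s, classify_confidence(s))
```

Under these assumptions a score of 0.95 maps to "Very High" and a score of 0.05 maps to "Unreliable"; a different boundary convention would only change how values that fall exactly on a band edge are labelled.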

Robust API response parsing and recovery workflow.

Table 2

Software validation results – extraction field accuracy (N = 19).

FIELD	ACCURACY
Make and Model	89.47%
Wear Location	84.21%
Participant Age	84.21%
Participant Gender	78.94%
Technology	78.94%
Sensor(s)	68.42%
Sample Size (Strict)	63.15%
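The percentages in Table 2 are consistent with whole-paper counts out of N = 19. The short check below is a reading of the table, not a statement of how the authors computed it: it assumes each accuracy is (correct papers / 19) truncated, not rounded, to two decimal places, and the helper names are hypothetical.

```python
# Assumption: accuracy = correct / N, truncated (not rounded) to two decimals.
N = 19

REPORTED = {
    "Make and Model": 89.47,
    "Wear Location": 84.21,
    "Participant Age": 84.21,
    "Participant Gender": 78.94,
    "Technology": 78.94,
    "Sensor(s)": 68.42,
    "Sample Size (Strict)": 63.15,
}


def truncated_percent(correct: int, total: int) -> float:
    """Percentage truncated to two decimals, using integer arithmetic."""
    return (correct * 10000 // total) / 100


def implied_count(percent: float, total: int):
    """Return the count whose truncated percentage equals `percent`, if any."""
    for k in range(total + 1):
        if truncated_percent(k, total) == percent:
            return k
    return None


IMPLIED = {field: implied_count(pct, N) for field, pct in REPORTED.items()}

if __name__ == "__main__":
    for field, k in IMPLIED.items():
        print(f"{field}: {k}/{N}")
```

On this reading, the fields correspond to 17, 16, 16, 15, 15, 13 and 12 correct papers out of 19, which explains why 78.94% and 63.15% appear where conventional rounding would give 78.95% and 63.16%.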

Table 3

Software validation results – confidence level predictive validity.

CONFIDENCE BUCKET	# OF PAPERS	AVERAGE FIELD ACCURACY
High (0.8–1.0)	13	93.4%
Medium (0.6–0.79)	3	57.1%
Low (<0.6)	4	71.4%

Table 4

Software validation results – extractor performance by AI provider.

PROVIDER	MODEL	TIME (MM:SS)	ERROR RATE	TESTER NOTES
Default/GLM	GLM-4.6V-Flash [6]	4:39	0%	Fast with no errors
OpenAI	gpt-4o [11]	3:52	0%	Fast, no errors
Anthropic	Claude-Sonnet-4-20250514 [12]	4:13	0%	Fast, good for extraction
Cohere	Command-A-03-2025 [14]	1:53	0%	Very fast and accurate
Deepseek	DeepSeek-Chat [13]	4:57	0%	Accurate, slower speed

Table 5

Software validation results – full-text AI data extraction result.

FILENAME	CONFIDENCE	PAPER TITLE	CONCLUSION	TYPE OF STUDY	POPULATION	INTERVENTION	COMPARISON	OUTCOME	RESULTS
1-s2.0-S0022510X21003166-main.pdf	0.95	Wearing-off symptoms during standard and extended natalizumab dosing intervals: Experiences from the COVID-19 pandemic [16]	Our observations support the need to study the effect of EID on wearing-off symptoms in randomized controlled trials.	Observational study	30 relapsing-remitting multiple sclerosis (RRMS) patients over 18 years of age receiving natalizumab at the Department of Neurology, Haukeland University Hospital	Extended interval dosing (EID) of natalizumab every six weeks	Standard interval dosing (SID) of natalizumab every four weeks	Change in prevalence or intensity of wearing-off symptoms	50% (15/30) reported new or increased wearing-off symptoms during EID. Symptom increase was more frequent among patients with pre-existing wearing-off symptoms during SID compared to patients without such pre-existing symptoms [p = 0.0005]. None had decreased symptoms or signs of clinical relapse.
1-s2.0-S221103482100612X-main.pdf	0.9	Safety of Natalizumab infusion in multiple sclerosis patients during active SARS-CoV-2 infection [17]	Natalizumab redosing in people with multiple sclerosis during active SARS-CoV-2 infection is not associated with worsening of COVID-19 symptoms or recovery delay and is reasonably safe. The data supports the safety of NTZ redosing in these circumstances and suggests not to delay retreatment to minimize the risk of MS rebound.	Retrospective observational case series/cohort study	18 relapsing-remitting people with Multiple Sclerosis (pwMS) under Natalizumab treatment, infected by SARS-CoV-2 between October 2020 and May 2021, from 6 Italian MS centers. All had mild COVID-19.	Natalizumab (NTZ) reinfusion (retreatment/redosing) during confirmed active SARS-CoV-2 infection (before achieving a negative swab).	Not Found (The study is a single-arm observational study with no explicit comparison group. Implicit comparison is to general population recovery times.)	Safety outcomes: worsening of SARS-CoV-2 infection symptoms, recovery delay, development of new neurological symptoms suggestive of CNS invasion, time to full recovery, and interval from positive to negative swab.	No patient reported worsening of SARS-CoV-2 symptoms or developed new neurological symptoms after redosing. Mean time to full recovery after NTZ for symptomatic patients was 10 ± 12 days. For the whole cohort, mean interval from first symptom to full recovery was 13 ± 9 days. Mean interval from first positive to first negative swab was 32 ± 15 days. No patient required oxygen support or hospitalization.

Table 6

Software validation results – screener performance by AI provider.

PROVIDER	MODEL	TIME (MM:SS)	ERROR RATE	TESTER NOTES
Default/GLM	GLM-4.6V-Flash [6]	2:49	0%	Fast; every field found
OpenAI	gpt-4o [11]	3:46	0%	Fast, no errors
Anthropic	Claude-Sonnet-4-20250514 [12]	3:46	0%	Good; better for screening than OpenAI
Cohere	Command-A-03-2025 [14]	1:33	0%	Screened everything; fastest
Deepseek	DeepSeek-Chat [13]	2:58	0%	Fast
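To compare providers numerically, the MM:SS timings in Tables 4 and 6 can be converted to seconds and ranked. This is a small illustrative sketch using only the values printed in those tables; the function and variable names are hypothetical and nothing here comes from the tool itself.

```python
# Timings copied from Table 4 (extractor) and Table 6 (screener), as MM:SS.
EXTRACTOR_TIMES = {
    "Default/GLM": "4:39",
    "OpenAI": "3:52",
    "Anthropic": "4:13",
    "Cohere": "1:53",
    "Deepseek": "4:57",
}
SCREENER_TIMES = {
    "Default/GLM": "2:49",
    "OpenAI": "3:46",
    "Anthropic": "3:46",
    "Cohere": "1:33",
    "Deepseek": "2:58",
}


def to_seconds(mm_ss: str) -> int:
    """Convert an 'M:SS' or 'MM:SS' string to whole seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def ranked(times: dict) -> list:
    """Return (provider, seconds) pairs sorted from fastest to slowest."""
    pairs = [(provider, to_seconds(t)) for provider, t in times.items()]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for label, times in (("Extractor", EXTRACTOR_TIMES), ("Screener", SCREENER_TIMES)):
        print(label)
        for provider, secs in ranked(times):
            print(f"  {provider:<12} {secs:>4} s")
```

In both tables Cohere has the shortest run time (113 s for extraction, 93 s for screening), which matches the tester notes.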

Table 7

Software validation results – full-text screening result.

FILENAME	TITLE	AUTHOR	YEAR	CONFIDENCE	REASON FOR INCLUSION
1-s2.0-S0022510X21003166-main.pdf	Wearing-off symptoms during standard and extended natalizumab dosing intervals: Experiences from the COVID-19 pandemic [16]	Gerd Haga Bringeland	2021	0.95	The paper studies adults (population) receiving natalizumab (intervention) with outcomes related to multiple sclerosis (MS) regarding wearing-off symptoms during standard (SID) and extended (EID) dosing intervals. It meets all inclusion criteria for population, intervention, and outcomes.
1-s2.0-S221103482100612X-main.pdf	Safety of Natalizumab infusion in multiple sclerosis patients during active SARS-CoV-2 infection [17]	Landi D	2021	0.95	The paper focuses on natalizumab (intervention) in adults with multiple sclerosis (population) during active SARS-CoV-2 infection, which meets all inclusion criteria (population: adults; intervention: natalizumab, SID, EID; outcomes: MS). No exclusion criteria are violated.
1-s2.0-S1878747924000370-main.pdf	Commentary extended interval dosing of natalizumab: More evidence in support [18]	Karlo Toljan	2024	0.95	The paper evaluates extended interval dosing (EID) of natalizumab versus standard interval dosing (SID) in adults with multiple sclerosis (MS), assessing outcomes such as disease activity, relapse rates, and safety (including PML risk). It meets all PICO criteria: population (adults), intervention (natalizumab, SID/EID), comparison (SID vs EID), and outcomes (MS).
