
Figure 1
metaScreener pipeline architecture. Plugins are grouped by functional role; arrows indicate bundle data flow between stages.
Table 1
Plugin inventory for metaScreener version 3. T = inference temperature. Plugin 03 produces a structured criteria_harmonized.csv file consumed by all four downstream filtering plugins.
| # | PLUGIN | FUNCTION | METHOD |
|---|---|---|---|
| 01 | Reference Markers (experimental) | Extracts visually-present reference markers (e.g., [1]) from images supplied as PDF or PNG; not designed for standard PRISMA flow diagrams | GPT-4o vision API |
| 02 | References-of-X AI | Resolves and enriches bibliographic references via federated API queries | OpenAlex, Crossref, Semantic Scholar |
| 03 | Criteria Parser | Converts free-text inclusion/exclusion criteria into a structured, machine-readable criteria file | Rule-based inference, optional LLM refinement |
| 04 | EH (Exclusion by Heuristic) | Removes records matching any exclusion criterion at title/abstract level | Deterministic keyword/regex |
| 05 | IH (Inclusion by Heuristic) | Retains only records matching at least one inclusion criterion at title/abstract level | Deterministic keyword/regex |
| 06 | EL (Exclusion by LLM) | Applies LLM-based eligibility adjudication against exclusion criteria over full record text | OpenAI-compatible endpoint, T=0.0 |
| 07 | IL (Inclusion by LLM) | Applies LLM-based eligibility adjudication against inclusion criteria over full record text | OpenAI-compatible endpoint, T=0.0 |
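
The deterministic behaviour listed for plugins 04 (EH) and 05 (IH) can be illustrated with a minimal sketch. It assumes that each criterion reduces to a case-insensitive regular expression applied to the concatenated title and abstract. The `Record` type, the function names, and the example patterns are hypothetical and are not taken from the metaScreener code base; in the actual pipeline the patterns would be derived from criteria_harmonized.csv.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    abstract: str


# Hypothetical patterns for illustration only; not the study's real criteria.
EXCLUSION_PATTERNS = [re.compile(r"\bconference proceedings?\b", re.IGNORECASE)]
INCLUSION_PATTERNS = [
    re.compile(r"\bvirtual reality\b", re.IGNORECASE),
    re.compile(r"\bhead-mounted display\b", re.IGNORECASE),
]


def matches_any(record, patterns):
    """Return True if any pattern occurs in the record's title or abstract."""
    text = f"{record.title} {record.abstract}"
    return any(p.search(text) for p in patterns)


def exclude_by_heuristic(records, patterns):
    """EH: remove records matching any exclusion pattern."""
    return [r for r in records if not matches_any(r, patterns)]


def include_by_heuristic(records, patterns):
    """IH: retain only records matching at least one inclusion pattern."""
    return [r for r in records if matches_any(r, patterns)]


records = [
    Record("Virtual reality for balance training", "A head-mounted display study."),
    Record("Conference proceedings of a VR workshop", "Virtual reality abstracts."),
    Record("Desktop simulation of driving", "No immersive hardware was used."),
]

after_eh = exclude_by_heuristic(records, EXCLUSION_PATTERNS)
after_ih = include_by_heuristic(after_eh, INCLUSION_PATTERNS)
print([r.title for r in after_ih])  # ['Virtual reality for balance training']
```

Running EH before IH mirrors the stage order of the screening funnel: the exclusion pass removes the second record, and the inclusion pass then drops the third because it matches no inclusion pattern.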

Algorithm 1
Criteria Parser.

Algorithm 2
EL: Exclusion by LLM.

Algorithm 3
IL: Inclusion by LLM.
Table 2
Sequential screening funnel for the demonstration use case (initial corpus $N = 776$).
| STAGE | INPUT | SURVIVORS | EXCLUDED | PRIMARY EXCLUSION REASON |
|---|---|---|---|---|
| Initial corpus | 776 | 776 | — | 752 English, 14 French; years 1962–2025 |
| EH (Exclusion by Heuristic) | 776 | 651 | 125 | Conference proceedings (); non-English () |
| IH (Inclusion by Heuristic) | 651 | 85 | 566 | Publication year < 2018 (); non-English () |
| EL (Exclusion by LLM) | 85 | 85 | 0 | No records met exclusion criteria |
| IL (Inclusion by LLM) | 85 | 73 | 12 | Did not meet HMD VR inclusion criterion (IC-4) |
| Final review corpus | — | 73 | 703 | 90.6% reduction from initial corpus |
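
The totals in the final row follow directly from the per-stage counts. The short check below recomputes them from the INPUT and SURVIVORS columns of Table 2; the variable names are illustrative, and no values beyond those in the table are assumed.

```python
# (stage, input, survivors) copied from Table 2.
stages = [
    ("EH", 776, 651),
    ("IH", 651, 85),
    ("EL", 85, 85),
    ("IL", 85, 73),
]

initial = stages[0][1]
final = stages[-1][2]

# Each stage must receive exactly the survivors of the previous stage.
chain_ok = all(
    stages[i][1] == stages[i - 1][2] for i in range(1, len(stages))
)

excluded_per_stage = {name: n_in - n_out for name, n_in, n_out in stages}
total_excluded = sum(excluded_per_stage.values())
reduction_pct = round(100 * total_excluded / initial, 1)

print(excluded_per_stage)  # {'EH': 125, 'IH': 566, 'EL': 0, 'IL': 12}
print(total_excluded)      # 703
print(reduction_pct)       # 90.6
```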

Figure 2
Sequential screening funnel for the demonstration use case. Excluded records are shown with exclusion counts at each stage transition.

Figure 3
metaScreener desktop interface, shown on the Criteria Parser plugin (Plugin 03). The left panel accepts free-text inclusion and exclusion criteria; the right panel displays the structured harmonized table, with each row’s pipeline-stage assignment (EH/IH/EL/IL) and matching operator determined by the rule-based inference engine described in Algorithm 1. The log panel at the bottom shows the harmonizer parsing eight criteria and applying optional LLM refinement.
Table 3
Human-versus-LLM agreement on the three LLM-adjudicated criteria from the demonstration corpus. Cohen’s $\kappa$ is computed between the human aggregate decision and the LLM canonical decision; Fleiss’ $\kappa$ is computed across the three raters on the 15-record overlap subset per stage. $P_o$ is the percent observed agreement (human vs. LLM) over the same N.
| STAGE | CRITERION | COHEN’S $\kappa$ | FLEISS’ $\kappa$ | $P_o$ | N |
|---|---|---|---|---|---|
| EL | EC-2 (spatial-navigation focus) | −0.05 | −0.13 | 83.5% | 85 |
| EL | EC-3 (rubber-hand-illusion focus) | 0.10 | −0.05 | 87.1% | 85 |
| IL | IC-1 (HMD VR/virtual simulation) | 0.28 | 0.26 | 56.0% | 84 |
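
For readers unfamiliar with the statistic, the sketch below implements the standard two-rater Cohen's kappa, (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. It is not the authors' analysis code, and the example labels are synthetic, not the study's data. They are chosen only to show that a high observed agreement can coexist with a kappa near or below zero when one label dominates, a pattern that resembles the EL rows above.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1 - p_e)


# Synthetic example: 20 records, each rater excludes 2, but never the same ones.
human = ["exclude"] * 2 + ["keep"] * 18
llm = ["keep"] * 2 + ["exclude"] * 2 + ["keep"] * 16

p_o, kappa = cohens_kappa(human, llm)
print(f"observed agreement = {p_o:.2f}, kappa = {kappa:.2f}")
# observed agreement = 0.80, kappa = -0.11
```

In this synthetic case the raters agree on 16 of 20 records, yet chance agreement is 0.82 because both label almost everything "keep", so kappa is slightly negative.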
