
NOS-TLPlot: A Specialized Python Tool for Visualizing Newcastle–Ottawa Scale Risk-of-Bias Assessments

By: Vihaan Sahu  
Open Access | Feb 2026


(1) Overview

Introduction

Systematic reviews and meta-analyses are cornerstone methodologies in evidence-based practice, providing comprehensive syntheses of existing research to inform clinical guidelines, health policy, and future scientific directions. A critical component of any rigorous systematic review is the assessment of the risk of bias (RoB) within individual studies, as the validity of the meta-analytic conclusions is inherently dependent on the quality of the constituent evidence. For non-randomized studies of interventions, such as cohort, case-control, and cross-sectional studies, which frequently form the body of evidence when randomized controlled trials are impractical or unethical, the Newcastle–Ottawa Scale (NOS) has emerged as a de facto standard for quality appraisal [1]. The NOS evaluates studies across three broad domains: selection of study groups, comparability of groups, and ascertainment of exposure or outcome using a star-based rating system where studies can earn up to nine stars. While this star system provides a structured approach to quality assessment, effectively communicating these assessments to a diverse audience, ranging from specialist researchers to clinicians and policymakers, presents a considerable challenge. Tabular presentations of star ratings can be dense and difficult to interpret, especially when dealing with a large number of studies or domains. Visual representations, therefore, play a crucial role in summarizing and conveying RoB information in an intuitive and accessible manner.

The need for clear and standardized RoB visualizations is underscored by reporting guidelines such as PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), which emphasize the transparent presentation of study quality assessments. Traffic-light plots, which use color coding (typically green for low risk, yellow for moderate risk, and red for high risk) to represent the RoB across different domains for each study, have become a popular and intuitive method for this purpose. However, the creation of such plots, particularly tailored to the specific structure and scoring of the NOS, often involves manual processes using general-purpose spreadsheet or graphic design software, or requires significant programming expertise with generic plotting libraries. This manual approach is not only time-consuming and prone to errors but also leads to inconsistencies in visual style and interpretation across different reviews and research groups. Furthermore, while traffic-light plots are effective, they represent only one way of visualizing RoB data; other plot types, such as radar charts for multi-domain comparison or heatmaps for pattern identification, can offer complementary insights but are seldom used due to the lack of readily available, specialized tools.

Existing software solutions for RoB visualization often focus on tools designed for randomized controlled trials, such as the Cochrane RoB 2 tool [2], for which dedicated visualization packages like ROBVIS exist [3]. While ROBVIS is an excellent tool for its intended purpose, it is not specifically designed to handle the unique star-based domain structure of the NOS, in which some domains (such as Comparability) can award up to two stars while others award a maximum of one. This difference in scoring mechanics necessitates a specialized approach for accurate and meaningful visualization of NOS data. Generic data visualization libraries in languages such as Python [4] (e.g., Matplotlib [5], Seaborn [6]) or R offer immense flexibility but require users to write custom code, creating a barrier for researchers without advanced programming skills.

This situation creates a clear need for a dedicated, user-friendly, and reproducible tool that can automatically transform NOS star ratings into a variety of publication-ready visualizations, thereby streamlining the workflow for systematic reviewers and enhancing the clarity and consistency of RoB reporting in meta-analyses involving non-randomized studies. This report details the development, implementation, and features of NOS-TLPlot, a novel Python-based tool conceived to meet this specific need.

Implementation and architecture

NOS-TLPlot was conceived and developed by the author to provide a comprehensive yet accessible solution for visualizing Newcastle–Ottawa Scale (NOS) risk-of-bias assessments. The primary goal was to create a tool that automates the conversion of raw NOS star ratings into a variety of insightful, publication-quality graphics, thereby reducing the manual effort and technical barriers often associated with this task. The implementation leverages a combination of established Python libraries for data manipulation and plotting, alongside a modern web framework for user interaction, ensuring both power and ease of use. The tool’s architecture is designed around modularity, allowing for future extensions and customizations, while its open-source nature promotes transparency, reproducibility, and community contribution. The design philosophy prioritizes the specific requirements of NOS data, ensuring that visualizations accurately reflect the scale’s domain structure and scoring conventions. This section details the core components, data handling mechanisms, visualization engine, and user interfaces of NOS-TLPlot.

The foundation of NOS-TLPlot is a Python package, with its core plotting logic encapsulated within the nos_tlplot.py script. This script handles data ingestion, processing, and the generation of all supported plot types. Key Python libraries utilized include Pandas [7] for efficient data manipulation and analysis of the input NOS data, typically provided in CSV or Excel format. NumPy [8] is employed for numerical operations, particularly in the context of preparing data for certain plot types like radar charts. The primary plotting capabilities are delivered by Matplotlib [5] and Seaborn [6], which offer a wide range of customizable plotting functions and are well-suited for generating high-resolution, publication-ready figures. Matplotlib’s GridSpec is used for managing complex plot layouts, and its table module aids in creating the tabular visualization option. The tool is designed to work with Python 3.12, and its dependencies are managed via a requirements.txt file, facilitating straightforward installation. The package is licensed under the permissive Apache License 2.0, encouraging widespread use and modification.
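As a purely illustrative sketch (not code from nos_tlplot.py), the following shows the kind of composite layout that Matplotlib’s GridSpec makes possible, for example a wide panel for per-study domain ratings alongside a narrower summary panel; the panel titles, ratios, and output file name are placeholders:

    import matplotlib.pyplot as plt
    from matplotlib.gridspec import GridSpec

    # Illustrative two-panel layout: a wide panel for the per-study domain
    # ratings and a narrower panel for a summary, arranged with GridSpec.
    fig = plt.figure(figsize=(10, 6))
    gs = GridSpec(1, 2, width_ratios=[3, 1], figure=fig)
    ax_grid = fig.add_subplot(gs[0, 0])
    ax_summary = fig.add_subplot(gs[0, 1])
    ax_grid.set_title("Per-study domain ratings")
    ax_summary.set_title("Overall RoB summary")
    fig.savefig("layout_sketch.png", dpi=300, bbox_inches="tight")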

A critical aspect of NOS-TLPlot is its handling of input data and the conversion of NOS star ratings to categorical risk-of-bias levels. The tool expects input files (CSV or Excel) to contain specific columns corresponding to the nine NOS domains, along with a study identifier (e.g., ‘Author, Year’), the ‘Total Score’ (sum of stars, 0–9), and optionally, an ‘Overall RoB’ column. The required domain columns are: ‘Representativeness’ (0–1 star), ‘Non-exposed Selection’ (0–1 star), ‘Exposure Ascertainment’ (0–1 star), ‘Outcome Absent at Start’ (0–1 star), ‘Comparability (Age/Gender)’ (0–2 stars), ‘Comparability (Other)’ (0–2 stars), ‘Outcome Assessment’ (0–1 star), ‘Follow-up Length’ (0–1 star), and ‘Follow-up Adequacy’ (0–1 star). If the ‘Overall RoB’ column is not provided, NOS-TLPlot automatically calculates it based on the ‘Total Score’ using standard NOS interpretation thresholds: a total score of 7–9 stars is categorized as ‘Low RoB,’ 4–6 stars as ‘Moderate RoB,’ and 0–3 stars as ‘High RoB.’

This automated conversion ensures consistency in risk categorization across different analyses. The tool then maps these categorical risk levels (and sometimes the underlying star scores or their normalized equivalents) to colors and other visual encodings for the various plot types.
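To make the conversion concrete, a minimal sketch of the categorization and color mapping might look as follows; the function name, color values, and input file name are illustrative assumptions rather than the tool’s actual API:

    import pandas as pd

    # Illustrative traffic-light colors; the shipped themes may use different values.
    TRAFFIC_LIGHT = {"Low RoB": "#2e7d32", "Moderate RoB": "#f9a825", "High RoB": "#c62828"}

    def categorize_overall_rob(total_score: int) -> str:
        """Map a total NOS star count (0-9) to the standard RoB category."""
        if 7 <= total_score <= 9:
            return "Low RoB"
        if 4 <= total_score <= 6:
            return "Moderate RoB"
        return "High RoB"

    df = pd.read_csv("input.csv")
    # Derive 'Overall RoB' only when the column is absent, mirroring the behavior described above.
    if "Overall RoB" not in df.columns:
        df["Overall RoB"] = df["Total Score"].apply(categorize_overall_rob)
    df["Color"] = df["Overall RoB"].map(TRAFFIC_LIGHT)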

The visualization engine of NOS-TLPlot is designed to produce twelve distinct plot types, each offering a unique perspective on the NOS data. These are:

  1. Star distribution plot: The default plot, which follows the standard traffic-light theme and shows the distribution of NOS star ratings, as seen in Figure 1.

    Figure 1: Star distribution plot. Histograms showing the frequency of specific star ratings (0–5 stars) attained within each of the assessment domains.

  2. Bubble plot: A classic bubble-plot representation of the NOS star distributions, shown in Figure 2.

    Figure 2: Bubble plot. Comprehensive bubble plot illustrating Newcastle–Ottawa Scale scores.

  3. Radar chart: Displays a multi-axis plot in which each axis represents a NOS domain. Domain scores are typically normalized to a common scale (e.g., 0–1), as shown in Figure 3.

    Figure 3: Radar chart. Radar chart displaying normalized scores across NOS domains for each individual study.

  4. Domain heatmap: Presents a color-coded matrix where rows are studies and columns are domains. The color intensity or hue corresponds to the RoB level or the raw star score, enabling quick identification of patterns and clusters of bias across studies and domains, as shown in Figure 4.

    Figure 4: Domain heatmap. Heatmap matrix representing the categorical risk of bias (Low, Moderate, High) for each domain across all included studies.

  5. Dot profile: A compact visualization in which each study’s performance across domains is shown by a series of dots, whose color or size may represent the RoB level or score, as shown in Figure 5.

    Figure 5: Dot profile plot. Profile plot depicting raw star ratings for the NOS domains for each study.

  6. Donut domain chart: Illustrates the proportion of studies falling into each overall RoB category (Low, Moderate, High) as segments of a donut-shaped pie chart, as shown in Figure 6.

    Figure 6: Donut domain chart. Donut charts displaying the proportional distribution of risk levels (Low, Moderate, High) within each assessment domain.

  7. Lollipop plot: Combines a vertical line (the ‘stick’) with a marker (the ‘candy’) at its end to represent each study’s total NOS score, often ordered by score. The color of the marker can indicate the overall RoB category as seen in Figure 7.

    Figure 7: Lollipop plot. Lollipop chart ranking studies by their total NOS score, color-coded by overall risk of bias.

  8. Stacked area chart: Visualizes the cumulative distribution of RoB categories across domains, or can show the proportion of each RoB level over an ordered list of studies as seen in Figure 8.

    Figure 8: Stacked area plot. Stacked area chart illustrating the percentage composition of risk categories.

  9. Pie chart: Similar to the donut chart, shows the proportion of studies in each overall RoB category, as shown in Figure 9.

    Figure 9: Pie chart. Pie chart summarizing the proportionate distribution of studies classified as Low, Moderate, and High overall risk of bias.

  10. Line ordered plot: Connects the RoB levels (or scores) of domains for each study with lines, with studies typically ordered by their total NOS score. This can highlight trends or patterns in domain-specific biases as seen in Figure 10.

    Figure 10: Line ordered plot. Line graph tracking domain scores across studies ordered by increasing total NOS score to visualize quality trends.

  11. Table view: Generates a color-coded table that directly presents the NOS data with cells shaded according to RoB levels, offering a detailed tabular summary, as shown in Figure 11.

    Figure 11: Table view. Tabular summary of individual domain scores, cumulative total scores, and overall risk of bias assessment for each study.

  12. Radar (thematic) chart: A version of the radar chart that adheres to the selected theme (e.g., grayscale), ensuring consistency with other plot outputs when a specific color scheme is required, as seen in Figure 12.

    Figure 12: Radar (thematic) chart. Radar chart overlaying study performance across domains, with lines color-coded according to the overall risk of bias classification (Low, Moderate, High).

Each plot type is implemented as a separate function within nos_tlplot.py, allowing for individual customization and generation. The tool supports exporting figures in multiple formats, including PNG (default at 300 DPI for publication quality), PDF, SVG, and EPS, catering to various publication requirements. Customization options include a choice between a ‘traffic_light’ theme (the default, using green, yellow, and red) and a ‘gray’ theme for grayscale publications. Users can also adjust figure sizes, line thicknesses, and font sizes via parameters in the Python functions or through settings in the web interface.
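To illustrate how such a per-plot function with theme selection and multi-format export might be structured, the following is a minimal sketch of a domain heatmap rather than the tool’s actual implementation; the function name, color values, and the 0/1/2 risk coding are assumptions made for the example:

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    # Illustrative theme palettes (Low, Moderate, High); the shipped themes may differ.
    THEMES = {
        "traffic_light": ["#2e7d32", "#f9a825", "#c62828"],
        "gray": ["#d9d9d9", "#969696", "#525252"],
    }

    def plot_domain_heatmap(df: pd.DataFrame, out_path: str, theme: str = "traffic_light") -> None:
        """Draw a studies-by-domains heatmap of categorical RoB and export it.

        Expects one row per study with a 'Study' column and one column per NOS
        domain coded 0 = Low, 1 = Moderate, 2 = High (illustrative coding only).
        """
        domain_cols = [c for c in df.columns if c != "Study"]
        fig, ax = plt.subplots(figsize=(10, 0.4 * len(df) + 2))
        sns.heatmap(df.set_index("Study")[domain_cols], cmap=THEMES[theme],
                    cbar=False, linewidths=0.5, ax=ax)
        ax.set_xlabel("NOS domain")
        ax.set_ylabel("Study")
        # The export format (PNG, PDF, SVG, or EPS) is inferred from the file extension.
        fig.savefig(out_path, dpi=300, bbox_inches="tight")
        plt.close(fig)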

To ensure broad accessibility, NOS-TLPlot provides two primary modes of interaction:

  1. Command-line interface (CLI): For users comfortable with the terminal or for integrating the tool into automated scripts and reproducible research workflows, NOS-TLPlot can be run directly from the command line. To ensure a clean setup and avoid dependency conflicts, use of virtual environments (such as venv or conda) is recommended for managing the tool’s dependencies. A typical usage would be python3 nos_tlplot.py input.csv output_traffic-light.png traffic_light, which generates a traffic-light plot from input.csv and saves it as output_traffic-light.png using the traffic-light theme. This method is particularly useful for batch processing or when reproducibility via scripted commands is paramount.

  2. Streamlit web application: To cater to users who prefer a graphical user interface (GUI) or have limited programming experience, NOS-TLPlot includes a web application built using Streamlit [9]. The app can be launched locally with streamlit run app.py after installing the dependencies, or accessed via a publicly hosted instance (e.g., at nos-tlplot.streamlit.app or via a Vercel link like nos-tlplot.vercel.app). The web app allows users to upload their CSV or Excel NOS data file, preview the data, select from the twelve available visualization types, choose the desired theme, and download the generated figure in their preferred format. This interactive approach makes the tool’s functionalities accessible to a much wider audience, promoting its adoption in diverse research settings.
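As an illustration of the workflow the web app exposes (upload, preview, plot-type and theme selection, generation, and download), the following is a minimal Streamlit sketch rather than the actual app.py; it assumes the ‘Overall RoB’, ‘Total Score’, and ‘Author, Year’ columns described earlier are present, and the placeholder plotting logic stands in for the package’s twelve plot functions:

    import io

    import matplotlib.pyplot as plt
    import pandas as pd
    import streamlit as st

    st.title("NOS-TLPlot (illustrative sketch)")

    uploaded = st.file_uploader("Upload NOS data (CSV or Excel)", type=["csv", "xlsx"])
    if uploaded is not None:
        # Read CSV or Excel depending on the uploaded file's extension.
        df = pd.read_csv(uploaded) if uploaded.name.endswith(".csv") else pd.read_excel(uploaded)
        st.dataframe(df)  # data preview

        plot_type = st.selectbox("Select Plot Type", ["Lollipop plot", "Pie chart"])
        theme = st.selectbox("Theme", ["traffic_light", "gray"])

        if st.button("Generate Plot"):
            # Placeholder figure; the real app dispatches to one of twelve plot functions.
            colors = {"traffic_light": ["#2e7d32", "#f9a825", "#c62828"],
                      "gray": ["#d9d9d9", "#969696", "#525252"]}[theme]
            fig, ax = plt.subplots()
            if plot_type == "Pie chart":
                counts = df["Overall RoB"].value_counts()
                ax.pie(counts, labels=counts.index, colors=colors[: len(counts)])
            else:
                ax.stem(range(len(df)), df["Total Score"])
                ax.set_xticks(range(len(df)))
                ax.set_xticklabels(df["Author, Year"], rotation=90)
            st.pyplot(fig)

            buf = io.BytesIO()
            fig.savefig(buf, format="png", dpi=300)
            st.download_button("Download Plot", buf.getvalue(), file_name="nos_plot.png")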

Reproducibility and open-source principles are central to NOS-TLPlot’s design. The source code is hosted on GitHub (https://github.com/aurumz-rgb/NOS-TLPlot), allowing for public scrutiny, issue tracking, and community contributions. The project is archived on Zenodo and assigned a DOI (10.5281/zenodo.17065214) [10], ensuring that specific versions are citable and persistently accessible. A citation.cff file is also included in the repository to facilitate citation by software and reference managers. This commitment to open science practices enhances transparency and allows other researchers to verify, extend, or adapt the tool for their needs. The combination of a robust Python backend, a user-friendly web frontend, and a strong emphasis on open-source principles makes NOS-TLPlot a valuable addition to the meta-analyst’s toolkit.

Quality control

NOS-TLPlot has undergone rigorous testing to ensure the reliability, accuracy, and scalability of its visualization outputs. The validation process focused on verifying that the tool can handle real-world data volumes and formats typical of large-scale systematic reviews without compromising rendering quality or data integrity.

To assess software performance with independent data, Newcastle–Ottawa Scale assessments extracted from the systematic review by de Oliveira Almeida et al. [13], which evaluates the quality of studies on physical function in COVID-19 survivors, were utilized. The dataset was successfully visualized, confirming that NOS-TLPlot accurately parses external study data and correctly maps star ratings to risk-of-bias categories. The resulting visualization outputs are documented in Supplementary Figure 1 and Supplementary Figure 2.

In addition to this external validation, the software was stress-tested using a dataset comprising approximately 100 studies extracted from various published systematic reviews. All twelve visualization types were generated correctly and maintained structural clarity even with a high density of data points. These results demonstrate the tool’s robustness and suitability for comprehensive meta-analyses involving numerous studies, as shown in Supplementary Figure 3 (Lollipop Plot) and Supplementary Figure 4 (Heatmap).

Finally, the internal logic of the risk-of-bias calculations was verified to ensure correct application of categorization thresholds (e.g., 7–9 stars as ‘Low RoB’).
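As an example of this kind of check, a small pytest-style test of the boundary values might look like the following; the helper repeats the illustrative categorization function sketched earlier and is not the tool’s actual API:

    def categorize_overall_rob(total_score: int) -> str:
        """Standard NOS thresholds: 7-9 Low, 4-6 Moderate, 0-3 High."""
        if 7 <= total_score <= 9:
            return "Low RoB"
        if 4 <= total_score <= 6:
            return "Moderate RoB"
        return "High RoB"

    def test_threshold_boundaries():
        assert categorize_overall_rob(9) == "Low RoB"
        assert categorize_overall_rob(7) == "Low RoB"
        assert categorize_overall_rob(6) == "Moderate RoB"
        assert categorize_overall_rob(4) == "Moderate RoB"
        assert categorize_overall_rob(3) == "High RoB"
        assert categorize_overall_rob(0) == "High RoB"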

The sample datasets are provided in the repository [10], and users can run the tool with these datasets to verify its functionality. These sample files, sample.csv and sample.xlsx, are located at https://github.com/aurumz-rgb/NOS-TLPlot/blob/main/sample.csv and https://github.com/aurumz-rgb/NOS-TLPlot/blob/main/sample.xlsx, respectively.

To test the tool using the CLI:

  1. Ensure the tool and its dependencies are installed.

  2. Download a sample dataset (e.g., sample.csv).

  3. Open a terminal or command prompt.

  4. Run the command: python3 nos_tlplot.py path/to/sample.csv path/to/output_plot.png traffic_light (replacing path/to/sample.csv and path/to/output_plot.png with actual paths). For example: python3 nos_tlplot.py sample.csv output_traffic-light.png traffic_light. This command generates a traffic-light plot from sample.csv and saves it as output_traffic-light.png in the current directory. Users can then inspect the generated output_traffic-light.png to confirm it matches the expected visualization.

To test the tool using the web application:

  1. Ensure the tool and its dependencies (including Streamlit) are installed.

  2. Open a terminal or command prompt.

  3. Navigate to the directory containing app.py.

  4. Run the command: streamlit run app.py

  5. This will typically open the NOS-TLPlot web application in a new browser tab.

  6. In the web interface, use the ‘Browse files’ button to upload one of the provided sample datasets (e.g., sample.csv or sample.xlsx).

  7. A preview of the uploaded data should appear.

  8. Select a visualization type from the ‘Select Plot Type’ dropdown menu.

  9. Choose a theme (e.g., ‘traffic_light’ or ‘gray’).

  10. (Optional) Adjust figure size parameters.

  11. Click the ‘Generate Plot’ button.

  12. The generated plot should appear in the web interface. Users can then download the plot using the ‘Download Plot’ button and inspect it.

These steps allow users to quickly verify that the software is working as expected and that all twelve visualization types can be generated successfully from the provided sample data. The correct rendering of plots, adherence to selected themes, and accurate reflection of the input data’s risk-of-bias assessments in the visual outputs serve as primary quality checks.

(2) Availability

Operating system

Cross-platform (Windows, macOS, Linux)

Programming language

Python 3.12

Additional system requirements

  • Memory: 2GB RAM recommended.

  • Disk Space: 500MB for dependencies.

Dependencies

  • Pandas (>= 2.2.3)

  • NumPy (>= 2.2.2)

  • Matplotlib (>= 3.9.2)

  • Seaborn (>= 0.13.2)

  • Streamlit (>= 1.49.1)

  • Pyarrow (>= 21.0.0)

  • Openpyxl (>= 3.1.5)

Specific versions are managed in the requirements.txt file within the repository.

List of contributors

Vihaan Sahu (Georgian National University SEU, Tbilisi, Georgia) – Sole developer and contributor.

Software location

Archive

  • Name: Zenodo

  • Persistent identifier: 10.5281/zenodo.17065214 [10]

  • Licence: Apache License 2.0

  • Publisher: Vihaan Sahu

  • Version published: 2.0.2

  • Date published: 24/10/2025 (as per Zenodo record, actual date may vary slightly upon publication).

Code repository

  • Name: GitHub

  • Persistent identifier: https://github.com/aurumz-rgb/NOS-TLPlot

  • Licence: Apache License 2.0

Language

English (repository, software comments, documentation, and supporting files).

(3) Reuse Potential

NOS-TLPlot is designed to be directly reusable by researchers, clinicians, and policymakers involved in conducting, interpreting, or utilizing systematic reviews and meta-analyses that include non-randomized studies assessed using the Newcastle–Ottawa Scale. Its primary use case is the generation of clear, standardized, and publication-quality visualizations for risk-of-bias assessments, thereby enhancing the transparency and communicative power of evidence syntheses.

Use cases

  • Systematic reviewers and meta-analysts: The core user group. NOS-TLPlot streamlines the process of creating RoB figures, saving significant time and effort compared to manual methods or custom scripting. It allows for the easy generation of multiple plot types to best represent the data and answer specific research questions. The reproducibility features (CLI, version control) are crucial for transparent reporting.

  • Clinicians and policymakers: As consumers of systematic review findings, this group benefits from the clear and intuitive visual outputs (e.g., traffic-light plots, donut charts) that quickly convey the quality of the underlying evidence, aiding in informed decision-making.

  • Educators and students: NOS-TLPlot can serve as a teaching tool to illustrate concepts of study quality, risk of bias, and data visualization in courses on research methodology, epidemiology, and evidence-based practice. The user-friendly web app lowers the barrier for students to explore RoB data visually.

  • Journal editors and peer reviewers: The tool can help standardize RoB reporting in manuscripts, making it easier for editors and reviewers to assess and compare the quality of studies included in submitted reviews.

Modifications and extensions

NOS-TLPlot is built with modularity in mind, facilitating potential extensions:

  • Support for other RoB tools: While currently specialized for the NOS, the underlying architecture (data ingestion, RoB categorization, plotting framework) could be adapted to support other risk-of-bias assessment tools commonly used for non-randomized studies, such as ROBINS-I [11], or tools for diagnostic accuracy studies, such as QUADAS-2 [12]. This would involve defining new domain mappings and potentially new plotting logic or adjustments to the existing functions.

  • Enhanced interactivity: The Streamlit web app could be further enriched by integrating more interactive plotting libraries like Plotly. This would allow for features such as tooltips on hover, dynamic filtering of studies or domains, and zooming capabilities, providing a more exploratory data analysis experience.

  • Integration with meta-analysis workflows: Future development could focus on creating connectors or wrappers for popular meta-analysis packages in R (e.g., metafor) or Python, allowing for seamless data transfer between RoB assessment, visualization, and meta-analytic synthesis steps.

  • Expanded visualization library: New, specialized plot types could be added based on user feedback and evolving best practices in RoB visualization.

  • Customizable RoB thresholds: While NOS-TLPlot uses standard thresholds for converting total star scores to RoB categories, allowing users to define custom thresholds could cater to specific disciplinary needs or variations in NOS interpretation.

Contributors interested in modifying or extending NOS-TLPlot are encouraged to fork the GitHub repository, make their changes, and submit pull requests. They can also open issues on GitHub to discuss potential enhancements or report bugs. The corresponding author, Vihaan Sahu (vsahu@seu.edu.ge), can also be contacted for inquiries about collaboration or significant extensions.

Support mechanisms

Community support via GitHub: The primary support channel is the GitHub repository (https://github.com/aurumz-rgb/NOS-TLPlot). Users can:

  • Report bugs or unexpected behavior by opening a new ‘Issue.’

  • Suggest new features or enhancements by opening a new ‘Issue.’

  • Ask questions or seek clarification on usage by opening a new ‘Issue’ or participating in existing discussions.

  • Contribute code or documentation improvements via ‘Pull Requests.’

Documentation: The repository includes (or aims to include) comprehensive documentation, including a README file with installation instructions, usage examples for both CLI and web app, and descriptions of input data formats and available plot types. The sample datasets also serve as practical examples.

Troubleshooting and common issues

To ensure a smooth user experience, users may refer to the following common issues and solutions:

  • Input format errors: The tool requires specific column headers to function correctly. If the software raises a KeyError or generates empty plots, verify that the CSV or Excel file headers exactly match the required NOS domain names (e.g., ensure ‘Comparability (Age/Gender)’ is used instead of ‘comparability (Age/Gender)’ or ‘Age/Gender’). Headers are case-sensitive; a minimal validation sketch is given after this list.

  • Dependency conflicts: If users encounter ModuleNotFoundError or version conflict errors, it is recommended to install the tool within a clean virtual environment (e.g., Python venv or conda). This isolates NOS-TLPlot’s dependencies from system-wide packages.

  • Web app port conflicts: By default, the Streamlit web application attempts to run on port 8501. If this port is already in use, the app will fail to launch. Users can specify a different port using the command: streamlit run app.py --server.port 8502 (substituting 8502 with any available port).
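For the input-format issue above, a short pre-flight check along the following lines can surface header mismatches before any plotting is attempted; the column list mirrors the input specification given earlier, and the helper name and the ‘Author, Year’ identifier column are illustrative assumptions:

    import pandas as pd

    # Required columns as described in the input specification; headers are case-sensitive.
    REQUIRED_COLUMNS = [
        "Author, Year", "Representativeness", "Non-exposed Selection",
        "Exposure Ascertainment", "Outcome Absent at Start",
        "Comparability (Age/Gender)", "Comparability (Other)",
        "Outcome Assessment", "Follow-up Length", "Follow-up Adequacy",
        "Total Score",
    ]

    def check_headers(path: str) -> None:
        """Raise a readable error listing any missing NOS columns."""
        df = pd.read_csv(path) if path.endswith(".csv") else pd.read_excel(path)
        missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
        if missing:
            raise ValueError(f"Input file is missing required columns: {missing}")

    check_headers("sample.csv")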

Author contact: For specific inquiries not suitable for the public GitHub forum or for collaboration discussions, users can contact the corresponding author, Vihaan Sahu, at vsahu@seu.edu.ge. While direct email support cannot be guaranteed indefinitely, the author will endeavor to assist with significant issues or collaboration proposals.

Version control and archiving: The use of Git for version control on GitHub and archiving on Zenodo with a DOI ensures that stable versions of the software are persistently available and citable, which is a form of long-term support for reproducibility.

By providing a dedicated, user-friendly, and reproducible solution for NOS data visualization, NOS-TLPlot has significant reuse potential to enhance the quality, transparency, and efficiency of reporting, as well as the overall impact, of systematic reviews and meta-analyses across the many fields of research that rely on non-randomized evidence.

Additional Files

The additional files for this article can be found as follows:

Supplementary Figure 1

Bubble Chart Plot (Traffic Light theme) visualizing the Newcastle–Ottawa Scale assessments extracted from the systematic review by de Oliveira Almeida et al. [13]. DOI: https://doi.org/10.5334/jors.635.s1

Supplementary Figure 2

Domain Donut Chart displaying normalized scores across NOS domains for the studies included in the systematic review by de Oliveira Almeida et al. [13]. DOI: https://doi.org/10.5334/jors.635.s2

Supplementary Figure 3

Lollipop Plot ranking studies by total NOS score, generated using a stress-test dataset of approximately 100 studies. DOI: https://doi.org/10.5334/jors.635.s3

Supplementary Figure 4

Domain Heatmap representing the categorical risk of bias (Low, Moderate, High) for each domain across a stress-test dataset of approximately 100 studies. DOI: https://doi.org/10.5334/jors.635.s4

Supplementary Figure 5

Bubble chart showing the categorical risk of bias (Low, Moderate, High) for each domain across a stress-test dataset of about 100 studies. DOI: https://doi.org/10.5334/jors.635.s5

Acknowledgements

The author acknowledges the contributions of the open-source community, with particular thanks to contributors to the Python [4], Streamlit [9], Matplotlib [5], Seaborn [6], Pandas [7] and NumPy [8] projects. The development of NOS-TLPlot was an independent effort without institutional support from Georgian National University SEU.

Competing Interests

The author has no competing interests to declare.

DOI: https://doi.org/10.5334/jors.635 | Journal eISSN: 2049-9647
Language: English
Submitted on: Oct 24, 2025 | Accepted on: Jan 13, 2026 | Published on: Feb 10, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Vihaan Sahu, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.