(1) Overview
Introduction
The Sun, our closest star, is a powerhouse of energy, driving the complexities of the heliosphere and influencing Earth’s space environment. It is a G-type main-sequence star composed mainly of hydrogen, which fuses into helium in its core, releasing vast amounts of energy [1, 2, 3]. This energy supports life on Earth, affecting both natural and man-made systems. The Sun’s internal structure gives rise to a variety of solar phenomena, including solar flares (SF), coronal mass ejections (CMEs), solar energetic particles (SEPs), solar prominences, and sunspots [4, 5]. These events play a crucial role in shaping our understanding of solar physics and have significant implications for space weather, impacting satellite operations, communication systems, and even power grids on Earth through geomagnetic storms [6, 7].
The advent of big data in heliophysics presents both opportunities and challenges [17]. The sheer volume of data generated by observations of the Sun and its associated phenomena requires scalable solutions for storage, processing, and analysis [8]. Traditional methods, while foundational, often fall short in addressing the end-to-end data, computational, and infrastructure complexity. In response, advances in artificial intelligence (AI) and machine learning (ML) offer promising avenues for enhancing predictive models and simulations, improving the accuracy of space weather forecasts [9]. However, AI/ML algorithms often require significant computational resources, especially when dealing with large datasets or models [10], and cloud computing has emerged as an effective way to meet this demand. Platforms like Amazon Web Services (AWS) have become instrumental in facilitating data analysis and sharing within the scientific community, offering services ranging from computational power to AI/ML capabilities [11, 12].
Scope and Purpose of Helio-Lite
Helio-Lite is a lightweight, open-source platform for reproducible solar data analysis that can be deployed without advanced cloud computing expertise. Installation follows step-by-step instructions, avoiding the need for Docker or Kubernetes [13]. It provides customized Python kernels for heliophysics and AI/ML, a Jupyter notebook examples repository, and web-based extraction modules for retrieving images from the Atmospheric Imaging Assembly (AIA) and the Helioseismic and Magnetic Imager (HMI) through the Joint Science Operations Center (JSOC). AIA captures full-disk solar images in multiple extreme ultraviolet wavelengths; HMI provides high-resolution measurements of the Sun’s magnetic fields and Doppler velocities; and both datasets are archived at JSOC, which offers calibrated products and processing tools [14]. Space weather event data are sourced from NASA’s Space Weather Database Of Notifications, Knowledge, and Information (DONKI), which aggregates near real-time solar event information in standardized JSON format [15, 16]. Helio-Lite also integrates heliophysics event catalogs from the Georgia State University (GSU) Solar Informatics and Data Mining Lab (DMLab), whose interdisciplinary research and advanced data mining techniques strengthen the platform’s capabilities [17, 18, 19]. The heliophysics kernel builds on open-source libraries from the Python Heliophysics Community (PyHC) which develops and promotes standardized analysis packages for space science [20, 21].
Intended Users
Helio-Lite supports heliophysics researchers, space weather analysts, citizen scientists, educators, students, and AI/ML developers. Researchers can efficiently analyze large datasets and conduct experiments on solar phenomena. Citizen scientists can explore solar data and contribute to research with minimal technical barriers. Educators and students can integrate Helio-Lite into curricula to provide hands-on experience in solar data analysis. Space weather forecasters can use its predictive modeling capabilities to improve forecasts, and AI/ML developers can test and refine algorithms tailored for heliophysics data.
Comparison with HelioCloud
HelioCloud is an open-source, NASA-funded project that provides cloud-based access to petabyte-scale datasets, high-performance computing resources, and an extensible PyHC-based software stack [22, 23]. It is optimized for large, collaborative projects requiring persistent cloud storage, centralized management, and integration with multiple mission archives. In contrast, the lightweight Helio-Lite platform runs on a single AWS instance, requires no centralized backend, and gives users full control over a self-contained, reproducible JupyterHub workspace. This design makes it well suited for independent researchers, educators, and small teams who need a deployable, modular environment rather than a large institutional system. The key architectural and operational differences between Helio-Lite and HelioCloud are summarized in Table 1.
Table 1
Comparison of Helio-Lite and HelioCloud features.
| FEATURE | HELIO-LITE | HELIOCLOUD |
|---|---|---|
| Deployment Model | Single AWS instance; user-managed JupyterHub | Institutional-scale cloud platform, centrally managed |
| Target Users | Individual researchers, educators, small groups | Multi-institutional research collaborations |
| Storage Model | Local EBS or optional S3; no persistent backend required | Centralized object storage and mission archives |
| Data Scope | User-specified datasets and Application Programming Interfaces (APIs) (e.g., JSOC, DONKI, DMLab) | Integrated petabyte-scale mission datasets |
| Environment Configuration | Two prebuilt Conda environments (AI/ML, PyHC) | Pre-integrated PyHC software stack with additional HPC modules |
| Use Case Emphasis | Reproducible research, education, and prototyping | Long-term data hosting, HPC workloads, and collaborative analysis |
Helio-Lite complements rather than competes with HelioCloud. Researchers can use Helio-Lite to test workflows or prototype small-scale analyses before porting mature pipelines to HelioCloud’s distributed infrastructure.
Technical Architecture and AWS Infrastructure
Helio-Lite runs entirely on AWS, enabling users to leverage cloud-based computation and storage without requiring specialized local hardware. The system uses scalable Amazon Elastic Compute Cloud (EC2) instances within a secure Virtual Private Cloud (VPC) configured with security groups and network Access Control Lists (ACLs). By default, user data and results are stored locally on the instance’s Elastic Block Store (EBS) volume; however, integration with Amazon Simple Storage Service (S3) is fully optional and supported for users who wish to enable persistent or shared storage. S3 provides durable cloud storage and can be configured to retrieve results through the platform’s web interface. The overall system workflow is illustrated in Figure 1, showing how AIA and HMI images, DONKI data, and GSU DMLab datasets are accessed, processed, and optionally persisted within the AWS cloud environment.
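For users who enable the optional S3 integration, results can be persisted and retrieved with a few lines of Boto3, which Helio-Lite already uses for cloud automation. A minimal sketch, assuming a hypothetical bucket name and results file (illustrative only, not part of the Helio-Lite codebase):

```python
import boto3

# Hypothetical bucket and local results file; replace with your own.
BUCKET = "my-helio-lite-results"          # must already exist in your account
LOCAL_FILE = "results/aia_event_summary.csv"

s3 = boto3.client("s3")  # credentials come from the instance role or ~/.aws
s3.upload_file(LOCAL_FILE, BUCKET, "helio-lite/aia_event_summary.csv")

# Later, retrieve the object from any machine with access to the bucket.
s3.download_file(BUCKET, "helio-lite/aia_event_summary.csv",
                 "aia_event_summary.csv")
```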

Figure 1
Overall AWS data migration workflow for Helio-Lite, showing how AIA, HMI, DONKI, and DMLab datasets are accessed and moved into the cloud environment. Users connect through JupyterHub to analyze these datasets without local hardware dependencies.
Deployment and Setup
The following steps outline launching an EC2 instance, connecting to it, and configuring Helio-Lite.
Create an AWS Account
Sign up at the AWS website,1 providing contact and payment details. Select the preferred AWS region for hosting.
Launch an EC2 Instance
In the EC2 console, select Launch Instance.
Choose Ubuntu as the OS and c5.4xlarge (or larger) as the instance type.
Create a new key pair for SSH access and download the .pem file.
VPC and Subnet should be set to default, though custom configurations are supported.2
Configure network settings to allow inbound SSH, HTTP, and HTTPS.
Allocate at least 500 GiB of EBS storage.
In the User data field, paste the setup script provided in the repository’s documentation.3
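For reference, the launch steps above can also be scripted with Boto3 rather than the console. The sketch below uses placeholder values for the AMI ID, key pair, and security group, and assumes the repository’s setup script has been saved locally; it is illustrative, not part of the documented workflow:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

with open("helio_lite_userdata.sh") as f:   # setup script from the repo docs
    user_data = f.read()

resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",        # placeholder: an Ubuntu AMI for your region
    InstanceType="c5.4xlarge",
    KeyName="helio-lite-key",               # key pair created in the console
    MinCount=1,
    MaxCount=1,
    SecurityGroupIds=["sg-0123456789abcdef0"],  # allows inbound SSH/HTTP/HTTPS
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": 500, "VolumeType": "gp3"},  # at least 500 GiB
    }],
    UserData=user_data,
)
print(resp["Instances"][0]["InstanceId"])
```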
Assign an Elastic IP
Allocate and associate an Elastic IP with the instance for consistent access.
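This step can likewise be scripted; a minimal Boto3 sketch with a placeholder instance ID:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Allocate a new Elastic IP and attach it to the launched instance.
alloc = ec2.allocate_address(Domain="vpc")
ec2.associate_address(
    InstanceId="i-0123456789abcdef0",   # placeholder instance ID
    AllocationId=alloc["AllocationId"],
)
print("Elastic IP:", alloc["PublicIp"])
```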
Access the Server
Open a web browser and navigate to the instance’s public DNS name or Elastic IP, for example: http://ec2-12-34-56-78.compute-1.amazonaws.com
Your homepage should appear similar to the screenshot shown in Figure 2. Important: the server is not fully secure; it is served over plain HTTP until HTTPS is configured. For guidance on enabling HTTPS and configuring IAM policies, see the AWS documentation.4
Initialize Conda and Environment Directories
Run the initialization commands as shown in the repository’s setup instructions5 to create directories and install dependencies for both the AI/ML and PyHC environments.
Create Jupyter Kernels
Follow the documented procedure provided in the kernel_creation/ directory6 to register the two shared Conda environments (AI/ML and PyHC) as Jupyter kernels.
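For orientation, registering a Conda environment as a shared Jupyter kernel generally reduces to running ipykernel from inside that environment. A generic Python sketch follows; the environment paths and kernel names are illustrative and may differ from those used by the kernel_creation/ scripts:

```python
import subprocess

# Illustrative environment paths and kernel names; the actual values are
# defined by the scripts in the repository's kernel_creation/ directory.
ENVS = [
    ("ai_ml", "AI/ML", "/opt/conda/envs/ai_ml"),
    ("pyhc", "PyHC", "/opt/conda/envs/pyhc"),
]

for name, display, prefix in ENVS:
    subprocess.run(
        [f"{prefix}/bin/python", "-m", "ipykernel", "install",
         "--name", name,
         "--display-name", display,
         "--prefix", "/usr/local"],   # register system-wide for JupyterHub
        check=True,
    )
```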
Verify Dependencies
Confirm successful installation of required libraries listed in the libraries_dependencies/ directory.7
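A quick spot check from within each kernel is to import a handful of key libraries and print their versions; the package subset below is illustrative, and the authoritative list lives in libraries_dependencies/:

```python
import importlib

# Illustrative subset; see libraries_dependencies/ for the full list.
for pkg in ["numpy", "pandas", "matplotlib", "sunpy", "torch", "tensorflow"]:
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg:<12} {getattr(mod, '__version__', 'unknown')}")
    except ImportError as err:
        print(f"{pkg:<12} MISSING ({err})")
```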
This process results in a fully operational Helio-Lite environment, ready for collaborative heliophysics research. All setup commands, kernel creation scripts, and dependency files are available in the project’s GitHub repository8 and demonstrated in a video tutorial.9

Figure 2
Helio-Lite login page as seen after deployment.
Data Integration
Through automated connectors, imagery from JSOC and event metadata from DONKI are retrieved, standardized, and made immediately available within the heliophysics (PyHC) kernel. Users can interactively browse, filter, and select relevant data, then launch on-demand visualizations or ML workflows without leaving the notebook interface. This tightly coupled integration streamlines the entire research process while ensuring reproducibility across real-time monitoring and historical studies.
All core data ingestion and preprocessing tasks are handled through custom Python modules authored specifically for Helio-Lite (a retrieval sketch follows the list), including:
aiaImages.py: Handles AIA imagery in multiple wavelengths.
hmiImages.py: Handles HMI magnetogram and continuum image processing.
donkiData.py: Retrieves space weather events from DONKI.
dmLab.py: Facilitates internal data manipulation and formatting.
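As an illustration of the kind of retrieval donkiData.py performs, solar flare events can be pulled from DONKI’s public web service as JSON with a single HTTP request. This sketch targets NASA’s public API endpoint and is not the module’s actual code; DEMO_KEY is rate-limited:

```python
import requests

# Minimal sketch of a DONKI query (solar flares over two weeks);
# this mirrors what donkiData.py does, but is not its actual code.
resp = requests.get(
    "https://api.nasa.gov/DONKI/FLR",
    params={"startDate": "2024-01-01", "endDate": "2024-01-14",
            "api_key": "DEMO_KEY"},   # DEMO_KEY is rate-limited; use your own
    timeout=30,
)
resp.raise_for_status()

for flare in resp.json():
    print(flare["flrID"], flare["classType"], flare["peakTime"])
```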
Supported Formats and Retrieval Methods
Helio-Lite retrieves and processes solar image and event data using standard web-accessible formats and APIs. The platform focuses on lightweight formats and metadata suitable for rapid prototyping, exploratory data analysis, and educational use, rather than requiring users to download and parse full-resolution Flexible Image Transport System (FITS) files. Working with standardized web APIs and common image formats simplifies data access and avoids the storage and processing overhead associated with full-resolution Level 1 FITS products. Supported formats and methods include:
Image: JPEG 2000 (.jp2) and JPEG (.jpg) files are retrieved from authoritative archives such as JSOC and from web-based visualization services such as Helioviewer through standardized APIs, then converted to .png for consistent in-browser rendering and lightweight machine learning workflows (see the conversion sketch after this list).
Data: JSON is used for structured metadata from DONKI and custom DMLab endpoints, while CSV exports provide optional tabular summaries of selected event metadata for downstream use.
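The .jp2-to-.png conversion described above can be reproduced with Pillow, assuming a Pillow build with OpenJPEG (JPEG 2000) support; this is a sketch of the general approach, not the aiaImages.py implementation:

```python
from pathlib import Path
from PIL import Image

# Convert downloaded JPEG 2000 solar images to PNG for in-browser display.
# Requires a Pillow build with OpenJPEG (JPEG 2000) support.
for jp2 in Path("downloads").glob("*.jp2"):
    with Image.open(jp2) as img:
        img.save(jp2.with_suffix(".png"))
```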
Applications in AI/ML
Helio-Lite supports AI/ML workflows through dedicated, pre-configured computational environments built for both ML and heliophysics applications. Two pre-configured Jupyter kernels are deployed within a centralized JupyterHub server: the AI/ML kernel, optimized for model development, data handling, visualization, and solar image processing; and the PyHC kernel, tailored for heliophysics research and interoperability with domain-specific analysis tools. Both kernels are registered in the JupyterHub interface, include custom modules for solar event analysis, and are fully reproducible through pinned dependencies. A complete, regularly updated list of installed packages is maintained in the project’s GitHub repository.
Example Workflows
Helio-Lite offers the following example workflows:
Solar Image and Event Analysis
AIA_DONKI.ipynb and HMI_DONKI.ipynb – Query, visualize, and download AIA or HMI images from JSOC; analyze related DONKI events.
DMLab.ipynb and GSEP_DMLab.ipynb – Load ML-ready datasets from DMLab’s public S3 bucket and work with SEP events from the Integrated Geostationary SEP Events Catalog.
TimeConversion.ipynb, TimeZones.ipynb – Convert between UTC, mission formats, and local time zones.
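As a flavor of the time-handling notebooks, a minimal Astropy sketch converting a UTC timestamp to other common representations (the notebooks’ exact approach may differ; zoneinfo requires Python 3.9+):

```python
from zoneinfo import ZoneInfo
from astropy.time import Time

t = Time("2024-01-14T12:00:00", scale="utc")
print(t.isot)    # ISO 8601, UTC
print(t.mjd)     # Modified Julian Date, common in mission data
print(t.unix)    # Unix epoch seconds
print(t.to_datetime(timezone=ZoneInfo("America/New_York")))  # local time
```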
AI/ML Examples
pytorch_torchvision_RNN_PT.ipynb – Build a simple RNN in PyTorch.
tensorflow_keras_classification.ipynb – Classify clothing images using TensorFlow/Keras.
xgboost.ipynb – Binary classification on the Pima Indians Diabetes dataset.
seaborn.ipynb – Explore datasets and plots available in Seaborn.
PyHC Domain Tools
coordinate_systems.ipynb – Pass coordinates between SpacePy, Astropy, and SunPy (see the sketch after this list).
coordinates_demo.ipynb – Use Astropy coordinate framework and SunPy’s solar extensions.
planet_locations.ipynb – Plot planetary positions for event context.
pyspedas_demo.ipynb – Demonstrate pySPEDAS capabilities.
pytplot_demo.ipynb – Demonstrate PyTplot capabilities.
units_demo.ipynb – Use Astropy units and dimensional consistency checks.
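In the spirit of coordinate_systems.ipynb and coordinates_demo.ipynb, a minimal SunPy/Astropy coordinate transformation, converting a helioprojective pointing to heliographic coordinates (illustrative, not taken from the notebooks):

```python
import astropy.units as u
from astropy.coordinates import SkyCoord
from sunpy.coordinates import frames

# A point on the disk as seen from Earth at a given time...
hpc = SkyCoord(100 * u.arcsec, 200 * u.arcsec,
               frame=frames.Helioprojective(observer="earth",
                                            obstime="2024-01-14T12:00:00"))

# ...converted to Stonyhurst heliographic longitude and latitude.
hgs = hpc.transform_to(frames.HeliographicStonyhurst(obstime="2024-01-14T12:00:00"))
print(hgs.lon, hgs.lat)
```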
Future Work
In its current release, Helio-Lite ships with a default JupyterHub template and HTTP-only access for ease of evaluation. The next major upgrade, Helio-Lite 2.0, will transition the system from a manually deployed prototype to a secure, fully automated, reproducible research environment. The upgrade will introduce a pre-configured Amazon Machine Image (AMI) and AWS CloudFormation template that enable one-click deployment through the AWS Marketplace. Security will be strengthened through HTTPS delivery using CloudFront and AWS Certificate Manager, and dynamic administrator credential creation at first boot will eliminate default-password risks. Reproducibility will be ensured by embedding version-locked AI/ML and PyHC environments directly within the AMI, guaranteeing that every installation produces identical software states. Additional enhancements will include integrated domain naming through Route 53, optional CloudFront caching, comprehensive IAM-based security guidance, and long-term sustainability through versioned AMI releases.
Beyond heliophysics, Helio-Lite’s modular architecture can readily support other scientific and educational applications that require browser-based, reproducible computing environments. The same JupyterHub framework and automated deployment workflow can be adapted for astronomy, planetary science, Earth observation, or atmospheric research, where large datasets and domain-specific Python environments are accessed through public APIs. In education, the lightweight design offers a low-barrier platform for teaching data analysis and ML concepts using authentic research datasets.
(2) Availability
Operating system
Helio-Lite is deployed on an Ubuntu/Linux Amazon EC2 instance. Users interact with the environment through a JupyterLab interface.
Programming language
Automation is handled with Bash scripts, while data processing, scientific analysis, and ML workflows are tested in Python 3.7–3.12. Interactive visualization of solar images within Jupyter notebooks is achieved using ipywidgets combined with dynamic HTML (DHTML) and CSS, enabling responsive, browser-native displays directly alongside analysis code.
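As an illustration of this display pattern, a minimal slider-driven image browser built with ipywidgets (a generic sketch, not Helio-Lite’s actual viewer code):

```python
from pathlib import Path
import ipywidgets as widgets
from IPython.display import Image, display

# Assumes at least one converted image exists, e.g. PNG frames from AIA.
files = sorted(Path("downloads").glob("*.png"))

def show(i):
    display(Image(filename=str(files[i])))

# The slider re-renders the selected solar image inline in the notebook.
widgets.interact(show, i=widgets.IntSlider(min=0, max=len(files) - 1, step=1))
```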
Cloud Usage and Costs
Helio-Lite is not a hosted service. All AWS usage and associated costs are the responsibility of the user. The toolkit does not collect usage data, manage billing, or act as an intermediary for cloud access. Users are encouraged to monitor instance uptime, data transfer, and storage through the AWS Management Console.
License and Terms
Users must agree to the standard AWS service terms.10 Helio-Lite is released under the MIT License, with no restrictions beyond those stated in the license file.
Software Location
Code repository
Name: GitHub
Identifier: https://github.com/indiajacksonphd/Helio-Lite
License: MIT
Date repository created: 2024-01-14
Language: English
Archive
Name: Zenodo
Persistent identifier: https://doi.org/10.5281/zenodo.17611741
Version archived: v0.1.0
Date archived: 2025-11-14
(3) Reuse Potential
Helio-Lite is designed to serve the heliophysics community, but its modular architecture supports adaptation across scientific domains. Users can deploy the system and add their own Jupyter kernels and environments for their own purposes.
Quality Control and Validation
Helio-Lite was verified through manual testing, direct deployment, and community demonstration rather than automated test suites. The platform was launched repeatedly in clean AWS environments to ensure that installation instructions, dependencies, and startup procedures produce consistent results across sessions and instance types. Quality control measures include:
Environment verification: All required packages and dependencies are specified in two reproducible Conda environments. These files define exact versions of key libraries to ensure consistent software environments across deployments. Each environment was successfully created and activated on new Ubuntu 22.04 LTS instances prior to running the system.
Deployment validation: Helio-Lite was manually deployed multiple times using the documented installation commands on AWS EC2 (t3, c5, and g4dn instance families). Each deployment reproduced the same behavior: JupyterHub launches, kernels initialize, and the included example notebooks open and run interactively without errors. This confirmed that the documented setup steps are sufficient to achieve a functional system.
Demonstration and user verification: A full video tutorial demonstrates the deployment and expected runtime behavior of Helio-Lite, confirming the reproducibility of the setup process and the correctness of the configuration. The video is publicly available on YouTube.11 Users can visually confirm that the system initializes successfully, loads the example environments, and operates as described in the manuscript and repository instructions.
Release reproducibility: Each stable release is archived on Zenodo with a DOI to ensure persistent, reproducible access to the exact version tested and demonstrated in the tutorial.
Python Environments and Dependencies
All dependencies are version-controlled to ensure reproducibility and minimize configuration drift. Kernel registration is automated. Four primary configuration files are hosted in the GitHub repository:
ml.yml — Defines the AI/ML environment, including PyTorch, TensorFlow, and XGBoost, along with supporting libraries for model training and visualization.
environment.yml — Defines the PyHC environment, including SunPy, HelioPy, SpacePy, and other heliophysics packages maintained by the Python Heliophysics Community.
requirements.txt — Core Python packages.
custom_requirements.txt — Additional packages per Jupyter kernel.
Although Helio-Lite and related HelioSuite tools are not distributed as Python packages and therefore fall outside the current scope of formal inclusion in the PyHC package registry, the platform is designed to interoperate with commonly used PyHC libraries and workflows. Helio-Lite is incorporated into NASA’s Heliophysics Software Search Interface (HSSI),12 which indexes heliophysics software resources, including standalone applications that complement PyHC packages. In addition, Helio-Lite metadata are being integrated into the Space Physics Archive Search and Extract (SPASE) repository to support long-term discoverability, interoperability, and archival sustainability across heliophysics cyberinfrastructure systems.
Software Artifacts
Original contributions in Helio-Lite include:
Bash scripts for AWS EC2 environment setup and JupyterHub configuration.
Jupyter notebooks for solar data ingestion, visualization, and analysis.
Pre-configured Conda environment configuration files for both heliophysics and AI/ML workflows.
ML pipelines tailored for solar physics through example notebooks.
Integration of a customizable user interface for interactive solar image display within Jupyter notebooks.
The platform reuses established open-source tools including:
Project Jupyter and Python
SunPy, HelioPy, and PyHC packages for heliophysics data handling.
PyTplot and PySPEDAS for visualization.
AWS CLI and Boto3 for cloud automation.
Community and Contributions
The GitHub repository is open for issues, forks, and pull requests. Contributions are encouraged through clear documentation and modular structure. Users may submit new notebooks, extend data ingestion scripts, or propose additional mission support.
Limitations and User Onboarding
Helio-Lite assumes users have basic familiarity with Jupyter notebooks and web-based Python environments. While prior experience with AWS is helpful, it is not required. The repository provides detailed documentation, bootstrap scripts, and example datasets to lower the barrier for first-time users.
To simplify deployment, Helio-Lite includes a single automated onboarding process that installs and configures all required components without manual setup. Users paste the provided script from the repository’s START_HERE directory13 into the AWS User Data field when launching an EC2 instance. Within approximately 3–15 minutes, the JupyterHub interface becomes accessible through the instance’s public DNS name or Elastic IP (see Access the Server above).
An administrator account (admin1) for JupyterHub is created automatically during installation. The administrator should change the default password immediately after first login. Additional users can self-register directly from the login page to access the shared JupyterHub workspace.
The current release operates over HTTP for simplicity; HTTPS configuration and IAM-based authentication will be introduced in a future update. A full video tutorial demonstrates the onboarding workflow, system initialization, and example notebook execution.14
Notes
[16] https://heliocloud.org/.
[18] https://heliopython.org/.
Acknowledgements
We extend our gratitude to the open-source community and contributors who have played a role in the development and maintenance of Helio-Lite and the resources utilized in this research.
HelioCloud: We acknowledge the HelioCloud community for their dedication to advancing heliophysics research. Their contributions, both on GitHub15 and through their website,16 have been instrumental in shaping the architectural foundation of Helio-Lite.
PyHC (Python in Heliophysics Community): We are grateful to the PyHC for their commitment to the standardization and improvement of Python tools in heliophysics data analysis. Their open-source initiatives, available on GitHub17 and their website,18 have been invaluable in our work, encompassing numerous pip packages and heliophysics notebook examples.
JSOC (Joint Science Operations Center): We acknowledge the JSOC team for their critical role in managing and providing access to high-resolution solar images through the Atmospheric Imaging Assembly (AIA) images API.19 Their dedication to making solar data accessible has significantly enriched our research.
DONKI (Database of Notifications, Knowledge, and Information): We recognize the Community Coordinated Modeling Center (CCMC) at NASA Goddard Space Flight Center for the DONKI user interface20 and API.21
GSU DMLab (Georgia State University Data Mining Lab): We express our appreciation to the Data Mining Lab at Georgia State University for their contributions in providing customizable datasets and their interdisciplinary research in the field of data mining.22
We would also like to acknowledge the many individuals and organizations whose open-source contributions and Jupyter notebook examples supported this work.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
This work forms part of Dr. Jackson’s PhD dissertation. Dr. Jackson conceptualized, designed, and implemented the Helio-Lite platform, conducted all analyses, and wrote the manuscript. Dr. Martens and Dr. Aydin reviewed the manuscript, tested the software, and provided feedback on functionality and clarity. Dr. Martens also served as Dr. Jackson’s graduate advisor and Dr. Aydin as a graduate committee member. All authors read and approved the final manuscript.
