Table 1
Summary of data centers, their storage limits per data publication, and whether they provide data contributors with best-practice guidelines for curating data packages, both specific to model data and general.
| DATA CENTER | STORAGE LIMIT PER DATA PUBLICATION | PROVIDES MODEL-DATA-SPECIFIC CONTRIBUTOR GUIDELINES? | PROVIDES OTHER CONTRIBUTOR GUIDELINES? |
|---|---|---|---|
| National Science Foundation Arctic Data Center | No limit | Yes | Yes |
| Oak Ridge National Laboratory DAAC | NA¹ | Yes | Yes |
| NASA’s Earth Observing System Data and Information System (EOSDIS) | NA¹ | NA¹ | Yes |
| U.S. DOE ESS-DIVE | 10 GB / 500 GB² | No | Yes |
| Dryad | 300 GB² | No | Yes |
| Zenodo | 50 GB | No | No |
| Earth System Grid Federation (ESGF) | NA¹ | NA¹ | NA¹ |
¹ NA: Not available, i.e., no public information found.
² Limit on the size of individual files. For ESS-DIVE, 10 GB is the default file size limit and can be increased up to 500 GB by request. Files >500 GB are considered upon review.
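The ESS-DIVE per-file limits in footnote 2 amount to a three-tier rule. A minimal Python sketch of that rule follows; the function name `essdive_file_tier`, the tier labels, and the constants are hypothetical names chosen for illustration and are not part of any ESS-DIVE API.

```python
# Per-file limits for ESS-DIVE as stated in Table 1, footnote 2.
DEFAULT_LIMIT_GB = 10    # default file size limit
REQUEST_LIMIT_GB = 500   # limit available by request


def essdive_file_tier(file_size_gb: float) -> str:
    """Classify a single file by the footnote-2 tiers (illustrative labels)."""
    if file_size_gb < 0:
        raise ValueError("file size must be non-negative")
    if file_size_gb <= DEFAULT_LIMIT_GB:
        return "default"        # fits within the default limit
    if file_size_gb <= REQUEST_LIMIT_GB:
        return "by request"     # requires a limit increase
    return "upon review"        # files >500 GB are considered upon review


if __name__ == "__main__":
    # Example file sizes in GB, chosen to fall in each of the three tiers.
    for size_gb in (3, 100, 1000):
        print(f"{size_gb:>5} GB -> {essdive_file_tier(size_gb)}")
```

The sketch treats a file of exactly 10 GB or exactly 500 GB as falling within the lower tier, which is one reading of the footnote.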
Table 2
Summary of the standalone terrestrial models used by the 12 researchers participating in this study. Coupled models (e.g., ELM-FATES and ELM-PFLOTRAN) are not listed but were also considered in evaluating archiving needs.
| MODEL ACRONYM | MODEL NAME (ORGANIZATION) | REFERENCES | DESCRIPTION |
|---|---|---|---|
| ELM | Energy Exascale Earth System Model (E3SM) Land Model (DOE) | Golaz et al. (2019); https://e3sm.org/ | Land model component of the E3SM Earth System Model |
| FATES | Functionally Assembled Terrestrial Ecosystem Simulator (DOE) | Koven et al. (2020); https://github.com/NGEET/fates-release | Size- and age-structured vegetation demographic model that runs within a land surface model and can be coupled with an Earth system model |
| PFLOTRAN | Parallel Flow and Transport (DOE) | Hammond, Lichtner and Mills (2014); https://www.pflotran.org | Parallel reactive flow and transport model for subsurface hydrobiogeochemical processes |
| ATS | Advanced Terrestrial Simulator (DOE) | Coon et al. (2020); https://amanzi.github.io/ats/ | An integrated, distributed watershed hydrology model including surface and subsurface flow, energy transport, reactive transport, and ecohydrology. |
| CrunchFlow | N/A (DOE) | Steefel and Molins (2009) | Model for simulating multicomponent multi-dimensional reactive transport in porous media |
| MAAT | Multi-Assumption Architecture & Testbed (DOE) | Walker, Ye, et al. (2018); https://github.com/walkeranthonyp/MAAT | Modular terrestrial ecosystem process modeling framework for building multiple models that vary in process representation/hypotheses. |
| CLM | Community Land Model (NCAR) | Lawrence et al. (2019); https://www.cesm.ucar.edu/models/clm/ | Land model for the Community Earth System Model (CESM), a fully-coupled global climate model |
| ED2 | Ecosystem Demography Biosphere Model (NSF/NASA) | Longo et al. (2019); https://github.com/EDmodel/ED2 | Size- and age-structured terrestrial biosphere model |
| PRMS | Precipitation Runoff Modeling System (USGS) | Markstrom et al. (2015); https://www.usgs.gov/software/precipitation-runoff-modeling-system-prms | Deterministic process-based model developed to evaluate the impacts of climate and land use on streamflow and watershed hydrology. |
| SWAT | Soil and Water Assessment Tool (USDA/Texas A&M University) | Bieger et al. (2017); https://swat.tamu.edu/ | Watershed to river basin-scale model used to simulate the quality and quantity of surface and ground water and predict the environmental impact of land use, land management practices, and climate change. |
| LPJ-GUESS | Lund-Potsdam-Jena General Ecosystem Simulator (Lund University) | Smith, Prentice and Sykes (2001); https://web.nateko.lu.se/lpj-guess/ | Dynamic vegetation-terrestrial ecosystem model for regional or global studies |
| GDAY | Generic Decomposition and Yield | Comins and McMurtrie (1993); https://github.com/mdekauwe/GDAY | Stand-scale ecosystem model that simulates carbon, nitrogen, and water dynamics. |
| SDGVM | Sheffield Dynamic Global Vegetation Model (Sheffield University) | Woodward and Lomas (2004); https://bitbucket.org/walkeranthonyp/sdgvm/ | Terrestrial biosphere carbon cycle model for ecosystem to global scale simulations. Simple size and age structure. |
| OpenFOAM | N/A (OpenFOAM foundation) | https://openfoam.org/ | Computational fluid dynamics open source software |
| CALAND | California Natural and Working Lands Carbon and Greenhouse Gas Model (California Natural Resources Agency) | Di Vittorio and Simmonds (2019); https://doi.org/10.5281/zenodo.3256727 | Carbon stock and flux model that simulates the effects of various management practices, land use and land cover change, wildfire, and climate change on ecosystem carbon dynamics across all California lands |
Table 3
Estimates of archiving needs for typical spatial and temporal representations of simulation data from DOE terrestrial models, which are the models most commonly used by the researchers in this study. Note that the same models are often run at different spatial extents (e.g., site to global) and temporal durations (e.g., weeks to centuries).
DETAILS FOR TYPICAL SIMULATION¹ TO BE ARCHIVED
| MODEL | SPATIAL RESOLUTION OR REPRESENTATION | SPATIAL EXTENT | TEMPORAL RESOLUTION² | TEMPORAL DURATION | NO. OF FILES | MEAN FILE SIZE (GB) | TYPES OF FILE FORMATS | TOTAL ANNUAL STORAGE NEEDS (GB) |
|---|---|---|---|---|---|---|---|---|
| Multiple LSMs³ | point⁴ | point | daily | 200 yrs | 300 | 0.1 | CSV | 50 |
| ELM | point | point | hourly, daily | 10 – 20 yrs | 20 | 0.004 | netCDF | 3 |
| ELM | 1/2° – 2° | global | monthly | 250 yrs | 2500 | 0.2 | netCDF | 15000 |
| ELM-FATES | point, ~1 km, ~1 degree | point, regional, and global modes | sub-daily, monthly | ~500 yrs | 1K – 10K | 50 | netCDF | 1000 |
| FATES | point | point | <hourly | 10 yrs | 70 | 3 | netCDF | 2000 |
| ELM-PFLOTRAN | 1 – 100 m | 100 m – 10 km | hourly/daily | 10+ yrs | 10 – 100 | 10 | HDF5, netCDF | 1000 |
| PFLOTRAN | <1 m | 5-6 km | <hourly | 30 yrs | 5 | 1000 | HDF5 | 10000 |
| ATS | 100 m – 250 m | 10 km | daily | 10 – 100 yrs | 20 | 100 | XML + HDF5, CSV | 1000 |
| ATS | <1 – 100 m | 10 m – 10 km | daily | 10 – 100 yrs | 2 | | XML + HDF5 | 1000 |
| ATS | 0.25 m | 25 m | daily | 100 yrs | 50 – 200 | | XML + HDF5 | 10 |
| CrunchFlow | <1 m | <1 km | <hourly | 30 days | 100 | 0.001 | TXT | 1 |
¹ Note that “ensembles” of simulations were not considered in this survey, except in the total annual storage needs reported.
² This could represent either the temporal resolution of the simulation or that of the output files.
³ Here we use “LSMs” (Land Surface Models) to include both standard CMIP-style Earth System Models (e.g., ELM) and more complex vegetation phenology models (e.g., FATES).
⁴ Note that “point” is used to indicate a single vertical column of cells or otherwise a single location in horizontal space.
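As a rough reading aid for Table 3, the volume of one typical simulation can be approximated as the number of files multiplied by the mean file size. The Python sketch below does this for rows that report single values in both columns. The product is an illustrative assumption rather than the survey's method, and the names `TypicalSimulation`, `ROWS`, and `volumes_by_model` are hypothetical. The reported total annual storage needs also reflect ensembles (footnote 1), so they are not expected to equal this product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypicalSimulation:
    """One Table 3 row, reduced to the two columns used here."""
    model: str
    n_files: int
    mean_file_size_gb: float

    @property
    def volume_gb(self) -> float:
        # Assumed approximation: files x mean size for one simulation.
        return self.n_files * self.mean_file_size_gb


# Values copied from Table 3; rows with ranges or missing cells are omitted.
ROWS = [
    TypicalSimulation("ELM (point)", 20, 0.004),
    TypicalSimulation("ELM (global)", 2500, 0.2),
    TypicalSimulation("FATES (point)", 70, 3),
    TypicalSimulation("PFLOTRAN", 5, 1000),
    TypicalSimulation("CrunchFlow", 100, 0.001),
]


def volumes_by_model(rows):
    """Map each model label to its approximate per-simulation volume in GB."""
    return {row.model: row.volume_gb for row in rows}


if __name__ == "__main__":
    ranked = sorted(volumes_by_model(ROWS).items(),
                    key=lambda item: item[1], reverse=True)
    for model, gb in ranked:
        print(f"{model:15s} {gb:10.2f} GB")
```

Under this approximation a single typical simulation spans from well under 1 GB (CrunchFlow, ELM at a point) to several terabytes (PFLOTRAN), which illustrates why per-publication and per-file limits matter differently across models.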

Figure 1
Perspectives from a group of 12 U.S. Department of Energy terrestrial model researchers on (a) archiving different components of model data in a public repository, (b) the period of time over which publicly archived model data remain useful, and (c) the purposes served by archiving model data in a public repository. The importance rankings for (a) and (c) are shown on a scale from 1 (not important at all) to 5 (extremely important) and represent average importance scores across the 12 researchers.

Figure 2
Decision tree for determining the recommended approach for grouping model-related files for public archiving.
