Geoscientists’ Perspectives on Cyberinfrastructure Needs: A Collection of User Scenarios

Open Access
Jun 2019

Figures & Tables

Table 1

User Scenario Template. See Stocks et al. (2019) for an annotated user scenario template with additional usage tips and example responses.

Summary Information Section
      Use Case Name
      Contacts: Roles for the contacts are taken from the ISO 19115 CI_RoleCode vocabulary, and at a minimum should capture the interviewer (as Author) and interviewed scientist (as pointOfContact/originator).
      Link to Primary Documentation: a single reference that describes the scenario, if it exists. Additional related references are captured below.
      Permission to make public? (Yes/No); Permission granted by; Date permission granted
      Science Objectives and Outcomes: overview of the scientific goals and importance of the scenario.
      Overarching Science Driver: high-level scientific impetus. Can reference agency priorities or strategic science plan documentation as relevant.
Scenario Detail Section
      Actors: key people and/or systems involved in the project.
      Preconditions: preconditions, requirements, assumptions, and state changes necessary for the scenario to be executed.
      Critical Existing Cyberinfrastructure: existing data repositories, software, etc. needed.
      Measures of Success: the important outcome or product if the scenario workflow is completed.
      Basic Flow: steps to be followed in doing the user scenario. Often referred to as the primary scenario or course of events.
      Alternate Flow: any alternate workflows that might occur, e.g. to handle error conditions.
      Activity Diagram: a picture or flow chart that captures the major workflow steps, actors at each step, inputs and outputs at each step, and optional alternate paths.
      Major Outcome and Post Conditions: conditions that will be true of the state of the system after the scenario has been completed, including what happens with data and other products after the project finishes.
      Problems/Challenges: any significant or disruptive problems or challenges that prevent or interfere with the successful completion of the activity. For each one, list the challenge and who it impacts and how; what efforts, if any, have been undertaken to fix the problem; recommendations for tackling the problem; and how the larger community can address it.
      References: links to other relevant information such as background, clarifying and otherwise useful source material. Include web site links, project names, overall charters, additional points of contact, etc. This section is distinct from Primary Documentation, which just describes the particular scenario.
      Notes: Any additional important information.
Technical Section
      Data Characteristics: describe all the data involved in the scenario, both existing and desired, as follows:
            Data Source(s)
            Data Format(s)
            Volume (size)
            Velocity (e.g. 2TB/day)
            Variety (e.g. sensor data, model output)
            Variability (e.g. differences in site density across studies)
            Veracity/Data Quality (accuracy, precision)
            Data Types (e.g. sequence data, core images)
      Standards: any standards that were followed for the cyberinfrastructure resources, even if already mentioned above. Standards can apply to data, models, metadata, etc.
      Data Visualization and Analytics: analysis and visualization capabilities needed for the scenario, whether existing or desired.
      Software: For any important software used, describe the important characteristics (source, language, input format, output format, CPU requirements, etc.).
      Metadata: Provide a link to, or include, any relevant metadata adding additional detail and context to the dataset(s) described above.
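
As an illustration only, part of the template in Table 1 could be captured as a structured record. The sketch below is not from the paper: the class and field names (`Contact`, `DataCharacteristics`, `UserScenario`, `has_minimum_contacts`) are hypothetical, and only a few template fields are modelled. It assumes the ISO 19115 role codes are spelled `author`, `pointOfContact`, and `originator`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contact:
    name: str
    role: str  # an ISO 19115 CI_RoleCode value, e.g. "author" (assumed spelling)


@dataclass
class DataCharacteristics:
    # Field names are hypothetical; they mirror the Data Characteristics list.
    sources: List[str] = field(default_factory=list)
    formats: List[str] = field(default_factory=list)
    volume: str = ""
    velocity: str = ""      # e.g. "2TB/day"
    variety: str = ""       # e.g. "sensor data, model output"
    variability: str = ""
    veracity: str = ""
    data_types: List[str] = field(default_factory=list)


@dataclass
class UserScenario:
    name: str
    contacts: List[Contact]
    basic_flow: List[str] = field(default_factory=list)
    data: DataCharacteristics = field(default_factory=DataCharacteristics)

    def has_minimum_contacts(self) -> bool:
        """Check the template's minimum: an interviewer plus an interviewed scientist."""
        roles = {c.role for c in self.contacts}
        return "author" in roles and bool(roles & {"pointOfContact", "originator"})


# Hypothetical example record.
scenario = UserScenario(
    name="Example scenario",
    contacts=[Contact("Interviewer A", "author"),
              Contact("Scientist B", "pointOfContact")],
    data=DataCharacteristics(formats=["CSV", "NetCDF"], velocity="2TB/day"),
)
```

A record like this would make it possible to check completeness of a scenario before it is shared, but the choice of fields and checks here is an assumption, not part of the published template.
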
Table 2

Summary of cyberinfrastructure challenges expressed in the 49 use cases. The percentages do not add up to the category totals because 1) one use case can express challenges in more than one subcategory; and 2) challenges expressed by three or fewer use cases were not listed, but were included in the category counts. Percentages are absolute, not relative.

Data Challenges
      78%  Data Access/availability
            28%  Data not online
            18%  Data in multiple online sources
            14%  Hard to search for desired data in online source
            12%  Important relationships between data in multiple sources missing
            8%   Hard to find/access data in publications
            8%   Sharing data is difficult/lacks incentives
      32%  Data variety, diversity, and heterogeneity issues
            24%  Data format diversity
            8%   Semantic variability
            12%  Integrating different data types (discrete vs continuous, sensor vs 4D model, etc.)
      Other
            18%  Total data volume
            16%  Needed data does not exist (e.g. not enough sensors, or gaps in the data)
            14%  Insufficient or uncertain data quality
            14%  Insufficient metadata
Non-Data Challenges
      36%  Software
            30%  Desired software does not exist
            12%  Desired software exists, but is not accessible/reusable
      28%  Best practices, protocols, standards, other guidance needed
      26%  Funding challenges, especially long-term sustainability for CI
      16%  Networking, Storage, CPU
      12%  Access to informatics/computer science expertise
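
The caption of Table 2 explains why subcategory percentages need not sum to the category total. A minimal sketch with made-up data (not the study's data) shows the arithmetic: each percentage is taken against all use cases, and a use case counts once per category even when it reports several subcategories. The function name `absolute_percentages` and the toy labels are hypothetical.

```python
from collections import Counter


def absolute_percentages(use_cases):
    """Return (category %, subcategory %) measured against ALL use cases."""
    total = len(use_cases)
    category_counts = Counter()
    subcategory_counts = Counter()
    for challenges in use_cases:
        # A use case counts at most once per category and once per subcategory.
        category_counts.update({category for category, _ in challenges})
        subcategory_counts.update(set(challenges))
    categories = {c: 100 * n / total for c, n in category_counts.items()}
    subcategories = {s: 100 * n / total for s, n in subcategory_counts.items()}
    return categories, subcategories


# Hypothetical toy data: one set of (category, subcategory) pairs per use case.
toy_use_cases = [
    {("access", "not online"), ("access", "multiple sources")},
    {("access", "not online"), ("access", "multiple sources")},
    {("access", "rarely reported issue")},
    set(),  # a use case with no challenge in this category
]

categories, subcategories = absolute_percentages(toy_use_cases)
```

In this toy case the category stands at 75% while its two most common subcategories stand at 50% each, so the subcategories overshoot the category total. Dropping the rarely reported subcategory from the listing, as the caption describes, changes the listed sum again without changing the category figure.
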
Figure 1

A word cloud created from the summary of cyberinfrastructure challenges extracted from each user scenario. The size of each word is approximately proportional to the number of uses of that word. Common words and numbers are not included.
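
The word counts behind a figure like this can be sketched in a few lines. This is an assumed reconstruction, not the authors' procedure: the stop-word list is illustrative only, and the function name `word_frequencies` is hypothetical. Tokens that are common words or contain digits are skipped, matching the caption's description.

```python
import re
from collections import Counter

# Illustrative stop-word list; the list used for Figure 1 is not given in the source.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "not", "of", "or", "the", "to"}


def word_frequencies(summaries, stop_words=STOP_WORDS):
    """Count word occurrences across texts, skipping stop words and numbers."""
    counts = Counter()
    for text in summaries:
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token in stop_words or any(ch.isdigit() for ch in token):
                continue
            counts[token] += 1
    return counts


# Hypothetical challenge summaries.
example_summaries = ["Data not online", "Data in 3 online sources"]
freqs = word_frequencies(example_summaries)
```

The resulting counts could then be passed to any word cloud renderer, where each count sets the relative size of the word.
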

Table 3

Standards mentioned in user scenarios. Note that this represents the scientists’ reporting on the standards they use, and items like ‘GPS’ are included even though a technologist would not consider them a standard.

Standard        # of Use Cases
OGC             5
DOI             2
EML             2
GPS             2
iGSN            2
NetCDF          2
BagIt           1
CDF             1
CF metadata     1
CUAHSI WFS      1
DCAT            1
Excel           1
GCIS            1
iPLANT          1
IRIS            1
ISO19115        1
Memex           1
MIMS/MIGS       1
NOAA            1
Nutch           1
SEAD            1
SEASAS          1
SensorML        1
USDA            1
UTF-8           1
VIVO            1
WOCE            1

Table 4

Data formats mentioned in user scenarios.

Format                  # of Use Cases
CSV                     14
NetCDF                  11
MATLAB .mat             6
Excel                   6
txt                     5
ArcGIS/ESRI shapefiles  4
jpeg                    3
tiff                    3
tsv                     2
SEED                    2
xls                     1
mzML                    1
mzXML                   1
geojson                 1
geotiff                 1
GIS                     1
grib                    1
HDF                     1
HTML                    1
IRIS                    1
JSON                    1
miniC                   1
MSAccess                1
Pivotpilot              1
png                     1
UTF-8 unicode           1

Table 5

Software mentioned in user scenarios. Only those found in two or more use cases are listed. Note that the category ‘in-house’ is not a single software package, but includes any mention of unnamed software/code developed by the group of the interviewee.

Software             # of Use Cases
MATLAB               17
In-house code        10
Excel                9
ArcGIS               7
R                    5
Adobe Illustrator    4
Python               3
Google Earth Engine  3
IRIS/DMC tools       3
IDL                  2
NCAR Tool, NCL       2
Fledermaus           2
Mathematica          2
ODV                  2
Paraview             2
GDAL                 2
VLC                  2
Petrel               2
SQL                  2
STRABO               2

Language: English
Submitted on: Jan 19, 2019
Accepted on: May 23, 2019
Published on: Jun 18, 2019
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2019 Karen I. Stocks, Sam Schramski, Arika Virapongse, Lisa Kempler, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.