Geoscientists’ Perspectives on Cyberinfrastructure Needs: A Collection of User Scenarios

Karen I. Stocks; Sam Schramski; Arika Virapongse; Lisa Kempler

doi:10.5334/dsj-2019-021

Figures & Tables

Table 1

User Scenario Template. See Stocks et al. (2019) for an annotated user scenario template with additional usage tips and example responses.

Summary Information Section
Use Case Name
Contacts: Roles for the contacts are taken from the ISO 19115 CI_RoleCode vocabulary, and at a minimum should capture the interviewer (as Author) and interviewed scientist (as pointOfContact/originator).
Link to Primary Documentation: a single reference that describes the scenario, if it exists. Additional related references are captured below.
Permission to make public? (Yes/No); Permission granted by; Date permission granted
Science Objectives and Outcomes: overview of the scientific goals and importance of the scenario.
Overarching Science Driver: high-level scientific impetus. Can reference agency priorities or strategic science plan documentation as relevant.
Scenario Detail Section
Actors: key people and/or systems involved in the project.
Preconditions: preconditions, requirements, assumptions, and state changes necessary for the scenario to be executed.
Critical Existing Cyberinfrastructure: existing data repositories, software, etc. needed.
Measures of Success: the important outcome or product if the scenario workflow is completed.
Basic Flow: steps to be followed in doing the user scenario. Often referred to as the primary scenario or course of events.
Alternate Flow: any alternate workflows that might occur, e.g. to handle error conditions.
Activity Diagram: a picture or flow chart that captures the major workflow steps, actors at each step, inputs and outputs at each step, and optional alternate paths.
Major Outcome and Post Conditions: conditions that will be true of the state of the system after the scenario has been completed, including what happens with data and other products after the project finishes.
Problems/Challenges: any significant or disruptive problems or challenges that prevent or interfere with the successful completion of the activity. For each one, list the challenge and whom it impacts and how; what, if any, efforts have been undertaken to fix the problem; recommendations for tackling it; and how the larger community can address it.
References: links to other relevant information such as background, clarifying and otherwise useful source material. Include web site links, project names, overall charters, additional points of contact, etc. This section is distinct from Primary Documentation, which just describes the particular scenario.
Notes: Any additional important information.
Technical Section
Data Characteristics: describe all the data involved in the scenario, both existing and desired, as follows:
Data Source(s)
Data Format(s)
Volume (size)
Velocity (e.g. 2TB/day)
Variety (e.g. sensor data, model output)
Variability (e.g. differences in site density across studies)
Veracity/Data Quality (accuracy, precision)
Data Types (e.g. sequence data, core images)
Standards: any standards that were followed for the cyberinfrastructure resources, even if already mentioned above. Standards can apply to data, models, metadata, etc.
Data Visualization and Analytics: analysis and visualization capabilities needed for the scenario, whether existing or desired.
Software: For any important software used, describe the important characteristics (source, language, input format, output format, CPU requirements, etc.).
Metadata: Provide a link to, or include, any relevant metadata adding additional detail and context to the dataset(s) described above.
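
The template in Table 1 can also be read as a nested record. The following is a minimal sketch of that reading, not a schema published with the template: the class names, field names, and the `missing_minimum_roles` helper are all hypothetical, and several template fields are omitted for brevity. The role strings are values from the ISO 19115 CI_RoleCode vocabulary named in the Contacts row.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    name: str
    role: str  # an ISO 19115 CI_RoleCode value, e.g. "author" or "pointOfContact"


@dataclass
class DataCharacteristics:
    sources: List[str] = field(default_factory=list)
    formats: List[str] = field(default_factory=list)
    volume: Optional[str] = None       # total size
    velocity: Optional[str] = None     # e.g. "2TB/day"
    variety: Optional[str] = None      # e.g. "sensor data, model output"
    variability: Optional[str] = None
    veracity: Optional[str] = None     # accuracy, precision
    data_types: List[str] = field(default_factory=list)


@dataclass
class UserScenario:
    # Summary Information Section (hypothetical field names)
    use_case_name: str
    contacts: List[Contact] = field(default_factory=list)
    primary_documentation: Optional[str] = None
    science_objectives: str = ""
    # Scenario Detail Section (abbreviated)
    actors: List[str] = field(default_factory=list)
    basic_flow: List[str] = field(default_factory=list)
    problems: List[str] = field(default_factory=list)
    # Technical Section (abbreviated)
    data: DataCharacteristics = field(default_factory=DataCharacteristics)
    standards: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)

    def missing_minimum_roles(self) -> List[str]:
        """Return the minimum contact roles from the template that are not filled."""
        roles = {c.role for c in self.contacts}
        missing = []
        if "author" not in roles:
            missing.append("author")
        if not roles & {"pointOfContact", "originator"}:
            missing.append("pointOfContact/originator")
        return missing


# A scenario that so far records only the interviewer.
scenario = UserScenario(
    use_case_name="Example scenario",
    contacts=[Contact("Interviewer", "author")],
)
print(scenario.missing_minimum_roles())
```

A check like this is one way to enforce the template's stated minimum of an interviewer and an interviewed scientist before a scenario is published.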

Table 2

Summary of cyberinfrastructure challenges expressed in the 49 use cases. The percentages do not add up to the category totals because 1) one use case can express challenges in more than one subcategory; and 2) challenges expressed by three or fewer use cases were not listed, but were included in the category counts. Percentages are absolute, not relative.

Data Challenges
78%	Data Access/availability
	28%	Data not online
	18%	Data in multiple online sources
	14%	Hard to search for desired data in online source
	12%	Important relationships between data in multiple sources missing
	8%	Hard to find/access data in publications
	8%	Sharing data is difficult/lacks incentives
32%	Data variety, diversity, and heterogeneity issues
	24%	Data format diversity
	8%	Semantic variability
	12%	Integrating different data types (discrete vs continuous, sensor vs 4D model, etc.)
Other
	18%	Total data volume
	16%	Needed data does not exist (e.g. not enough sensors, or gaps in the data)
	14%	Insufficient or uncertain data quality
	14%	Insufficient metadata
Non-Data Challenges
36%	Software
	30%	Desired software does not exist
	12%	Desired software exists, but is not accessible/reusable
28%	Best practices, protocols, standards, other guidance needed
26%	Funding challenges, especially long-term sustainability for CI
16%	Networking, Storage, CPU
12%	Access to informatics/computer science expertise
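
The caption's two reasons for subcategory percentages not summing to the category total can be illustrated with a small sketch. The data below are a hypothetical toy set of five use cases, not the survey data, and the listing threshold is scaled down to fit the toy set. The sketch also assumes that "absolute" means each percentage is a share of all use cases rather than of its category.

```python
from collections import Counter

# Hypothetical toy data: each use case maps to the set of subcategories
# it reported within one category. These are NOT the survey responses.
use_cases = {
    "uc1": {"Data not online", "Data in multiple online sources"},
    "uc2": {"Data not online", "Data in multiple online sources"},
    "uc3": {"Rarely reported challenge"},
    "uc4": set(),
    "uc5": set(),
}

# Toy listing threshold; the table itself omits challenges
# expressed by three or fewer use cases.
MIN_LISTED = 2


def percent(count, total):
    """Share of ALL use cases (absolute), rounded to a whole percent."""
    return round(100 * count / total)


total = len(use_cases)

# Category total: use cases reporting at least one challenge in the category.
category_count = sum(1 for subs in use_cases.values() if subs)
category_pct = percent(category_count, total)

# Subcategory counts: one use case can contribute to several subcategories.
sub_counts = Counter(s for subs in use_cases.values() for s in subs)
listed_pcts = {
    s: percent(n, total) for s, n in sub_counts.items() if n >= MIN_LISTED
}

print(category_pct)               # category total
print(listed_pcts)                # listed subcategories only
print(sum(listed_pcts.values()))  # differs from the category total
```

Here uc1 and uc2 are each counted in two subcategories, and uc3 counts toward the category while its subcategory falls below the listing threshold, so the listed percentages sum to a different value than the category percentage.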

A word cloud created from the summary of cyberinfrastructure challenges extracted from each user scenario. The size of each word is approximately proportional to the number of uses of that word. Common words and numbers are not included.
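
As a rough illustration of the sizing rule described in this caption, the sketch below counts word uses, drops common words and numbers, and scales a font size in proportion to each count. It is not the tool used to produce the figure: the stop-word list, the sample text, and the `max_size` parameter are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of "common words" to exclude.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for", "not"}


def word_counts(text, stop_words=STOP_WORDS):
    """Count uses of each word, skipping common words and numbers."""
    # Only alphabetic runs of two or more letters are kept,
    # so digits never become tokens.
    tokens = re.findall(r"[a-z]{2,}", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def font_sizes(counts, max_size=48):
    """Scale each word's size in proportion to its count."""
    if not counts:
        return {}
    top = max(counts.values())
    return {word: max_size * n / top for word, n in counts.items()}


# Hypothetical sample text, not taken from the user scenarios.
sample = "Data not online and data in 3 sources; software for data is missing"
counts = word_counts(sample)
sizes = font_sizes(counts)
print(counts.most_common(3))
print(sizes["data"], sizes["online"])
```

Real word cloud tools typically also adjust layout and may scale sizes non-linearly, which is consistent with the caption's "approximately proportional".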

Table 3

Standards mentioned in user scenarios. Note that this represents the scientists’ reporting on the standards they use, and items like ‘GPS’ are included even though a technologist would not consider them a standard.

Standard	# of Use Cases
OGC	5
DOI	2
EML	2
GPS	2
iGSN	2
NetCDF	2
BagIt	1
CDF	1
CF metadata	1
CUAHSI WFS	1
DCAT	1
Excel	1
GCIS	1
iPLANT	1
IRIS	1
ISO19115	1
Memex	1
MIMS/MIGS	1
NOAA	1
Nutch	1
SEAD	1
SEASAS	1
SensorML	1
USDA	1
UTF-8	1
VIVO	1
WOCE	1

Table 4

Data formats mentioned in user scenarios.

Format	# of Use Cases
CSV	14
NetCDF	11
MATLAB .mat	6
Excel	6
txt	5
ArcGIS/ESRI shapefiles	4
jpeg	3
tiff	3
tsv	2
SEED	2
xls	1
mzML	1
mzXML	1
geojson	1
geotiff	1
GIS	1
grib	1
HDF	1
HTML	1
IRIS	1
JSON	1
miniC	1
MSAccess	1
Pivotpilot	1
png	1
UTF-8 unicode	1

Table 5

Software mentioned in user scenarios. Only those found in two or more use cases are listed. Note that the category ‘in-house’ is not a single software package, but includes any mention of unnamed software/code developed by the interviewee’s group.

Software	# of Use Cases
MATLAB	17
In-house code	10
Excel	9
ArcGIS	7
R	5
Adobe Illustrator	4
Python	3
Google Earth Engine	3
IRIS/DMC tools	3
IDL	2
NCAR Tool, NCL	2
Fledermaus	2
Mathematica	2
ODV	2
Paraview	2
GDAL	2
VLC	2
Petrel	2
SQL	2
STRABO	2