Introduction
Freshwater harmful algal blooms (HABs) are an urgent and growing concern worldwide, with warmer water temperatures and greater nutrient enrichment leading to increases in bloom frequency and severity [1]. HABs often have significant economic impacts, including adverse effects on drinking water supplies and aquatic food production and disruptions to tourism and recreation [2]. Although not all algal blooms are harmful, many are caused by toxin-producing cyanobacteria that can pose a threat to human and ecosystem health [3]. The widespread occurrence and dynamic nature of algal blooms make them difficult to monitor effectively via traditional field-based sampling, but remote sensing has become a powerful tool for detecting and tracking HABs over large spatial scales and at high frequency [4, 5]. Standard approaches involve measuring reflectance in various spectral bands from an Earth-observing satellite and using reflectance to estimate concentrations of chlorophyll-a, a pigment that is common to all phytoplankton, and/or phycocyanin, which is specifically associated with cyanobacteria [6]. However, these methods do not provide information on which specific types of cyanobacteria might be present within a bloom, which is critical for understanding the potential for toxin production and environmental impacts.
To provide this additional level of detail, we introduced a hyperspectral remote sensing framework for not only detecting an algal bloom but also identifying which types of cyanobacteria are present [7]. Our approach is predicated upon the notion that the distinctive reflectance characteristics of various cyanobacteria genera can be used to infer the taxonomic composition of a bloom via Multiple Endmember Spectral Mixture Analysis (MESMA) [8]. The software described herein implements MESMA as the core component of a framework referred to as Spectral Mixture Analysis for Surveillance of Harmful Algal Blooms, or SMASH [7]. The basic inputs to SMASH are a library of reflectance spectra for various kinds of cyanobacteria, referred to as endmembers, and a hyperspectral image in which the particular taxa included in the library might occur. The primary output from the MESMA algorithm is an estimate of the fractional abundance of each endmember, plus water, within each pixel of the image. For example, in the initial proof-of-concept investigation, we compiled a library of 12 cyanobacteria genera by measuring their reflectance in a laboratory setting [9] and then performed SMASH using hyperspectral images from four U.S. water bodies. One of these images, from Upper Klamath Lake in Oregon, is featured as an example herein.
The purpose of this paper is to introduce the Software Application for SMASH (SAS), a standalone tool for using remotely sensed data to map potentially harmful algal blooms and provide genus-level taxonomic information on the types of cyanobacteria that are present within an image. SAS has been released as a scientific software product by the U.S. Geological Survey, and as of Version 1.0.7, includes functionality for all key components of the SMASH workflow [10]. An overview of the software interface and workflow is provided in Figure 1. More specifically, given a spectral library and an image as input, SAS provides an integrated collection of tools for:
Importing a hyperspectral image and creating spatial and spectral subsets;
Producing a mask to isolate the water body of interest;
Smoothing the image spectrally and spatially;
Generating maps of commonly used spectral indices designed to highlight chlorophyll and cyanobacteria;
Importing a spectral library and ensuring that its spectra are compatible with those extracted from the image;
Extracting water endmembers from image pixels selected interactively;
Quantifying the spectral separability of all pairs of endmembers;
Parameterizing the MESMA algorithm by specifying endmember fraction constraints and a maximum Root Mean Squared Error (RMSE) between modeled and observed spectra;
Producing a range of outputs that include a classified map of cyanobacterial genera, a histogram summarizing the taxonomic distribution, fraction images for each endmember, false-color composites for visualizing the spatial pattern of selected pairs of endmembers, and an RMSE image that summarizes uncertainty;
Exporting these outputs as figures and GeoTIFF images and saving a SAS session to a file for later use.

Figure 1
High-level overview of the SAS workflow and corresponding interface components. The user begins with (1) hyperspectral image input, then (2) image pre-processing and filtering, followed by (3) spectral library input in the “Select input data” tab. In the second “Run SMASH” tab, the user (4) sets up a MESMA run by specifying parameters and then (5) performs SMASH and examines the results [10].
The following sections of this paper provide an overview of each of these tasks; more detailed instructions are available in a tutorial document that accompanies the software [10].
Our motivation for creating SAS is to facilitate application of the SMASH framework as a means of better understanding and more effectively managing potentially harmful algal blooms. The software was designed to be accessible to a broad community of end users who do not necessarily have significant remote sensing expertise or image processing experience. SAS was developed using the MATLAB App Designer, a tool for creating graphical user interfaces (GUIs) and building standalone applications. The code underlying SAS was primarily written in MATLAB (R2023a, version 9.14), but we used an existing Python implementation for the core MESMA algorithm [11]. The software is distributed as an executable that can be installed on a Windows computer and run as an independent program, with no MATLAB license required.
To date, the only SMASH-based publication is the original paper that introduced the framework [7], but we are actively using SAS in ongoing HAB-related research focused on lakes and rivers throughout the U.S. The codebase first developed for our initial study underwent further testing and refinement and has now been packaged and distributed as SAS to facilitate greater use of SMASH. To the best of our knowledge, no other software provides similar functionality for identifying cyanobacteria at the genus level. However, remote sensing is widely used to map and monitor the occurrence of algal blooms, and we view SAS as complementary to existing, well-established tools for detecting HABs, such as those developed through the U.S. Environmental Protection Agency’s Cyanobacteria Assessment Network (CyAN) [5].
Implementation and architecture
This section briefly describes how we implemented the SMASH framework in SAS and illustrates the architecture of the software by walking through each phase of the workflow using the Upper Klamath Lake data set from [7] as an example. Figure 1 provides a high-level overview of the SAS workflow and corresponding interface components, which are organized into a pair of tabs at the top of the window: “Select input data” and “Run SMASH”. Within each tab, workflow components are color-coded, and the user moves from top to bottom and then from left to right through the various phases of the overall processing chain. A panel on the lower right side of the main SAS window provides status messages and logs SAS output. The software also writes the output of this panel to a text file in the working directory. Images, spectra, and derived products are displayed on the right side of the SAS window; some functions within SAS will also generate separate figure windows that can be closed once viewed. At any point, the user can save the results of a SAS session to a MATLAB *.mat data file and then load that file back into SAS later to resume the session and/or access previously created output.
Hyperspectral image input
The first step in the SAS workflow is to select the hyperspectral image data to be analyzed via the panel labeled 1 in Figure 1; the image must be in GeoTIFF format. The original study introducing SMASH [7] used data acquired by the DLR (the German equivalent of NASA) Earth Sensing Imaging Spectrometer (DESIS) [12] instrument onboard the International Space Station, and this is the default sensor type for SAS. Although the software also includes band information and spectral response functions for PRISMA (a hyperspectral sensor operated by the Italian Space Agency), Sentinel-2A and 2B [13], and Landsat 8 and 9 [14], SAS has only been tested using data from DESIS. To facilitate broader application of the software and enable further validation of its outputs, we have also introduced new functionality that allows for the use of custom sensor types, as described in Section 3.2. Once a file is selected, the image dimensions and range of wavelengths will be listed in the log panel, and a near-infrared band will be displayed on the right side of the SAS window. A reflectance scale factor can then be specified to convert image pixel values, which are often stored as integers, into reflectance values between 0 and 1. For example, the default scale factor of 10,000 would translate a raw pixel value of 325 into a reflectance of 0.0325. Two additional fields are provided for the user to specify a code for the site and the image date. These values are concatenated and used to automatically name all outputs generated by SAS, which are placed in a “SMASH” folder in the working directory. Most hyperspectral imaging systems like DESIS span a broader range of wavelengths than is useful for characterizing water bodies, since water strongly absorbs near-infrared radiation; the original image can therefore be spectrally subset by specifying a minimum and maximum wavelength. The default values of 400 nm and 800 nm are appropriate in most cases.
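The scale-factor conversion and spectral subsetting amount to a division and a band selection. The sketch below assumes the GeoTIFF has already been read into a NumPy array of shape (rows, cols, bands) with a matching vector of band-center wavelengths in nm (reading the file is not shown). The function name `scale_and_subset` is hypothetical; it illustrates the arithmetic and is not part of SAS, which is written in MATLAB.

```python
import numpy as np

def scale_and_subset(raw, wavelengths, scale_factor=10_000.0, wl_min=400.0, wl_max=800.0):
    """Convert integer-coded pixels to reflectance and keep bands within [wl_min, wl_max].

    raw: array of shape (rows, cols, bands); wavelengths: band centers in nm.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    keep = (wavelengths >= wl_min) & (wavelengths <= wl_max)   # spectral subset
    reflectance = raw[..., keep].astype(float) / scale_factor  # e.g. 325 -> 0.0325
    return reflectance, wavelengths[keep]

# Toy 2 x 2 image with 5 bands stored as 16-bit integers
wl = np.array([380.0, 450.0, 600.0, 750.0, 900.0])
raw = np.full((2, 2, 5), 325, dtype=np.int16)
refl, wl_kept = scale_and_subset(raw, wl)
print(refl.shape, wl_kept)  # (2, 2, 3) [450. 600. 750.]
```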
Because the water body of interest might occupy only a small fraction of the imported image, SAS includes a tool for cropping the image to create a rectangular spatial subset. The original image is displayed in a new figure window and the user is prompted to click and drag a rectangle to define the region of interest. The smaller subset image will then be displayed in the main SAS window.
Image pre-processing and filtering
The next phase of the workflow is implemented in the “Image pre-processing” panel in the lower left quadrant of the SAS window (labeled 2 in Figure 1). The water body of interest is first separated from the remainder of the image by producing a water-only mask based on the Normalized Difference Water Index (NDWI) [15]. NDWI values are calculated on a per-pixel basis, and the resulting NDWI image is displayed on the right side of the main SAS window. The software provides several options for producing a mask, including automated and interactive NDWI thresholding, manually digitizing a polygon, or importing a shapefile, which must be in the same coordinate system as the image. Here, we illustrate an interactive threshold method that gives the user some control and is also computationally efficient. After the user pushes a button to create the initial mask, three new windows appear (Figure 2): the NDWI image, a contrast adjustment tool, and a dialog box for entering threshold values. The vertical red bars in the contrast adjustment tool can be repositioned to manipulate the brightness and contrast of the NDWI image. The histogram of pixel values shown within this tool, along with the image display, helps the user identify upper and lower threshold values that effectively isolate the water from the terrestrial portions of the image. The binary mask produced by applying these thresholds is then displayed in a separate figure window, and the user is prompted to select which image segments to include. This step enables the user to choose the water body of interest and exclude any smaller, outlying areas of water. In addition to this segmentation, SAS automatically refines the mask by applying morphological operations to remove isolated pixels and fill in small gaps.
Initially, the final mask will appear in the display panel on the right side of the SAS interface, but display panel controls below the image allow the user to select a single band to display in grayscale or a set of three bands to display as the red, green, and blue components of a color composite.
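A minimal sketch of threshold-based masking follows, assuming the green/near-infrared formulation of NDWI, (G - NIR)/(G + NIR), and using SciPy's `ndimage` module. Because the interactive segment selection in SAS cannot be reproduced in a script, the hypothetical `water_mask` function simply keeps the largest connected water segment and then fills enclosed holes. The exact morphological operations SAS applies are not specified here, so treat this as an illustration of the idea rather than a reimplementation.

```python
import numpy as np
from scipy import ndimage

def ndwi(green, nir):
    """Green/NIR NDWI; open water is typically positive, land negative."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = green + nir
    out = np.full(total.shape, np.nan)
    np.divide(green - nir, total, out=out, where=total != 0)  # NaN where undefined
    return out

def water_mask(ndwi_img, lower, upper):
    """Threshold NDWI, keep the largest connected segment, then fill enclosed holes."""
    mask = (ndwi_img >= lower) & (ndwi_img <= upper)  # NaN pixels compare False
    labels, n_segments = ndimage.label(mask)          # 4-connected segments
    if n_segments == 0:
        return mask
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n_segments + 1))
    mask = labels == int(np.argmax(sizes)) + 1        # stand-in for interactive selection
    return ndimage.binary_fill_holes(mask)            # fills every enclosed gap

# Toy scene: a 3 x 3 lake with one noisy interior pixel, plus a separate 1-pixel pond
img = np.full((6, 6), -0.5)
img[1:4, 1:4] = 0.4
img[2, 2] = -0.1
img[5, 5] = 0.3
m = water_mask(img, lower=0.0, upper=1.0)
```

In this toy case the lake (including its filled center pixel) is retained and the isolated pond is dropped, mimicking a user who selects only the main water body.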

Figure 2
The interactive threshold masking method involves three new windows: one displaying the NDWI image, another that contains the histogram of pixel values within this image and allows the user to adjust the contrast by moving the red slider bars, and a smaller dialog box for specifying the minimum and maximum band threshold values used to produce the initial mask [10].
The next stage of the SAS workflow involves smoothing the data by filtering the image first spectrally, over the wavelengths on a per-pixel basis, and then spatially, from pixel to pixel across the full extent of the masked water-only image. These steps are intended to reduce noise and provide a more coherent image for input to the core MESMA algorithm. The “Image filtering” panel in the lower left corner of the SAS window includes fields for entering three parameters of a Savitzky-Golay spectral smoothing filter [16]: n is the number of times the filter will be applied to each input spectrum (i.e., image pixel), p is the order of the smoothing polynomial, and length is the number of bands to be included in the moving window that slides along the spectrum during the smoothing operation. In our experience, the default values of 2, 3, and 7, respectively, are suitable in most cases. Similarly, the user can adjust the size of the square window used by a Wiener filter [17] to smooth the image spatially. After these two filters have been applied, a new contrast-stretched color composite will appear in the display panel on the right side of the SAS window with the bands listed in the title serving as the R, G, and B components. After isolating the water from the land and filtering and smoothing the data, striking visual patterns are revealed in the image from Upper Klamath Lake (Figure 3).
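The two filtering steps can be sketched with SciPy's `savgol_filter`, applied along the spectral axis and repeated `n_passes` times, and `wiener`, applied to each band image. The defaults mirror the values quoted above (2 passes, polynomial order 3, window length 7); the 3 x 3 Wiener window and the function name `smooth_cube` are assumptions, and the MATLAB implementation in SAS may treat image edges and masked pixels differently.

```python
import numpy as np
from scipy.signal import savgol_filter, wiener

def smooth_cube(cube, n_passes=2, polyorder=3, window_length=7, wiener_size=3):
    """Savitzky-Golay smoothing along the spectral axis, then a Wiener filter per band.

    cube: (rows, cols, bands) reflectance array. Returns a new array.
    """
    out = np.array(cube, dtype=float)  # copy so the input is not modified
    for _ in range(n_passes):
        out = savgol_filter(out, window_length, polyorder, axis=-1)
    for b in range(out.shape[-1]):
        # scipy's wiener zero-pads at the borders, so edge pixels are pulled toward 0
        out[..., b] = wiener(out[..., b], mysize=wiener_size)
    return out

rng = np.random.default_rng(0)
noisy = 0.03 + 0.002 * rng.standard_normal((8, 8, 30))
smoothed = smooth_cube(noisy)
```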

Figure 3
The masking tools in SAS allow the water body of interest to be isolated and spectral and spatial filters enhance the smoothness of the image [10].
At this point, with the pre-processing complete, the user can begin to examine the spatial distribution of algae by calculating a pair of widely used spectral indices: the Normalized Difference Chlorophyll Index (NDCI) [18] and the Cyanobacterial Index (CI) [6]. Please refer to [7] and the original references cited therein for additional detail on these indices. Within SAS, the user can click a button to produce a new figure window displaying a map for each index. These maps are also saved as GeoTIFF image files within a “SMASH” subfolder created within the working directory. The file names are defined automatically: the specified site code and date are concatenated to produce the root file name, to which “ndci.tif” and “ci.tif” are appended for the two spectral index images. The NDWI image and the final water mask created via SAS are saved as GeoTIFFs within the same subfolder as well.
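For reference, both indices can be computed from a reflectance cube in a few lines. This sketch assumes the commonly published forms: NDCI = (R708 - R665)/(R708 + R665), and CI = -SS(681), where SS(681) is the deviation of the 681 nm band from a straight baseline drawn between the 665 nm and 709 nm bands. Picking the band nearest each nominal wavelength is our assumption; consult [6], [7], and [18] for the definitions SAS actually uses. The function names are hypothetical.

```python
import numpy as np

def _nearest(wavelengths, target_nm):
    """Index of the band whose center is closest to target_nm."""
    return int(np.argmin(np.abs(np.asarray(wavelengths, dtype=float) - target_nm)))

def ndci(cube, wavelengths):
    """NDCI = (R708 - R665) / (R708 + R665)."""
    r665 = cube[..., _nearest(wavelengths, 665)]
    r708 = cube[..., _nearest(wavelengths, 708)]
    return (r708 - r665) / (r708 + r665)

def cyano_index(cube, wavelengths):
    """CI = -SS(681): negative deviation of the 681 nm band from the 665-709 nm baseline."""
    wl = np.asarray(wavelengths, dtype=float)
    i0, i1, i2 = (_nearest(wl, t) for t in (665, 681, 709))
    # Baseline interpolated at the actual band centers nearest 665, 681, and 709 nm
    baseline_rise = (cube[..., i2] - cube[..., i0]) * (wl[i1] - wl[i0]) / (wl[i2] - wl[i0])
    ss = cube[..., i1] - cube[..., i0] - baseline_rise
    return -ss

wl = np.array([665.0, 681.0, 708.0])
ci_map = cyano_index(np.array([[[0.03, 0.02, 0.03]]]), wl)  # trough at 681 nm -> CI > 0
```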
Spectral library input
In addition to a hyperspectral image that can be used to map algal blooms, the other fundamental, required input to SAS is a spectral library of cyanobacterial endmembers. This library serves as a database of the reflectance characteristics of various taxa that might be found within the water body of interest. By comparing the reflectance spectrum recorded in each pixel of the image to these ‘type specimens’, the MESMA algorithm provides a means of inferring which endmembers are present within the pixel and their relative proportions. This process begins by importing a library via the panel labeled 3 in Figure 1; the zip folder with the SAS installer includes a copy of the spectral library developed by [9] and used by [7]. Upon loading the library, spectra for all endmembers are plotted in the display panel on the right side of the SAS window, and their names are used to populate the table in the “Data preparation” section of the “Select input data” tab. Another field allows the user to specify a reflectance scale factor for the library, similar to that for the image.
Along with the various cyanobacterial genera included in the spectral library, one more endmember must be provided before proceeding to the MESMA stage of the workflow: water. The aquatic endmember essentially serves as a dark background and is analogous to the use of shade as an endmember in terrestrial applications of MESMA [8]. SAS provides two options for obtaining water endmembers. In the first, more interactive approach, the image is displayed in a new figure window, and the user is prompted to select pixels from which spectra will be extracted to serve as potential water endmembers. These pixels ideally would be free of algae, but we caution that a given image might not include any deep, dark, clear water suitable for use as an endmember. Spectra for the selected pixels are added to the plot with the spectral library and distinguished from one another in the legend by their spatial coordinates. The second approach is to import one or more water endmembers from a text file analogous to that containing the cyanobacterial genera. A representative water endmember extracted from a separate DESIS image of central Oregon’s Detroit Lake is included in the SAS installer zip file. The scale factor for the water endmember is assumed to be the same as the scale factor for the cyanobacterial spectral library. Upon loading the text file, the new water endmembers will be added to the plot along with the spectra for the various cyanobacteria.
Before the spectral library and water endmember can be used in combination with the hyperspectral image, the user must ensure that the two data sources provide spectral information for the same set of wavelengths: the individual bands of the hyperspectral image. To resample any spectral library provided as input to the bands of a particular imaging system, SAS includes sensor response functions for several instruments, with DESIS as the default. SAS performs this convolution operation by weighting the input spectra from the library by the response function for each sensor band. The end result of this process is a new, resampled set of spectra at the wavelengths corresponding to the imaging system’s band centers. For the Upper Klamath Lake example, the library spectra originally recorded at 79 wavelengths have been resampled to the 156 bands of the DESIS imaging system within the 400–800 nm range. After resampling the library spectra to be compatible with the image, the final pre-processing step is to apply the same kind of Savitzky-Golay spectral smoothing filter [16] to the endmembers in the library that was used previously for each pixel of the image.
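The band-response weighting can be sketched as a weighted average. Because the sensor response functions packaged with SAS are not reproduced here, this example assumes Gaussian responses described only by a band center and full width at half maximum (FWHM). It first interpolates the coarsely sampled library onto a fine wavelength grid so that each narrow band receives several samples. The function names are hypothetical.

```python
import numpy as np

def gaussian_srf(grid, center, fwhm):
    """Assumed Gaussian spectral response evaluated on a wavelength grid (nm)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((grid - center) / sigma) ** 2)

def resample_library(lib_wl, lib_refl, band_centers, band_fwhm, step=0.1):
    """Resample library spectra (n_wavelengths x n_endmembers) to sensor bands.

    lib_wl must be increasing. Returns an (n_bands x n_endmembers) array.
    """
    lib_wl = np.asarray(lib_wl, dtype=float)
    lib_refl = np.asarray(lib_refl, dtype=float)
    grid = np.arange(lib_wl[0], lib_wl[-1] + step / 2, step)
    # Linear interpolation onto the fine grid, one endmember (column) at a time
    fine = np.column_stack([np.interp(grid, lib_wl, lib_refl[:, j])
                            for j in range(lib_refl.shape[1])])
    out = np.empty((len(band_centers), lib_refl.shape[1]))
    for k, (center, fwhm) in enumerate(zip(band_centers, band_fwhm)):
        w = gaussian_srf(grid, center, fwhm)
        out[k] = w @ fine / w.sum()  # response-weighted average for band k
    return out

wl = np.arange(400.0, 805.0, 5.0)
lib = np.column_stack([0.01 + 1e-4 * (wl - 400.0), np.full(wl.shape, 0.02)])
res = resample_library(wl, lib, [500.0, 600.5], [3.5, 3.5])
```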
After importing, resampling, and smoothing the spectral library and water endmember(s), the spectra can be viewed interactively within the SAS interface by launching the “Spectral library viewer” from a green-colored panel labeled “Spectral library selection” (labeled 3 in Figure 1). The display panel on the right will then show the cyanobacterial endmembers with distinct lines and symbols in green to yellow hues, as well as the water endmember(s) in blue. To further distinguish the spectra for particular taxa of interest, the user can select an endmember (or endmembers) to plot in bold from the table on the left. Similarly, unchecking the box next to an endmember name removes the corresponding line from the plot. For the example shown in Figure 4, the spectrum for Aphanizomenon is highlighted and the line for one of our candidate water endmembers is turned off. At this point, the resampled and smoothed spectra can be saved to a new library file with wavelengths in the first column, endmember names in the first row, and reflectance values for each endmember at each wavelength in the subsequent rows of the table.
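The saved library layout, with wavelengths in the first column and endmember names in the first row, can be read and written with Python's standard `csv` module. The comma delimiter and the `wavelength` label in the top-left cell are assumptions, as are the hypothetical function names; check an exported SAS file before relying on either.

```python
import csv
import io

def write_library(fh, wavelengths, names, spectra):
    """Write one row per wavelength; spectra[j][i] is endmember j at wavelengths[i]."""
    writer = csv.writer(fh)
    writer.writerow(["wavelength", *names])  # header row holds the endmember names
    for i, wl in enumerate(wavelengths):
        writer.writerow([wl, *(s[i] for s in spectra)])

def read_library(fh):
    """Inverse of write_library: returns (wavelengths, names, spectra)."""
    rows = [r for r in csv.reader(fh) if r]
    names = rows[0][1:]
    wavelengths = [float(r[0]) for r in rows[1:]]
    spectra = [[float(r[j + 1]) for r in rows[1:]] for j in range(len(names))]
    return wavelengths, names, spectra

buf = io.StringIO()
write_library(buf, [400.0, 402.5], ["Aphanizomenon", "Water"], [[0.02, 0.03], [0.01, 0.005]])
buf.seek(0)
wl, names, spectra = read_library(buf)
```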

Figure 4
SAS includes a spectral library viewer that allows the user to highlight specific endmembers in bold, or to remove them from the plot [10].
Multiple Endmember Spectral Mixture Analysis (MESMA) setup
Once the hyperspectral image has been prepared, the spectral library convolved, and a water endmember selected, the user is ready to proceed to the main MESMA phase of the SMASH workflow. The “Run SMASH” tab in SAS includes a GUI for parameterizing a run of the core MESMA algorithm via the panel labeled 4 in Figure 1. In particular, the user can set various constraints on the MESMA calculations. In principle, each endmember fraction should lie between 0 and 1 and the fractions for a given pixel should sum to 1, but because a perfect mixture model is unlikely, bounds must be placed on how far beyond the ideal 0–1 range an endmember fraction can fall and still be considered acceptable. For the algal endmembers, we have found that the default minimum and maximum fraction constraints of –0.05 and 1.05 are appropriate in most cases [7]. Only one dark (water) endmember can be included in the mixture models, so SAS provides a dropdown list from which the user can select a single option. Typically, the same fraction constraints are used for the water endmember as for the algae endmembers (i.e., –0.05 and 1.05) [7]. The table on the right side of this panel lists the endmembers and allows the user to turn off or bold the corresponding lines in the spectral library viewer. The third column of the table, labeled “Include”, provides a means of specifying which endmembers will be included in the MESMA. This feature gives the user the option of including only those taxa that might plausibly be found in the water body of interest, as well as selecting a single water endmember if multiple candidates were extracted from the image or imported from a file. The final constraint to be specified is the maximum root-mean-square error (RMSE), which is essentially a metric of the disagreement between the observed image spectrum for a given pixel and that modeled by MESMA; refer to [8] and [7] for further detail. Our testing suggests that the SAS default value of 0.02 is usually acceptable [7].
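To make the fraction and RMSE constraints concrete, the sketch below unmixes one pixel with a single candidate model. It assumes a sum-to-one formulation in which the water endmember serves as the dark reference, so the water fraction is one minus the sum of the algal fractions, and it solves the remaining unconstrained least-squares problem with NumPy. This is a generic illustration, not the Python MESMA implementation [11] that SAS calls; the function names are hypothetical.

```python
import numpy as np

def unmix_with_water(pixel, algae, water):
    """Sum-to-one linear unmixing of one pixel with algal endmembers plus water.

    pixel: (bands,), algae: (bands, k), water: (bands,).
    Returns fractions [algae..., water] and the RMSE of the modeled spectrum.
    """
    A = algae - water[:, None]  # water becomes the origin of the mixing space
    b = pixel - water
    f_algae, *_ = np.linalg.lstsq(A, b, rcond=None)
    fractions = np.append(f_algae, 1.0 - f_algae.sum())
    modeled = water + A @ f_algae  # equals sum_i f_i * e_i + f_water * water
    rmse = float(np.sqrt(np.mean((pixel - modeled) ** 2)))
    return fractions, rmse

def is_acceptable(fractions, rmse, fmin=-0.05, fmax=1.05, max_rmse=0.02):
    """Same fraction bounds for algae and water, plus the maximum-RMSE cap."""
    return bool(np.all(fractions >= fmin) and np.all(fractions <= fmax) and rmse <= max_rmse)

e = np.array([[0.02], [0.03], [0.05], [0.04], [0.03]])  # one algal endmember
w = np.array([0.01, 0.008, 0.005, 0.003, 0.001])         # water endmember
fr, err = unmix_with_water(0.4 * e[:, 0] + 0.6 * w, e, w)
```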
Evaluating endmembers and performing MESMA
SAS includes another tool to help guide endmember selection before proceeding to MESMA. The user can assess how distinct the potential endmembers are from one another by calculating normalized spectral separability scores (NS3) following [19]. For the first round of these calculations, all prospective endmembers are included, but after the user clicks a “Subset library” button, only those endmembers whose box in the “Include” column of the table is checked are retained. The results are summarized in a plot like that shown in Figure 5. This matrix represents the degree to which each pair of spectra differs in terms of spectral shape and overall magnitude (i.e., brightness). A higher score indicates that the two spectra are more distinct and thus more likely to be unmixed successfully via MESMA. For further details and an interpretation of the plot shown in Figure 5, refer to [7]. In practice, this analysis can be performed iteratively with different subsets of the library to help guide selection of an appropriate set of endmembers.
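One common formulation of the normalized spectral similarity score combines an amplitude term A, the root-mean-square difference between two spectra, with a shape term based on the spectral angle alpha: NS3 = sqrt(A^2 + (1 - cos alpha)^2). The sketch below assumes that form; [19] should be consulted for the exact definition, including any normalization, used by SAS. The function names are hypothetical.

```python
import numpy as np

def ns3(a, b):
    """Assumed NS3: magnitude term (RMS difference) combined with a spectral-angle term."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    amplitude = np.sqrt(np.mean((a - b) ** 2))
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    cos_angle = np.clip(cos_angle, -1.0, 1.0)  # guard against rounding past 1
    return float(np.sqrt(amplitude ** 2 + (1.0 - cos_angle) ** 2))

def ns3_matrix(spectra):
    """Symmetric matrix of pairwise scores; larger values mean more separable."""
    n = len(spectra)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = ns3(spectra[i], spectra[j])
    return m

x = np.array([0.02, 0.04, 0.03])
scores = ns3_matrix([x, 2 * x, x[::-1]])
```

Under this formulation, a spectrum and a uniformly brighter copy of it differ only in the amplitude term, while two spectra with different shapes also accumulate the angle term.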

Figure 5
SAS provides a tool for calculating normalized spectral separability scores (NS3) that can help guide endmember selection [10].
Once all the constraints have been set and the final endmembers selected, the spectral mixture analysis can be initiated by simply clicking the “Run MESMA” button in the SAS interface, located in the panel labeled 5 in Figure 1. A dialog box appears to remind the user that only one water endmember can be used. The water endmember is assumed to be the last item in the library (i.e., list of endmembers) and the user is prompted to confirm that the correct water endmember has been specified. Along with the main SAS interface, the program also launches a separate command window that shows a running tally of the number of mixture models evaluated.
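The convention that water is the last library item makes it straightforward to enumerate candidate mixture models, each consisting of one or more cyanobacterial endmembers plus the single water endmember. The maximum number of algal endmembers per model (`max_algae`) and the function name are hypothetical; the model complexity SAS uses is not stated here, so the counts are illustrative only.

```python
from itertools import combinations

def mixture_models(endmember_names, max_algae=2):
    """Candidate models of 1..max_algae algal endmembers plus water (last in the list)."""
    *algae, water = endmember_names
    models = []
    for k in range(1, max_algae + 1):
        for combo in combinations(algae, k):
            models.append((*combo, water))
    return models

library = ["Aphanizomenon", "Gloeotrichia", "Microcystis", "Water"]
for model in mixture_models(library):
    print(model)
```

With 12 genera and `max_algae=2`, each pixel would be tested against 12 + 66 = 78 models, which shows how quickly the number of mixture models evaluated grows with library size.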
Examining MESMA results
Once a MESMA run has been completed, a panel labeled “Visualize SMASH output” (5 in Figure 1) provides the user with numerous options for generating data products to examine. For example, the user can easily produce a classified map that depicts the spatial distribution of dominant taxa within the lake, that is, the endmember with the highest fractional abundance within each pixel of the image. An example from Upper Klamath Lake is shown in Figure 6. A pixel is left unclassified, however, when the weighted linear combination of endmembers that best matches the observed spectrum for that pixel fails to satisfy the fraction constraints and/or the maximum RMSE threshold specified by the user; these unclassified areas are colored black in Figure 6. The user can also request a taxa distribution to obtain a histogram summarizing the area of the water body assigned to each of the endmembers; this information is also used to populate a table on the left side of the SAS interface in units of both km² and percentage of water pixels (Figure 7).
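Given per-pixel fractions from the best model, a dominant-genus map and the area table can be derived in a few lines. The sketch assumes that dominance is judged among the cyanobacterial endmembers only (add the water fraction to the array if water should also be a class), that pixels without an acceptable model are flagged by a boolean `valid` array, and a 30 m x 30 m pixel (900 m²), which is an assumption about the ground sampling distance. The function names are hypothetical.

```python
import numpy as np

def classify(algal_fractions, valid):
    """Index of the dominant algal endmember per pixel; -1 where no acceptable model."""
    classes = np.argmax(algal_fractions, axis=-1)
    classes[~valid] = -1
    return classes

def area_table(classes, water_mask, names, pixel_area_m2=900.0):
    """{name: (area_km2, percent_of_water_pixels)}, including an 'Unclassified' row."""
    n_water = int(water_mask.sum())
    table = {}
    for idx, name in [*enumerate(names), (-1, "Unclassified")]:
        count = int(np.sum((classes == idx) & water_mask))
        table[name] = (count * pixel_area_m2 / 1e6, 100.0 * count / n_water)
    return table

fr = np.zeros((2, 2, 2))
fr[0, 0] = [0.5, 0.2]
fr[0, 1] = [0.1, 0.4]
fr[1, 0] = [0.6, 0.1]
valid = np.array([[True, True], [True, False]])
cl = classify(fr, valid)
tab = area_table(cl, np.ones((2, 2), dtype=bool), ["Aphanizomenon", "Gloeotrichia"])
```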

Figure 6
One of the SMASH outputs is a classified map that illustrates the dominant cyanobacterial genera identified within each pixel of the image [10].

Figure 7
The distribution of dominant cyanobacterial genera can also be summarized via a histogram and a table listing the area and percentage of the water pixels identified in the masking step assigned to each endmember [10].
Given this information on the frequency distribution and spatial pattern of cyanobacteria taxa within the water body, the user can then focus on a genus of interest by selecting that particular endmember from a dropdown list. For example, consider Aphanizomenon in Upper Klamath Lake. One basic visualization is a class mask that highlights those pixels for which Aphanizomenon was inferred to be the dominant taxon, much of the central portion of the lake in this case. That Aphanizomenon was dominant does not imply that it was the only taxon present, however, and a more informative display is a fraction image in which the actual MESMA-based fractional abundances of this endmember are displayed for each pixel (Figure 8). This figure shows that for most of the pixels where Aphanizomenon was dominant, this genus had typical fraction values on the order of 0.5, implying that significant proportions of these pixels were composed of water and/or other types of cyanobacteria as well.

Figure 8
SAS can produce a map displaying the actual endmember fractions for a genus selected by the user, such as this example for Aphanizomenon in Upper Klamath Lake [10].
An effective way to visualize potential interactions between taxa is to produce a false-color composite by selecting endmembers to display as red and green from a pair of dropdown lists in the “Multiple endmember fractions” section of the SAS interface; the water fraction is used as the blue component of the composite. For example, selecting Aphanizomenon and Gloeotrichia as the endmembers to display as red and green, respectively, yields a striking visualization of the distribution of these two genera (and water) throughout the lake (Figure 9). Alternatively, the fractions for all endmembers in the library, including water, can be combined in a single mosaic display like Figure 10.
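The composite is a three-band stack of fraction images. This sketch assumes that fractions are clipped to [0, 1] for display, since MESMA fractions may fall slightly outside that range under the –0.05 to 1.05 constraints, and that no further contrast stretch is applied; SAS may scale the display differently. The function name is hypothetical, and the resulting array could be shown with, for example, matplotlib's `imshow`.

```python
import numpy as np

def fraction_composite(red_frac, green_frac, water_frac, mask=None):
    """RGB array: two endmember fractions as red and green, water fraction as blue."""
    rgb = np.dstack([red_frac, green_frac, water_frac]).astype(float)
    rgb = np.clip(rgb, 0.0, 1.0)  # fractions may lie slightly outside [0, 1]
    if mask is not None:
        rgb[~mask] = 0.0          # non-water pixels shown as black
    return rgb

aph = np.array([[0.6, -0.03], [1.04, 0.2]])
glo = np.array([[0.1, 0.5], [0.0, 0.3]])
wat = 1.0 - aph - glo
rgb = fraction_composite(aph, glo, wat)
```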

Figure 9
A false color composite can provide an effective means of visualizing the spatial distribution of and interactions between two selected endmembers, such as this representation of Aphanizomenon and Gloeotrichia in Upper Klamath Lake [10].

Figure 10
SAS can also produce a larger figure in which the endmember fractions for all endmembers, including water, are arranged as the tiles of a mosaic [10].
Quality control
Software testing, review, and approval
SAS has been subjected to rigorous quality control, and the software now provides end users with several methods of quantifying the uncertainty associated with MESMA output. Perhaps most importantly, the underlying codebase that evolved into SAS was thoroughly vetted as part of the original paper introducing the SMASH framework [7]. For example, we showed that the MESMA algorithm was capable of accurately reproducing known input fractions for simulated mixtures that included all pairwise combinations of cyanobacterial genera and water. Moreover, comparison of endmember fractions from each of the seven DESIS images we examined with relative biovolumes calculated from field samples indicated that taxonomic information from SMASH was consistent with field observations. The algorithm successfully identified Microcystis in a lake in New York but avoided misclassifying Asterionella, a genus not included in the library used as input, in another lake in Oregon. To date, the SMASH framework has only been tested in the handful of water bodies reported in [7], and we acknowledge that further validation is needed. One of our primary motivations for developing SAS was to make this type of analysis accessible to a broader user community and thereby facilitate additional testing of SMASH across a wider range of aquatic environments.
In addition, prior to release as an official USGS Scientific Software product [20], SAS was subjected to three types of review: 1) administrative, focused on potential security issues; 2) code, concerned with implementation of the software and quality of documentation; and 3) domain, emphasizing subject matter expertise, with an eye toward potential applications. For example, as part of the domain review, SAS was applied to one of the other data sets from [7], Owasco Lake, and the output from the new program was found to be consistent with the results reported by [7]. We also had to gain final approval from the USGS software management team that administers the code.usgs.gov GitLab repository before SAS was made publicly available.
To allow the end user to confirm that SAS is functioning as intended, the installation zip file includes a thorough tutorial document that guides the user through a full, worked example using data files that are also packaged with the zip archive. Due to licensing restrictions, the DESIS hyperspectral image is not included with the SAS distribution, but the user can request access to DESIS data through Teledyne Brown Engineering [21]. The SAS tutorial then provides detailed instructions on how to obtain the image via Teledyne’s web-based TCloud portal [21]. Completing this training exercise confirms that the software is working properly and familiarizes the user with SAS.
Assumptions and limitations
In the spirit of quality control, we want to explicitly call attention to several important assumptions made by the current version of SAS. Potential users of this software must be aware of the following limitations:
SMASH simplistically assumes that the water column is composed entirely of pure water and cyanobacteria. The linear spectral mixture models produced via SAS thus do not consider other optically active constituents such as suspended sediment or colored dissolved organic matter (CDOM) that, in reality, are often present to varying degrees.
SAS assumes that the satellite image provided as input is in units of surface reflectance and that atmospheric effects have been removed. Such atmospheric correction has already been applied to some data products, which can thus be used as delivered, or can be performed externally in other software (e.g., ENVI [22], ERDAS [23], and custom codebases and plugins to open-source programs such as QGIS [24] or SNAP [25]), but SAS itself does not provide this kind of functionality.
The entire SMASH framework is predicated upon the notion that differences in reflectance among cyanobacterial genera can be used to distinguish various taxa from one another. Because these distinctions might be quite subtle, SMASH is most likely to provide accurate results when hyperspectral data consisting of many contiguous, narrow wavelength bands are available as input.
Hyperspectral data from DESIS, PRISMA, or another type of sensor with a similar level of spectral resolution could likely be used to distinguish cyanobacterial genera from one another. At present, SAS has only been tested using DESIS images. However, to provide some flexibility for incorporating a wider range of instrumentation in the future, SAS can import band center wavelength and spectral response function information for custom sensors.
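SAS can import band center wavelengths and spectral response functions (SRFs) for a custom sensor. The file format SAS expects is described in the tutorial and is not reproduced here. To illustrate what this information is used for, the following is a minimal Python sketch of how a finely resolved library spectrum could be convolved to a sensor's bands. It assumes Gaussian SRFs defined by a band center and full width at half maximum (FWHM) and a spectrum sampled on a uniform wavelength grid. The function names are hypothetical and are not part of SAS.

```python
import numpy as np


def gaussian_srf(wavelengths, center, fwhm):
    """Gaussian spectral response (peak = 1) for one band; an assumed SRF shape."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)


def resample_to_sensor(wavelengths, reflectance, centers, fwhms):
    """Band-averaged reflectance: SRF-weighted mean of the input spectrum.

    Assumes a uniform wavelength grid, so a plain weighted sum is adequate;
    on an irregular grid, also weight by the sample spacing (e.g., trapezoidal rule).
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    band_values = np.empty(len(centers))
    for i, (center, fwhm) in enumerate(zip(centers, fwhms)):
        weights = gaussian_srf(wavelengths, center, fwhm)
        band_values[i] = np.sum(weights * reflectance) / np.sum(weights)
    return band_values


# Example: a synthetic spectrum at 1 nm spacing resampled to three hypothetical bands.
wl = np.arange(400.0, 1001.0, 1.0)
spectrum = 0.001 * wl  # linear ramp, purely illustrative
print(resample_to_sensor(wl, spectrum, centers=[550.0, 700.0, 850.0], fwhms=[10.0, 10.0, 10.0]))
```

A real sensor's SRFs need not be Gaussian. If tabulated SRFs are available, they can replace `gaussian_srf` directly, provided they are interpolated onto the same wavelength grid as the library spectrum.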
Uncertainty quantification and sensitivity analysis
To give end users some insight into the uncertainty associated with the output from the core MESMA algorithm, SAS can produce a map of RMSE values. For each pixel, the RMSE provides a quantitative summary of the mismatch between the observed and modeled spectral mixtures over the full range of wavelengths included. Displaying RMSE values as a map like that shown in Figure 11 can indicate where endmember fraction estimates should be regarded with greater caution. This information can also be partitioned on a per-endmember basis: SAS can produce another type of figure that shows the mean RMSE over all the pixels for which a particular endmember was the dominant genus. For example, Figure 12 indicates which types of cyanobacteria were less well represented by the endmembers in the library.
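The per-pixel RMSE can be illustrated with a minimal Python sketch. It assumes a single candidate model, meaning a fixed set of endmember spectra including water, fit by unconstrained linear least squares. The MESMA implementation in SAS evaluates many candidate models and applies constraints (e.g., on fractions and the maximum RMSE) that are not reproduced here. The second function mirrors the per-endmember summary in Figure 12 by averaging RMSE over the pixels assigned to each dominant endmember. Both helper names are hypothetical.

```python
import numpy as np


def fit_linear_mixture(pixel, endmembers):
    """Fit an observed spectrum as a linear mixture of endmember spectra.

    pixel:      (n_bands,) observed reflectance for one image pixel
    endmembers: (n_bands, n_em) matrix, one column per endmember spectrum
    Returns the estimated fractions and the RMSE between observed and modeled spectra.
    """
    fractions, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
    modeled = endmembers @ fractions
    rmse = float(np.sqrt(np.mean((pixel - modeled) ** 2)))
    return fractions, rmse


def mean_rmse_by_dominant(rmse, dominant, names):
    """Mean RMSE over the pixels whose dominant endmember index is k, for each name."""
    rmse = np.asarray(rmse, dtype=float)
    dominant = np.asarray(dominant)
    return {
        name: float(rmse[dominant == k].mean())
        for k, name in enumerate(names)
        if np.any(dominant == k)  # skip endmembers that never dominate a pixel
    }


# Example with a hypothetical 3-band library: one taxon plus water.
library = np.array([[0.02, 0.05],
                    [0.04, 0.03],
                    [0.06, 0.01]])
observed = 0.3 * library[:, 0] + 0.7 * library[:, 1]
fractions, rmse = fit_linear_mixture(observed, library)
print(fractions, rmse)  # an exact mixture, so fractions near [0.3, 0.7] and RMSE near zero
```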

Figure 11
The RMSE calculated for each image pixel provides a convenient summary of the uncertainty associated with the MESMA output based on the mismatch between the observed and modeled spectral mixtures [10].

Figure 12
Averaging the RMSE values over all the pixels for which a particular taxon was identified as the dominant endmember can provide insight as to which types of cyanobacteria were not modeled well by the MESMA algorithm [10].
After performing an initial MESMA run, the user might want to revisit what is arguably the most influential parameter specified in the “MESMA constraints” panel of the SAS interface: the maximum allowable RMSE. Because the output from the MESMA algorithm can be highly sensitive to this parameter, SAS provides a convenient tool for evaluating a range of maximum RMSE values. This sensitivity analysis repeats the MESMA for a series of maximum RMSE values from 0.001 up to the currently specified maximum RMSE in increments of 0.001. With the default maximum RMSE of 0.02, this amounts to 20 full MESMA runs, so the analysis is computationally intensive and might require a long run time. A progress monitor is updated as the calculations proceed, and once the run is finished, a figure summarizing the sensitivity analysis will appear (Figure 13). The proportion of the water body assigned to each endmember, or left unclassified, is plotted as a function of the maximum RMSE. This plot can help the user identify a value that represents a reasonable compromise between model accuracy (i.e., a small maximum RMSE, which could leave many pixels unclassified) and classifying power (i.e., a larger maximum RMSE, which will result in a greater proportion of the water body being associated with a particular endmember). Please see [7] for further discussion of this topic. At least one MESMA run must be completed before an RMSE sensitivity analysis can be conducted, but performing this type of analysis is not required to proceed with the workflow.
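The bookkeeping behind a plot like Figure 13 can be sketched in Python. This is a simplified approximation, not the SAS procedure. SAS reruns the full MESMA for each threshold, whereas the sketch assumes that each pixel's best-fitting model (its RMSE and dominant endmember) is already known from a single run. It further assumes that the maximum RMSE only decides whether that pixel is classified or left unclassified. Thresholds are generated from integer multiples of the step so that floating-point accumulation does not add or drop a value. With the defaults, 0.02 / 0.001 yields the 20 runs described above.

```python
import numpy as np


def rmse_sensitivity(best_rmse, dominant, n_endmembers, max_rmse=0.02, step=0.001):
    """Proportion of pixels per endmember (and unclassified) for each maximum RMSE.

    best_rmse: (n_pixels,) RMSE of each pixel's best-fitting mixture model
    dominant:  (n_pixels,) index of each pixel's dominant endmember
    Returns (thresholds, proportions), where proportions has shape
    (n_thresholds, n_endmembers + 1) and the last column is the unclassified share.
    """
    best_rmse = np.asarray(best_rmse, dtype=float)
    dominant = np.asarray(dominant)
    n_steps = int(round(max_rmse / step))
    thresholds = [round(k * step, 10) for k in range(1, n_steps + 1)]
    proportions = np.zeros((n_steps, n_endmembers + 1))
    for i, threshold in enumerate(thresholds):
        classified = best_rmse <= threshold
        for k in range(n_endmembers):
            proportions[i, k] = np.mean(classified & (dominant == k))
        proportions[i, -1] = 1.0 - np.mean(classified)
    return thresholds, proportions


# Example: four hypothetical pixels and two endmembers.
thresholds, props = rmse_sensitivity([0.0005, 0.0015, 0.0105, 0.05], [0, 1, 0, 1], 2)
print(len(thresholds), thresholds[0], thresholds[-1])  # 20 0.001 0.02
```

In this toy example, a quarter of the pixels are classified at a maximum RMSE of 0.001 and three quarters at 0.02. This is the same tradeoff between accuracy and classifying power that the sensitivity plot is meant to expose.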

Figure 13
SAS provides a tool for conducting an analysis of the sensitivity of the MESMA output to the maximum RMSE constraint. The MESMA is repeated for, in this case, 20 different values of the maximum RMSE between 0.001 and 0.02, with the proportion of the image assigned to each endmember, or left unclassified, plotted as a function of the maximum RMSE [10].
Availability
Operating system
Developed and tested using Windows 10 Enterprise 22H2.
Programming language
Underlying code and main application interface developed in MATLAB R2023a, with internal calls to functions implemented in Python 3.10.10.
Additional system requirements
1.61 GB of disk space is required for the installer.
Dependencies
Although SAS was developed in MATLAB, a MATLAB license is not required for end users to run the compiled, executable version of SAS. Instead, SAS is based upon a separate MATLAB Runtime package that is included as part of the SAS distribution contained within a zip file. Version 1.0.6 of SAS requires MATLAB Runtime version 9.14.0.2337262 (R2023a) or later, which is also freely available from MathWorks at https://www.mathworks.com/products/compiler/matlab-runtime.html. All other dependencies are included in the zip file with the installer.
List of contributors
Carl J. Legleiter developed the underlying code and user interface for the software and led the writing of this manuscript. Tyler V. King performed testing and offered suggestions for improvement throughout the software development process and assisted with writing this manuscript.
Software location
Archive
Name: code.usgs.gov
Persistent identifier: https://doi.org/10.5066/P928658I
License: Creative Commons Zero v1.0 Universal
Publisher: USGS
Version published: Version 1.0.6
Date published: September 30, 2023
Code repository
Name: code.usgs.gov
Persistent identifier: https://doi.org/10.5066/P928658I
License: Creative Commons Zero v1.0 Universal
Date published: September 30, 2023
Language
English
Reuse potential
Exporting data products
Our primary motivation for developing SAS was to make the SMASH framework more accessible to a broader range of end users. To further facilitate dissemination of SAS output, the software includes functionality for exporting any of the data products available within the “Visualize SMASH output” panel of the interface as separate files, independent of SAS, for distribution or later use. To do so, the user can select an item from a dropdown list of products (Table 1). Products in the form of maps or images (i.e., the classified map, class mask, fraction image, fraction color composite, fraction mosaic, or RMSE image) are saved as GeoTIFF files with the same projection as the hyperspectral image used as input. For the taxa distribution and endmember mean RMSE, the information is written to a comma-separated ASCII text file. All files are saved in the “SMASH” folder created within the working directory and are named automatically based on the site and date code specified in the “Select input data” tab, along with the endmember name as appropriate. For example, exporting the Upper Klamath Lake fraction image for the Aphanizomenon endmember would produce a file named “UK20200810fractionAphanizomenon.tif”. Whereas most of the available image data products consist of a single layer or band, the fraction mosaic contains multiple layers, one for each endmember. These bands are organized with the cyanobacterial endmembers first, in the order in which they are listed in the library, followed by the water fraction as the last band.
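Users who post-process exported files with their own scripts can mirror the naming and band-ordering conventions described above in a few lines of Python. The following is a hypothetical helper, not part of SAS. Its naming pattern is inferred from the single example given above (site code, date code, product name, and endmember name concatenated), so the product tokens for other exports should be checked against the files SAS actually writes. Band numbers are 1-based, as in GDAL and most GIS software, and the two-genus library is hypothetical.

```python
def export_filename(site, date, product, endmember="", extension=".tif"):
    """Hypothetical reconstruction of the SAS export naming pattern."""
    return f"{site}{date}{product}{endmember}{extension}"


def fraction_mosaic_bands(library_names):
    """Map 1-based band numbers of the fraction mosaic to endmember names.

    Cyanobacterial endmembers come first, in library order; water is the last band.
    """
    bands = {i + 1: name for i, name in enumerate(library_names)}
    bands[len(library_names) + 1] = "water"
    return bands


print(export_filename("UK", "20200810", "fraction", "Aphanizomenon"))
# UK20200810fractionAphanizomenon.tif
print(fraction_mosaic_bands(["Aphanizomenon", "Microcystis"]))
# {1: 'Aphanizomenon', 2: 'Microcystis', 3: 'water'}
```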
Table 1
Output data products available for export from SAS. nem denotes the number of cyanobacterial endmembers in the spectral library; the additional band in the fraction mosaic holds the water fraction. See text for further details on image geo-referencing, file naming conventions, and the organization of the bands in the fraction mosaic image.
| SAS OUTPUT DATA PRODUCT | FILE FORMAT | NUMBER OF BANDS |
|---|---|---|
| Classified map | GeoTIFF | 1 |
| Class mask | GeoTIFF | 1 |
| Fraction image for selected endmember | GeoTIFF | 1 |
| Fraction color composite for two selected endmembers plus water | GeoTIFF | 3 |
| Fraction mosaic | GeoTIFF | nem + 1 |
| Root mean squared error (RMSE) image | GeoTIFF | 1 |
| Taxa distribution | ASCII csv | N/A |
| Endmember mean RMSE | ASCII csv | N/A |
Batch processing
We intend for SAS to be accessible to scientists and managers who are not necessarily remote sensing experts, and we designed the graphical interface to be user-friendly. However, all the main functions in SAS are also implemented in a “batch mode” tailored for experienced power users and/or cases where numerous images must be processed in a more efficient, automated manner. This batch mode is accessed via two standalone executables, “batchSMASH.exe” and “batchSMASHcustom.exe”, which are also included in the zip file with the installer. These programs can be called from a system command window or from within a script written in MATLAB, R, Python, or some other language. The command-line syntax is “batchSMASH.exe <inputFile>”, where the sole input argument, “<inputFile>”, is the name of a text file that contains the settings required to parameterize the SMASH run. The input file must be a plain ASCII text file with one parameter per line, formatted as “parameterName = parameterValue” with an equal sign between the name and value. The parameter names that can be included in the input file, most of which are required, are listed in an appendix of the SAS tutorial document; if an input is not specified in the file, a default value (also listed in the tutorial) will be used. Once the input file is ready, the SMASH batch mode can be invoked by opening a system command window, navigating to the directory where “batchSMASH.exe” and the required support file “CustomPy.zip” (also included in the main zip archive with the SAS installer) are located, and typing a command such as “batchSMASH.exe input.txt” (assuming the input file is named “input.txt”). The program will then run in the system command window and display a series of messages to indicate progress. The resulting output files will be stored in the directory specified by the “outDir” parameter in the input file.
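When many images must be processed, the input files themselves can be generated by a script. The following Python sketch writes and reads files in the “parameterName = parameterValue” format described above. Only “outDir” is a parameter name taken from this paper. “hypotheticalParam” is a placeholder, and the actual parameter names and defaults are those listed in the appendix of the SAS tutorial.

```python
import tempfile
from pathlib import Path


def write_input_file(path, params):
    """Write one 'parameterName = parameterValue' line per parameter (plain ASCII)."""
    lines = [f"{name} = {value}" for name, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")


def read_input_file(path, defaults=None):
    """Read an input file back into a dict, starting from any supplied defaults."""
    params = dict(defaults or {})
    for line in Path(path).read_text(encoding="ascii").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"Line lacks an equal sign: {line!r}")
        params[name.strip()] = value.strip()
    return params


# Example (hypothetical values; only the name outDir comes from the text above).
with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "input.txt"
    write_input_file(input_path, {"outDir": r"C:\SMASH\output", "hypotheticalParam": 0.02})
    print(read_input_file(input_path))  # values are read back as strings
```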
The output files will include all data products available for export via the SAS interface described above. In addition, a log file summarizing the input files and parameters used during the SMASH run, as well as the output files generated, is created in the same output directory and is named “batchSMASHlog.txt”. The new executable, “batchSMASHcustom.exe”, included in the latest release of SAS, is similar to “batchSMASH.exe” but allows the user to apply the workflow to a custom sensor type by also providing files with band center wavelengths and spectral response functions as additional inputs.
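Because the batch executables are driven entirely by an input file, they can easily be called from a script. The following Python sketch assumes the SAS files were extracted to a hypothetical folder. It runs “batchSMASH.exe” once per input file, with the working directory set to that folder so that “CustomPy.zip” sits alongside the executable, as described above. It then checks for “batchSMASHlog.txt” in each output directory. Whether the executable's exit code reliably signals failure is not documented here, so the log check is the more conservative test. “batchSMASHcustom.exe” is omitted because its additional inputs are described in the tutorial rather than in this paper.

```python
import subprocess
from pathlib import Path


def build_command(sas_dir, input_file, exe_name="batchSMASH.exe"):
    """Path to the executable followed by the single input-file argument."""
    return [str(Path(sas_dir) / exe_name), str(input_file)]


def log_written(out_dir):
    """True if the batch log that SAS writes to the output directory exists."""
    return (Path(out_dir) / "batchSMASHlog.txt").is_file()


def run_batch(sas_dir, jobs):
    """Run batchSMASH.exe for each (input_file, out_dir) pair; report per-job status.

    Use absolute paths for input files, because the working directory is set to
    sas_dir. out_dir must match the outDir value inside the corresponding input file.
    """
    status = {}
    for input_file, out_dir in jobs:
        completed = subprocess.run(build_command(sas_dir, input_file), cwd=sas_dir)
        status[str(input_file)] = (completed.returncode, log_written(out_dir))
    return status


# Example (hypothetical paths; SAS is distributed as Windows executables):
# run_batch(r"C:\SAS", [(r"C:\SAS\runs\input1.txt", r"C:\SAS\runs\out1")])
```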
Support
The primary source of information on how to use SAS is the tutorial included in the zip file with the installer. This document provides a thorough, step-by-step guide to running the software using the Upper Klamath Lake data set as an example. Numerous figures illustrating the various steps in the workflow are included to further assist the user in gaining familiarity with the program. During a SAS session, all input files and selections made via the interface are written to a log file that provides a record of how a run was parameterized and can thus be used to reconstruct an analysis or assist in debugging. This file is named “SMASHlog.txt” and is stored in the “SMASH” subfolder created within the working directory. In addition, a separate file called “SASstandaloneLog.txt” is created in the folder from which SAS was started and will record any error messages. Should any problems arise, please email both of these log files, along with a written description of the problem and any relevant screenshots, to the lead developer at cjl@usgs.gov. Issues can also be submitted via the GitLab repository; we are dedicated to providing direct support to SAS users.
Extensions and applications
The results of the proof-of-concept investigation introducing the SMASH framework indicated that cyanobacterial genera could be differentiated and mapped in hyperspectral images on the basis of their unique reflectance characteristics [7]. We developed SAS to build upon these initial findings, evaluate the capabilities and limitations of SMASH, and apply the approach to monitoring cyanobacterial blooms across a broad range of water bodies. Scientists and managers could use SAS to implement the SMASH framework with the goal of better understanding and mitigating the effects of HABs.
Further testing of SAS in lakes, reservoirs, and large rivers could help to evaluate the potential effectiveness of SMASH as a decision-support tool. Additional studies could assess the susceptibility of the SAS algorithm to false positives by conducting tests in water bodies that are not experiencing active cyanobacterial blooms. Coordinating field sampling with image acquisition would make it possible to use biovolume data to assess the accuracy of endmember fractional abundances. The current spectral library includes only 12 genera [9]. Augmenting the library with additional taxa known to form blooms in certain geographic areas, such as Dolichospermum in the western U.S., could increase the utility of SAS. Field spectra acquired from a boat could be used in place of, or in addition to, the microscope-based approach used to date. Several other opportunities for further study were outlined in [7]: impacts of atmospheric correction and water endmember selection on SAS output; sensitivity of the algorithm to suspended sediment, CDOM, and other optically significant constituents; community composition and temporal dynamics of various types of algal blooms; characterizing variability among growth stages within a genus and across different genera; seasonal succession of cyanobacterial blooms; and the implications of these dynamics for food webs, toxin production, and water quality. SAS could play a central role in examining these issues.
Data accessibility statement
The software is available at https://doi.org/10.5066/P928658I and the SAS installer includes an example spectral library published previously as a USGS data release ([9]). Due to licensing restrictions, we cannot include a DESIS hyperspectral image as part of the SAS distribution, but users can request access to DESIS data through Teledyne Brown Engineering ([21]). The SAS tutorial included with the installer and available from the repository [10] provides detailed instructions on how the user can obtain the image via Teledyne’s web-based TCloud portal [21].
Acknowledgements
Internal reviews required as part of the approval process for a USGS Scientific Software product were provided by Jake Zwart and Stephen Hundt. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Funding information
Funding was provided by the USGS Next Generation Water Observing System and National Civil Applications Committee.
Competing interests
The authors have no competing interests to declare.
Author contributions
Carl J. Legleiter developed the underlying code and user interface for the software and led the writing of this manuscript. Tyler V. King performed testing and offered suggestions for improvement throughout the software development process and assisted with writing this manuscript.
