Assessment of an optimal parameter space for spatial cluster detection of SMEAR Estonia flux footprint data using unsupervised learning algorithms
Abstract
Understanding the spatial variability of ecosystem-atmosphere fluxes is essential for accurate carbon and water cycle assessments in forested landscapes. This study investigates the optimal parameter space for spatial cluster detection of flux footprint data from the SMEAR Estonia station using unsupervised learning algorithms. We applied the DBSCAN and HDBSCAN clustering methods to the half-hourly x-y coordinates of maximal flux contribution, derived from Kljun’s footprint model, over a six-year period. The coordinates were scaled with both standard and robust scalers to mitigate the effects of large coordinate values and outliers. We systematically evaluated clustering performance across a range of hyper-parameters, using the silhouette and Davies-Bouldin scores to assess cluster quality. Our results indicate that HDBSCAN, particularly with robust scaling, yields more consistent and interpretable clusters than DBSCAN, with lower sensitivity to noise and lower computational demands. The findings highlight the importance of hyper-parameter selection and scaling in cluster analysis of flux footprint data and demonstrate the utility of density-based clustering for identifying spatial patterns in ecosystem flux measurements. These insights can inform future studies of carbon and water dynamics in heterogeneous forest environments and support the development of climate-smart forestry strategies.
© 2026 Steffen M. Noe, Anuj Thapa Magar, Emílio Graciliano Ferreira Mercuri, published by Estonian University of Life Sciences
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.