Estimation of the stability of tailings dams based on genetic algorithm-supported calibration results and monitoring data

Szczepan Grosel

doi:10.2478/sgem-2026-0001

1

Introduction

The primary metric for evaluating the safety of tailings dams is the factor of safety (FOS) [1,2], defined as the ratio of resisting forces to destructive forces. Currently, a widely used approach for determining the FOS is the strength reduction method [3], which typically involves numerical calculations, most often conducted using the finite element method (FEM). In this procedure, the strength parameters – specifically, cohesion and the tangent of the angle of internal friction – are progressively reduced for a given stress state and pore pressure distribution. The FOS is then calculated as the ratio of the initial strength parameters to the reduced parameters at which the numerical model fails to converge, indicating imminent failure. Notably, both the stress and pore pressure distributions required for these calculations are usually obtained within the same software as part of the simulation of construction stages or load history.

In the context of tailings dams, it is important to acknowledge the prevalence of measurement data, predominantly in the form of piezometric heads, slope surface displacements, and subsurface displacements recorded by inclinometers [4,5]. A subset of these structures is managed using the observational method [6], which, in its fundamental premise, involves modifying the design in response to measured values. When numerical simulations are employed, it is crucial to ensure agreement between the numerical model and historical monitoring data. This approach aims to yield meaningful predictions of measured quantities and reliable FOS values. Although these predictions can be verified (in keeping with the observational method), the FOS itself cannot be directly measured in situ.

Therefore, verifying the accuracy of the calculated FOS values is almost exclusively feasible using numerically generated, i.e. synthetic, data. The process involves the following steps:

(1)
Numerical simulations are carried out for a certain arbitrarily assumed set of material parameters, calculation domain, and boundary conditions,
(2)
The results of these calculations are treated as data from the monitoring network. The displacement or pore pressure histories of several selected nodes mimic measurements from sensors,
(3)
The inverse problem is solved, material parameters are investigated in order for the numerical model to reproduce the generated measurement data,
(4)
A comparison between FOS values: the true one obtained from the initial numerical model and the one achieved by solving the inverse problem.

Theoretically, it is possible to obtain exactly the same parameters as those used to generate the measurement data. However, due to the inherent nature of the inverse problem [7] and the complexity of the numerical model, achieving such precise agreement is rarely possible in practice. It should also be noted that, in this context, solving the inverse problem constitutes a calibration of the numerical model. Calibration is often necessary even when laboratory test results are available [8,9], primarily because of scale effects and spatial variability of parameters in large-scale constructions [10,11].

Recently, the prediction and estimation of FOS values have usually been based on probabilistic methods [12,13], or machine learning techniques using a numerically generated database [14]. Considerably fewer studies involve actual observations. For example, Wang et al. [15] used 77 field cases to predict whether the slope is stable by means of machine learning models. Another study [2] integrated piezometric monitoring data as an input parameter into an analysis of the stability, i.e. FOS calculation, of a collapsed tailings dam in Brumadinho. In the context of the research presented herein, the study by Xue et al. [16] appears to be significant. A Bayesian updating framework in conjunction with a hydro-mechanical coupling approach and a comparison of numerical results with monitoring data were utilized to identify probability density functions of soil parameters. These functions can be employed in probabilistic analyses of slope stability due to rainfall. Given that hydraulic conductivity and shear strength parameters exert a primary influence on slope reliability, these parameters’ distributions were determined by means of inverse analysis.

The considerations in this study constitute, to some extent, an introduction to the development of a method for estimating the FOS of tailings dams based on long-term monitoring data. The background of this research lies in the long-standing collaboration between Wroclaw University of Science and Technology and KGHM Polska Miedź S.A. (a Polish copper and silver mining company), which has primarily focused on three-dimensional analyses of the Żelazny Most tailings storage facility [17,18], the largest structure of its kind in Europe.

The overarching objective is to resolve the inverse problem using a genetic algorithm, aiming to minimize the discrepancy between the results of the numerical model and the measured data. It is assumed that, when this discrepancy reaches zero, the true values of the material parameters, and thus the true FOS values, are obtained. However, as previously noted, identification of the actual parameters is exceedingly challenging in practice. Therefore, it is proposed that all parameter sets tested by the genetic algorithm could be used to estimate the FOS.

2

Methodology

2.1

Inverse problem definition

The primary objective of the inverse problem is to estimate the FOS of a tailings dam for which monitoring data are available. Because FOS values cannot be measured directly and are primarily determined through numerical simulations, the inverse problem consists in identifying the model parameters that ensure consistency between monitoring data and simulation outcomes. Specifically, these parameters correspond to the material properties of the subsoil, dykes, and tailings.

Let

$p \in P$ p{\mathbb{\in }}{\mathbb{P}} denote the vector of unknown model parameters to be estimated,
$F : P \to S$ {\mathscr{F}}{\mathbb{:}}{\mathbb{P}}{\mathbb{\to }}{\mathbb{S}} be the forward operator representing the numerical model, which maps model parameters $p$ p to simulated outputs $s = F (p)$ s{\mathscr{=}}{\mathscr{F}}(p) ,
$m \in S$ m\in {\mathbb{S}} denote the vector of real-world observations or measurements.

The objective is to find the parameter vector $\hat{p}$ \hat{p} for which the simulated data $F (\hat{p})$ {\mathscr{F}}{\mathscr{\left(}}\hat{p}) best matches the observed data $m$ m . This is commonly framed as an optimization problem, where the objective function to be minimized quantifies the discrepancy between simulation results and measurements (1) $\hat{p} = \arg \min_{p \in P} f (F (p), m),$ \hat{p}={\rm{\arg }}\mathop{\min }\limits_{p{\mathbb{\in }}{\mathbb{P}}}f({\mathscr{F}}(p),m), where $f$ f is the objective function, $\hat{p}$ \hat{p} is the optimal set parameters, and $p$ {p} is any candidate set of parameters within the parameter space.

The objective function is typically defined using standard error metrics such as the mean absolute error, mean squared error, or root mean squared error (RMSE). In this study, RMSE was employed. It was assumed that the observation vector $m$ m consists of two types of monitoring data: surface displacements measured at the slope surface, and piezometric heads measured below the ground surface. To combine these heterogeneous data types, a weighted RMSE was utilized, yielding the following form of the objective function: (2) $f = \frac{1}{n} \sum_{i = 1}^{n} w_{i} \sqrt{{(F {(p)}_{i} - m_{i})}^{2}},$ f=\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{w}_{i}\sqrt{{({\mathscr{F}}{(p)}_{i}-{m}_{i})}^{2}}, where $n$ n is the total number of measurements across all monitoring locations, $w_{i}$ {w}_{i} is the weight assigned to the ith observation, with displacement data assigned a weight of 1, whereas for piezometric measurements, the weights were selected according to the measurement ranges at each sensor in order to balance their quantitative contributions to the objective function.

In cases where the parameter vector p leads to non-convergent numerical calculations, a very large penalty value is assigned to $f$ f . The objective function $f$ f is hereafter referred to as the unfitness metric [19], where lower values correspond to more desirable model outcomes.

2.2

Parameter identification using genetic algorithm

The genetic algorithm has repeatedly been used as a method to identify the values of material parameters that enable agreement between measurements and numerical model results [20–24]. The genetic algorithm, first introduced by Holland [25], belongs to the class of metaheuristic optimization algorithms [26] and uses a population of candidate solutions that evolves over successive iterations, similar to natural evolution. It employs crossover, which involves exchanging information between two solutions, and mutation, which introduces small changes into a solution, to explore new points in the search space. Further information on genetic algorithms can be found in articles by Whitley [27] and Herrera et al. [28].

In this study, the genetic algorithm is implemented using an initial population of 24 individuals, i.e. 24 candidate solutions. Each individual is represented as a sequence of search parameters corresponding to the material properties being optimized. Coding is required only when identifying hydraulic conductivity; in this case, the parameter values must be transformed (as illustrated in Figure 1) to ensure compatibility with the algorithm’s operations. For all other cases, parameter values are used directly without additional coding.

In each iteration, the 12 worst-performing individuals are replaced by new ones, which are created by uniform crossover [29,30] (Figure 2) and mutation based on evolution strategies [31] and aggressive mutation [32]. During mutation, the value of each parameter is calculated according to x' = x·N(1, 0.05), where x′ is the parameter value after mutation, x is the parameter value before mutation, and N(1, 0.05) is a random number drawn from a normal distribution with mean value of 1 and standard deviation of 0.05.

The selection of individuals for crossover (i.e. parents) in the genetic algorithm is performed using a ranking-based approach [33]. In this method, individuals are ranked according to the value of the objective function, with better-performing individuals obtaining higher positions. The probability of an individual being chosen as a parent increases with its ranking position, ensuring that fitter individuals exert a greater influence on the next generation.

The identification algorithm itself was developed and implemented in Python [34], making use of the NumPy library for efficient numerical operations [35]. The custom script automates numerical simulations, processes output data, and quantifies unfitness (objective function value) for each individual, where each represents a numerical model defined by a specific set of material parameters. All key functions were coded independently by the author, which allowed for a customized genetic algorithm implementation, seamless integration of the script with the chosen numerical modelling software, and support for parallel execution of numerical simulations. This approach ensures both flexibility and computational efficiency in the parameter identification process. The algorithm was developed for the identification of typical soil parameters, including Young’s modulus, Poisson’s ratio, hydraulic conductivity, and strength parameters. Furthermore, it is possible to identify other parameters for more complex constitutive models.

2.3

Analysis of identification outcomes

During the minimization process using a genetic algorithm, numerous sets of parameters are evaluated over successive generations. From an optimization standpoint, only the best solution is typically of primary interest. However, all evaluated cases can provide valuable information about the numerical model and the relationships between parameters, as well as between parameters and outcomes. For example, if the genetic algorithm is initialized with a population of 24 individuals and generates 12 new individuals in each of 50 generations, more than 600 cases will be computed. This dataset can be analysed to derive additional insights. It should be noted, however, that many cases may be very similar, as the algorithm tends to converge towards an optimal or suboptimal (i.e. local) solution.

The objective is to identify relationships between calculated FOS values and the objective function. By analysing numerically generated data, as described above, it is possible to assess whether the calculated FOS values closely approximate the value obtained from the parameter set used to generate the monitoring data. This latter value is hereafter referred to as the True FOS. To examine these relationships, the Pearson correlation coefficient can be employed. Given two random variables, X and Y, the Pearson correlation coefficient, denoted as $R_{X, Y}$ {R}_{X,Y} , is defined as the covariance of X and Y normalized by the product of their standard deviations (3) $R_{X, Y} = \frac{cov (X, Y)}{σ_{X} σ_{Y}},$ {R}_{X,Y}=\frac{\mathrm{cov}(X,\hspace{0.25em}Y)}{{\sigma }_{X}{\sigma }_{Y}}, where $cov (X, Y)$ \mathrm{cov}(X,{Y}) is the covariance of X and Y, $σ_{X}$ {\sigma }_{X} is the standard deviation of X, and $σ_{Y}$ {\sigma }_{Y} is the standard deviation of Y. The Pearson correlation coefficient ranges between −1 and 1. A value of zero indicates no linear relationship between the variables. A value of 1 signifies a perfect positive linear correlation, meaning that as X increases, Y increases proportionally. Conversely, a value of −1 indicates a perfect negative linear correlation, such that an increase in X corresponds to a proportional decrease in Y.

The interpretation of Pearson correlation coefficient values, according to Akoglu [36], is summarized in Table 1. This classification provides qualitative guidance for assessing the strength of linear relationships between variables based on the magnitude of the correlation coefficient.

Table 1

Interpretation of Pearson correlation coefficient based on [36].

Correlation coefficient ( $\| R \|$ \|R\| )	Strength of correlation
0.00–0.19	Very weak
0.20–0.39	Weak
0.40–0.59	Moderate
0.60–0.79	Strong
0.80–1.00	Very strong

Source: Author’s contribution.

If a significant linear correlation between unfitness and FOS is observed, it is reasonable to develop a linear regression model to describe their relationship. Such a model can be expressed as follows: (4) $FOS = β_{0} + β_{1} \times unfitness + ε,$ {\rm{FOS}}={\beta }_{0}+{\beta }_{1}\times {\rm{unfitness}}+\varepsilon , where β ₀ is the intercept, β ₁ is the slope coefficient representing the effect of unfitness on FOS, and ε is the error term. Importantly, the intercept β ₀ corresponds to the predicted FOS value when unfitness equals zero – that is, when there is no discrepancy between monitoring data and numerical model results. This prediction is particularly valuable in practical applications, as it provides an estimate of the FOS in situations where multiple FOS values remain equally plausible despite achieving minimal unfitness. Consequently, the regression model can serve as a useful tool for estimating FOS where direct evaluation is difficult. The simplicity and interpretability of the linear regression model make it a preferred starting point, compared to more complex models, particularly if linear association is confirmed. Moreover, this study investigates the relationship between the FOS and key material parameters to verify that hydraulic conductivity and shear strength, particularly the angle of internal friction, are critical for accurate FOS estimation in tailings dams.

3

Numerically generated monitoring data

3.1

Tailings dam characteristics

The analysed tailings dam is based approximately on the Żelazny Most facility, as described by Skau et al. [9], Jamiolkowski [8], and Łydżba et al. [17]. The subsoil profile consists of three main stratigraphic units: Tertiary clays (hereafter designated as 1A), Tertiary sands (2B), and Quaternary sands (3C). The Tertiary clays represent the dominant layer, whereas the Quaternary sands are situated immediately beneath the ground surface. The Żelazny Most facility is a sequentially raised tailings dam constructed using the upstream method [37]. The rate of dam raising is approximately 1.25 m/year. Subsequent embankments are placed on previously deposited sediments, which exhibit finer grain sizes towards the centre of the reservoir.

The monitoring system at the Żelazny Most facility is described in detail by Stefanek et al. [37,38]. For the purposes of this study, it was assumed that the inverse problem uses data from six piezometers and three displacement gauges installed on the slope. A two-dimensional cross-section of the dam is considered, and the total number of measurement signals is 12, as each displacement gauge records both horizontal and vertical displacements.

3.2

Numerical model

The numerical model was developed using the FEM software ZSoil, assuming plane strain conditions. The discretization is shown in Figure 3. The model consisted of 4,765 quadrilateral enhanced assumed strain elements [39] and 12 triangular elements. Horizontal displacements were restrained on the vertical edges, while both vertical and horizontal displacements were blocked on the bottom edge. The initial state was determined only for the subsoil, which comprised three layers: 1A, 2B, and 3C, assuming a groundwater level at Y = 115 m. In this step, the analysis considered stationary groundwater flow and one-sided poro-mechanical coupling, i.e. the pressure distribution influenced the stress state. In subsequent stages, the construction of the dam was simulated by adding additional layers of elements, which were divided into three groups: dykes, beaches, and sediments, as proposed by Saad and Mitri [40]. A time-dependent pressure boundary condition, corresponding to the hydrostatic pressure distribution, was applied at the left edge of the model. Pressure boundary conditions were also applied to the bottom and top nodes of each newly added sediment layer. The values of these boundary conditions varied from the time the layer appeared until the next layer above was added, ranging from 0 to hγ and from −hγ to 0 for the bottom and top nodes, respectively, where h is the height of the layer and γ is the unit weight of water. Negative pressure indicates suction. During this period, the dry unit weights of dykes, beaches, and sediments increased from 0 to the target values presented in Table 2. This part of the analysis was carried out assuming full poro-mechanical coupling (i.e. the pressure distribution and stress state influenced each other), Darcy’s law, and the van Genuchten model [42]. An added layer height of 5 m, a rise rate of 1.25 m per year, a calculation time step of 0.25 years, and an analysis period from 1980 to 2020 were adopted.

Table 2

Material parameters used to generate synthetic monitoring data.

Material	Case	E (MPa)	ν (−)	k _x (m/s)	k _y (m/s)	P _c0 (kPa)	λ (−)	R (−)	c (kPa)	φ (°)	ME (−)	γ (kN/m³)	n (−)	S _r (−)	α (1/m)	OCR (−)
1A	1.38	56.4	0.36	6.31 × 10⁻¹⁰	3.71 × 10⁻⁹	654.78	0.07	1.46	12.39	16.46	6.7	17.3	0.32	0	0.1	1.4
	1.48	55.1	0.27	1.82 × 10⁻⁹	1.44 × 10⁻⁹	476.47	0.06	1.28	11.05	17.91	11.8
	1.64	57.76	0.34	8.51 × 10⁻⁹	6.92 × 10⁻⁹	459.17	0.10	1.71	13	19.57	12
2B	1.38	72.52	0.23	1.99 × 10⁻⁸	5.37 × 10⁻⁹	640.8	0.13	1.81	15.4	20.68	—	17.7	0.33	0	0.5	1.34
	1.48	75.57	0.34	5.75 × 10⁻⁸	2.34 × 10⁻⁸	399.58	0.11	1.21	18.3	22.84
	1.64	90.39	0.27	2.69 × 10⁻⁷	1.99 × 10⁻⁹	565.64	0.09	1.23	18.45	20.47
3C	1.38	175.78	0.21	3.63 × 10⁻⁶	3.63 × 10⁻⁶	—	—	—	9.81	32.95	—	16.9	0.36	0	5	—
	1.48	109.23	0.28	2.24 × 10⁻⁶	2.24 × 10⁻⁶				6.81	35.5
	1.64	173.06	0.2	4.27 × 10⁻⁶	4.27 × 10⁻⁶				8.1	34.68
Dykes	1.38	117.8	0.23	6.61 × 10⁻⁵	2.09 × 10⁻⁴	—	—	—	20	34	—	19	0.15	0	10	—
	1.48	107.74	0.28	9.12 × 10⁻⁵	2.88 × 10⁻⁴
	1.64	114.12	0.3	2.23 × 10⁻⁵	7.08 × 10⁻⁵
Beaches	1.38	57.32	0.31	6.61 × 10⁻⁶	2.09 × 10⁻⁵	—	—	—	10	20	—	17	0.15	0	1.9	—
	1.48	41.39	0.33	9.12 × 10⁻⁶	2.88 × 10⁻⁵
	1.64	45.84	0.24	2.23 × 10⁻⁶	7.08 × 10⁻⁶
Sediments	1.38	20.86	0.32	1.44 × 10⁻⁹	1.44 × 10⁻⁹	—	—	—	10	20	—	17	0.15	0	0.75	—
	1.48	21.49	0.4	1.95 × 10⁻⁸	1.95 × 10⁻⁸
	1.64	28.62	0.28	1.35 × 10⁻⁸	1.35 × 10⁻⁸

Note: E – Young’s Modulus; ν – Poisson’s ratio; k _x, k _y – horizontal and vertical hydraulic conductivity; ME – Young’s Modulus (E) multiplier at depth Y = −50 m; γ – dry unit weight; n – porosity; S _r, α – van Genuchten parameters; OCR – initial state parameter for cap model [41]; c – cohesion; φ – angle of internal friction.

Source: Author’s contribution.

Considering the origin and classification of the soils, it was determined that the subsoil layers 1A and 2B should be modelled using the Cap model. The Cap model incorporates two yield surfaces: a Drucker–Prager yield criterion and an elliptical cap, as illustrated in Figure 4.

The cap yield surface is analogous to the Modified Cam-Clay model [43,44], developed to characterize the behaviour of cohesive soils, particularly clays. This surface constrains near-hydrostatic stress states, enabling the model to capture irreversible volumetric compaction – a key feature that makes it especially suitable for modelling clays and silty clays [45]. It is defined by two principal parameters: the preconsolidation pressure P _c0, which determines the intersection with the hydrostatic compression axis, and the shape parameter R, which characterizes the geometry of the elliptical cap. The preconsolidation pressure P _c0 represents the maximum historical isotropic effective stress experienced by the soil. The parameter R generally varies between 1.1 and 2.7, where the lower bound corresponds to a cap shape approaching a planar surface, and the upper bound corresponds to a more elliptical configuration. The evolution of the cap surface, commonly referred to as volumetric hardening, is governed by the parameter λ, which can be physically interpreted as the slope of the primary consolidation line in a semi-logarithmic e−log(p) space. Given the physical significance of P _c0 and λ, it is recommended that their values be obtained from laboratory tests [46]. The Drucker–Prager parameters are calculated automatically by ZSoil on the basis of angle of friction and cohesion, matching collapse load in plane strain with Mohr–Coulomb model for the same parameters [41]. When determining FOS using the strength reduction method, the cap surface is deactivated. Layer 1A was modelled with depth-dependent stiffness because of its substantial thickness. For elevations above Y = 110 m, Young’s modulus remains constant at the value shown in Table 2. Below this depth, it increases linearly achieving value multiplied by ME at Y = −50 m. The remaining materials were modelled using the Coulomb–Mohr model with the assumption of a non-associative plastic flow rule (including layers 1A and 2B) described by a dilatancy angle dependent on the angle of internal friction according to ψ = max(0.1φ, φ−25°).

3.3

Monitoring data

As stated in the introduction, the use of synthetic (i.e. numerically generated) data is essential to verify whether a good fit to the monitoring data corresponds to an accurate estimation of the FOS values. Accordingly, three distinct parameter sets were selected to generate synthetic monitoring data. The parameter sets were selected to yield FOS values of 1.38, 1.48, and 1.64 (hereafter referred to as the true FOS values), corresponding respectively to a potential failure condition, a limit state, and a safe state. When effective strength parameters are employed in accordance with Polish legal regulations [18], the limit state corresponds to an FOS value of 1.5. The parameters for the three cases are presented in Table 2.

In Figure 3, the locations of the nodes used for data generation are shown. Three nodes were selected as surface displacement gauges (U1, U2, and U3), and six nodes were used to mimic piezometers (from P1 to P6). Illustrative examples of numerically generated monitoring data are presented in Figures 5 and 6.

4

Results

4.1

Genetic algorithm setup

For each case, corresponding to a specific FOS value, the genetic algorithm calibration was performed three times. As previously described, the initial population consisted of 24 individuals, and at each iteration, the 12 individuals exhibiting the highest unfitness values were replaced with 12 newly generated individuals. The algorithm was terminated after 50 generations. The parameters being calibrated, along with their permissible ranges (i.e., the minimum and maximum values allowed during the genetic algorithm), are presented in Table 3. The boundary values were adopted based on experience from real-world applications; however, since a synthetic case is considered, the critical point is that the parameter values used to generate the monitoring data lie within the specified ranges. Parameters not subjected to calibration remained unchanged, with their values adopted as listed in Table 2. It should be noted that the hydraulic conductivity of the dykes was defined as dependent on the hydraulic conductivity of the beaches, according to a specified relationship (k _Dykes = 10k _Beaches) applied in both the horizontal and vertical directions. Additionally, the dilatancy angle was considered to be dependent on the angle of internal friction, as previously described.

Table 3

Permissible ranges of parameters being calibrated.

Material	Parameter	Min	Max
1A	E (MPa)	15.0	60.0
	ν (−)	0.20	0.40
	k _x (m/s)	1 × 10⁻¹¹	1 × 10⁻⁸
	k _y (m/s)	1 × 10⁻¹¹	1 × 10⁻⁸
	P _c0 (kPa)	300	700
	λ (−)	0.05	0.15
	R (−)	1.2	2.0
	c (kPa)	10	15
	φ (°)	15	20
	ME (−)	5	13
2B	E (MPa)	60	100
	ν (−)	0.20	0.40
	k _x (m/s)	1 × 10⁻⁹	1 × 10⁻⁶
	k _y (m/s)	1 × 10⁻⁹	1 × 10⁻⁶
	P _c0 (kPa)	300	700
	λ (−)	0.05	0.15
	R (−)	1.2	2.0
	c (kPa)	15	20
	φ (°)	20	25
3C	E (MPa)	100	200
	ν (−)	0.20	0.35
	k _x = k _y (m/s)	1 × 10⁻⁶	1 × 10⁻⁴
	c (kPa)	3	10
	φ (°)	26	36
Dykes	E (MPa)	70	120
Dykes	ν (−)	0.20	0.35
Beaches	E (MPa)	30	70
	ν (−)	0.2	0.4
	k _x (m/s)	1 × 10⁻⁷	1 × 10⁻⁵
	k _y (m/s)	1 × 10⁻⁷	1 × 10⁻⁵
Sediments	E (MPa)	10	30
	ν (−)	0.2	0.4
	k _x = k _y (m/s)	1 × 10⁻⁹	1 × 10⁻⁶

Source: Author’s contribution.

Displacement values in the objective (unfitness) function are expressed in millimetres (mm), while piezometric heads are given in metres above sea level (m asl). A weighting factor of 10 was assigned to the piezometric measurements, based on empirical judgement derived from trial-and-error procedures and prior experience with real-world applications. Consequently, units for the unfitness metric are omitted in the following sections, as this quantity does not possess a direct physical meaning

4.2

Results of parameter identification

Figure 5 shows how the unfitness of the best individual and the average of the six best individuals changed over successive iterations. Three cases out of nine were selected to illustrate both the best and worst calibration performance. In these cases, the value for the best individual stabilized before the 20th iteration, while the average value for the six best individuals continued to decrease after the 40th iteration.

Examples of the fit to synthetic monitoring data for various unfitness values in the case of FOS = 1.64 are presented in Figures 6 and 7. It is noteworthy that discernible differences are observed between unfitness values of 7.0 and 11.8, while the differences between 7.0 and 26.8 are substantial. Therefore, the fit corresponding to an unfitness value of 7.0 can be considered accurate, whereas unfitness values of 45.2 and 138 should be regarded as poor and inadequate, respectively.

4.3

Data analysis and FOS estimation

First, the relationships between the FOS values and, respectively, hydraulic conductivities, angle of internal friction of the dominant layer 1A, and the value of the objective function were examined, as illustrated in Figure 8. The results from the three calibrations for each case were considered collectively. Blue dots indicate non-repeating individuals from the genetic algorithm calibrations, while the red dot represents the true values of the FOS and the relevant parameter. The findings indicate a moderate positive correlation (Pearson coefficient ≈ 0.5) between vertical hydraulic conductivity and FOS values. It is evident that higher FOS values are attainable only with higher vertical hydraulic conductivity. For higher values of the angle of internal friction, a similar, though less pronounced, relationship is observed. The FOS value is weakly correlated with horizontal hydraulic conductivity and unfitness. Notably, poor fits, defined as those with unfitness above 150, are predominantly associated with low FOS values (below 1.20), whereas good fits typically exhibit FOS values above 1.15. Most importantly, if only one of the three selected parameters is correctly reproduced (while the other two vary freely), the resulting FOS exhibits significant dispersion. This demonstrates that correct identification of a single parameter is insufficient. Consequently, FOS values were subsequently investigated under the assumption that the selected parameters were correctly identified. Values within the range of φ _true ± 0.5° (with φ _true denoting the angle of internal friction of 1A, as provided in Table 2) were considered to represent a correctly identified angle of internal friction, while for hydraulic conductivity, the ranges were assumed to be symmetrical with respect to the true value on a logarithmic scale, as illustrated in Figure 9.

All cases with accurately identified hydraulic conductivity and angle of internal friction are presented in Figure 10. It can be observed that the number of such cases remains limited: five for case 1.38, three for case 1.48, and ten for case 1.64. The figure illustrates the relationship between unfitness and FOS. The wide range of unfitness values is expected, as displacement results contribute to unfitness and other parameters, primarily Young’s modulus, play a significant role. Importantly, the occurrence of inaccurately reproduced FOS values suggests that parameter identification is insufficient. The maximum differences between the true and calculated FOS values are 0.18, 0.48, and 0.60 for cases 1.38, 1.48, and 1.64, respectively.

In view of the findings and the aforementioned influence of the pore pressure distribution on the calculated FOS values, it was decided to investigate whether the accurate representation of piezometric measurements increases the likelihood of obtaining precise FOS values. Figures 11–13 illustrate the relationship between the FOS values and the RMSE calculated solely from piezometer data. In all three cases, the Pearson correlation coefficient was approximately below −0.70, indicating a strong negative correlation. A linear regression was used to formulate a simple model, and, assuming an RMSE of zero, meaning a perfect fit, the FOS values were estimated. Notably, FOS values calculated in this manner were close to 1.50 in each case. This outcome can be attributed to two principal reasons. First, the use of the same numerical model and identical parameter ranges in each of the three cases has a considerable impact. Second, the differences between the piezometric measurements for the three cases are not substantial, with average RMSEs around 5.0 (for comparisons of cases 1.64 vs 1.48 and 1.64 vs 1.38), or even 2.0 (for case 1.48 vs 1.38). Consequently, a satisfactory fit in one case is also likely to be satisfactory in the others. When calculated solely from piezometric data, the RMSE can be interpreted as the mean difference in metres per observation. Restricting the set to cases with a correctly identified angle of internal friction made it more likely to achieve accurate FOS values. Although the Pearson coefficients increased in these cases, the correlation should still be interpreted as previously discussed. Importantly, the estimated FOS values remained skewed towards 1.50.

It is important to consider the underlying reasons for the observed linear correlation. Assuming that lower pore pressure values correspond to higher FOS, it follows that cases characterized by low FOS and high RMSE represent situations where the numerical predictions substantially exceed the measured (or synthetic) piezometric data. Intuitively, low RMSE values should be associated with FOS estimates that closely approximate the true value. Therefore, it is plausible that Figures 11–13 exhibit a converging relationship between FOS and RMSE. However, there are few, if any, cases with both high RMSE and high FOS values exceeding the true value. This finding may be attributed to the imposed boundary conditions, particularly those concerning pressure, which are designed to simulate sediment deposition within the reservoir and the water level at the right boundary of the model. Given these boundary conditions and the dam’s geometry, achieving low pressure distributions is not feasible. Consequently, it can be concluded that the numerical model was appropriately constructed for the considered monitoring data.

The accuracy of FOS estimation was analysed by excluding the best individuals from the development of the linear regression model. Figure 14 presents plots of the FOS values obtained from the linear regression model, drawn exclusively for individuals with errors exceeding the values shown on the horizontal axis. The analysis focused on the case with a true FOS of 1.64 and considered two error metrics RMSE calculated solely from piezometric data, and unfitness. Three separate calibrations were distinguished, and their combined results were also shown. It is evident that in this case, the estimated FOS remains stable and close to the true value when the RMSE for piezometric data is below 5 or unfitness is below 25. The latter value corresponds to the best individual from the initial population in the worst case. When the results of all three calibrations are combined, a lower variance in FOS values is observed, with abrupt drops or increases eliminated. This confirms that multiple runs of the genetic algorithm can be advantageous, as different near-optimal solutions can be obtained using the same heuristic.

5

Conclusion

This study should be regarded as an introduction to development of a method for estimating FOS of tailings dams based on monitoring data and numerical modelling. The analysis of genetic algorithm calibration conducted here demonstrates that achieving an apparently good fit to historical measurements, specifically displacements and piezometric heads, does not, by itself, ensure accurate estimation of FOS values.

Furthermore, the analysis included a counterexample showing that even the correct identification of both hydraulic conductivity and the angle of internal friction is insufficient on its own. This finding underscores the limitations of relying solely on these key geotechnical parameters when assessing dam safety. A strong correlation was observed between the obtained FOS and the error (quantified as RMSE) between numerical results and piezometric monitoring data. Building on this, a simple linear regression model was proposed for FOS estimation. For the adopted numerical model, parameter ranges, and configuration of sensors in this study, accurate FOS estimation was possible only when both piezometric head measurements and the angle of internal friction were correctly reproduced.

The results indicate that a comprehensive approach is essential for reliable FOS prediction: not only must the numerical model accurately fit the monitoring data, but the core material properties, including hydraulic conductivity and angle of internal friction, must also correspond closely to actual field conditions. The findings support the use of regression-based meta-models, conditioned on well-calibrated monitoring data and robust parameter identification, as a pragmatic tool for dam safety assessment. The proposed approach should now be validated through extensive real-world applications, encompassing different dam types, geological settings, and operational conditions, to assess its generalizability, refine its predictive capability, and establish its practical value for engineering decision-making.

Acknowledgements

The author wishes to thank the research team members from Wrocław University of Science and Technology involved in the long-standing collaboration with KGHM Polska Miedź S.A. on the Żelazny Most tailings facility analyses, particularly Dariusz Łydżba, Adrian Różański, Maciej Sobótka, and Michał Pachnicz, for their invaluable contributions and support.

Funding information

Author states no funding involved.

Author contributions

Szczepan Grosel performed all aspects of this work independently, including conceptualization, methodology development, software implementation, validation, formal analysis, original draft writing, and visualization.

Conflict of interest statement

Author states no conflict of interest.

Estimation of the stability of tailings dams based on genetic algorithm-supported calibration results and monitoring data

Full Article

Paradigm

My account