Bridges are vital structures that form the backbone of transportation systems. Their structural integrity is not just a matter of engineering but a critical safeguard against catastrophic failures that can claim lives, disrupt economies, and harm ecosystems. A stark reminder of this risk is the 2018 collapse of the Morandi Bridge in Genoa, Italy, which killed 43 people and exposed the severe consequences of insufficient maintenance and outdated monitoring practices [1]. Another example is the partial collapse of the Carola Bridge, one of the most important Elbe bridges in Dresden, Germany, on September 11, 2024, when a 100 m section carrying the footpath and rails fell into the Elbe. Fortunately, there were no injuries [2]. Such failures underscore the need for advanced structural health monitoring (SHM) systems that can detect early signs of deterioration and prevent disasters before they occur. Conventional SHM methods, such as visual inspections and periodic sensor checks, have long been the industry standard. However, these approaches are labour-intensive, time-consuming, and often fail to provide real-time or predictive insights into structural behaviour [3]. Farrar and Worden [4] note that traditional SHM techniques can also struggle to detect localised damage caused by temperature fluctuations, corrosion, and dynamic loads from traffic and wind. In addition, Fawad et al. [5] demonstrate that the traditional reliance on manual inspections and periodic sensor checks often leads to delayed and fragmented data acquisition, underscoring the need for automated and integrated monitoring systems.
Grieves [6] introduced the digital twin concept, a dynamic, virtual replica of a physical system that can facilitate real-time simulation and infrastructure monitoring under varying conditions. In recent work, Kapteyn et al. [7] have demonstrated that integrating digital twin models with machine learning enhanced the predictive accuracy of structural health monitoring systems and deepened the understanding of stress distributions and failure mechanisms. Furthermore, research by Ye et al. [8] and Abdeljaber et al. [9] provided evidence that artificial neural networks (ANNs) can automatically learn complex, nonlinear relationships from sensor data, for instance, correlating strain measurements with key performance indicators such as deflection, thereby enabling a proactive maintenance approach that outperformed conventional methods.
In this study, we presented a novel framework that integrated artificial intelligence (AI) with the digital twin concept to enhance the accuracy and reliability of SHM. We validated our approach through a real-world case study on Hungary’s Soroksári Bridge.
This research proposes an advanced methodology that combines AI and finite element modelling in a digital twin framework to predict structural behaviour and assess the condition of reinforced concrete bridges. The framework relies on three main components. First, a network of sensors is strategically installed on the bridge to continuously monitor key structural parameters at selected critical locations. These measurements are used in later stages to predict the condition of the bridge.
Second, a detailed 3D finite element (FE) model of the existing bridge is built, incorporating precise design specifications, actual environmental conditions such as ambient temperature, and realistic traffic loads. To ensure the accuracy of the digital twin framework, the FE model was verified and validated against extensive loading tests on the real bridge. The validated FE model then allowed the digital twin to analyse the bridge’s performance under numerous loading scenarios, generated by Monte Carlo simulation with stochastic parameters derived from real traffic data, and to extract virtual sensor data that simulate the behaviour of, and the measurements on, the real bridge. These data were collected at locations corresponding to the monitoring points on the real bridge, with the same structural feature types (i.e., longitudinal displacements and strains), and at other critical locations, identified from the extreme values of the static analysis results, with additional structural feature types.
Third, an artificial neural network (ANN), specifically a feedforward neural network (FNN), is used to learn the relationship between the virtual sensor data and the structural information calculated by the FE model at the critical points, such as deflections, stresses and crack widths. By learning the complex patterns within these data sets, the trained FNN model was able to predict the structural behaviour of the physical bridge when fed with real sensor measurements.
The methodology described above was applied to the Soroksári Bridge in Hungary to demonstrate its ability to monitor structural health proactively. Combining the digital twin concept with machine learning offers a reliable, data-based approach to extending the lifespan of key infrastructure.
Recent developments in digital twin technology combined with artificial intelligence have significantly enhanced the structural health monitoring of bridges. This review highlights critical studies in this field, examining their contributions, methodologies, and limitations.

Hielscher et al. [10] developed a neural network-based digital twin model for structural health monitoring of reinforced concrete bridges. Their method involved creating a detailed 3D finite element model of the bridge, generating training data from this model under various loading conditions, and employing a feedforward neural network, a multi-layer perceptron (MLP), to predict strains. While innovative, this study was limited to predicting strain responses and was applied to a prototype rather than a real bridge structure. Parola et al. [11] demonstrated deep learning and IoT-enabled digital twins for structural damage localisation. Their approach included deploying IoT sensors for real-time data collection and implementing a deep learning model, specifically a convolutional neural network, for damage detection and localisation. While this approach was practical, it did not predict multiple structural features.

Wang et al. [12] introduced a digital twin modelling approach for structural strength monitoring using transfer learning-based multi-source data fusion. Their method used transfer learning techniques to adapt models across domains, integrating data from multiple sources such as sensors, historical data, and simulations. They employed deep neural networks (DNNs) for feature extraction and prediction and optimised the DNN hyperparameters using Bayesian optimisation (BO). This approach showed improved performance compared to traditional methods. Still, it did not specifically address the challenges of bridge monitoring, as it was applied to a rectangular plate with a hole under axial tension. Yu et al. [13] proposed a digital twin approach using nonparametric Bayesian networks for complex system health monitoring. Their method focused on uncertainty quantification and real-time updating of the digital twin, employing probabilistic analysis for health state estimation. This approach improved monitoring accuracy and adaptability to changing conditions, demonstrating flexibility in modelling complex systems. However, it was not specific to bridge structures, since the methodology was applied to complex industrial machinery used in manufacturing processes.

Malekloo et al. [14] provided a comprehensive overview of machine learning applications in SHM. Their review highlighted various techniques, including supervised learning methods (e.g., support vector machines, random forests), unsupervised learning approaches (e.g., clustering algorithms), and deep learning architectures (e.g., convolutional neural networks, recurrent neural networks, autoencoders). While emphasising the potential of advanced data analysis techniques in SHM, their review did not offer a specific implementation strategy for bridge monitoring. Liu et al. [15] applied a response surface-corrected finite element model and Bayesian neural networks to predict bridge responses under strong winds. Their methodology involved developing a finite element model of the bridge, creating a response surface model to approximate FE predictions, and implementing Bayesian neural networks for probabilistic prediction of bridge responses. This study focused on extreme weather conditions but did not address a broader range of loading scenarios.

Armijo and Zamora-Sánchez [16] presented a case study on Internet of Things (IoT) applications with digital twins, offering practical insights into real-world implementation. Their approach included implementing IoT sensors for real-time data collection, creating a digital twin model of the system under study, and using data analytics and machine learning algorithms to process and interpret the collected data. However, their study was not specifically tailored to bridge structures.
The existing literature, while valuable, revealed several gaps in the field of bridge SHM: limited integration of comprehensive digital twin models with real-time sensor data, a lack of focus on predicting multiple structural features simultaneously, and insufficient validation on real-world bridge structures. The current study addressed these gaps by introducing an innovative methodology that combined a detailed, validated 3D finite element model with real-time sensor data from the physical bridge. It employed an artificial neural network to predict multiple structural features, including stresses, deflections, and crack widths. This research has the potential to significantly enhance the accuracy and comprehensiveness of bridge health monitoring, ultimately improving safety and maintenance strategies in bridge infrastructure.
To test the proposed methodology of integrating an AI algorithm with a digital twin concept for damage identification and prediction of bridges, it was applied to a real reinforced concrete box girder bridge. Soroksári bridge was constructed in 1990, spanning a branch of the Danube River in Hungary as part of the M0 motorway. This bridge had a unique superstructure design containing two variable-height reinforced concrete box girders with integrated side cantilever plates to support the pavement. The total cross-sectional width of the bridge was 21.94 meters, and the specific section under investigation had a length of 148.55 meters. A detailed side view and cross-section of the bridge are provided in Fig. 1. The original monitoring system on the Soroksári bridge consisted of 8 sensors measuring temperature and 10 sensors measuring longitudinal elongation in different cross-sections. However, for this research, in order to acquire more comprehensive data regarding the superstructure’s behaviour, seven additional strain gauges were also installed within one of the box sections with the help of the Hungarian Public Roads Company as the bridge operator and the Structural Laboratory of Budapest University of Technology and Economics, Faculty of Civil Engineering. More details about the types and locations of the original and the additional monitoring devices, as well as about the bridge itself are provided in Asseel et al. [17].

Side view and cross-section of Soroksári bridge
The first step in creating a digital twin of the Soroksári Bridge was to develop a 3D finite element model in the AxisVM FE software. The model was built in detail from the original structural implementation plans, including the geometry, material properties, support conditions, other structural components and post-tensioning cables. To ensure its accuracy, the FE model was numerically verified and validated against data from the load test conducted by the Budapest University of Technology and Economics in 1990 [18]. The steps of FE model development and validation are described in more detail in Asseel et al. [19]. The trustworthiness of the FE model was confirmed by a difference of about 5% between the deflections simulated by the FE model and the deflections measured during load testing.
To collect the datasets used to train the artificial neural network on the performance of the bridge, different loading scenarios were generated using Monte Carlo simulation with importance sampling. The stochastic parameters of the traffic loads (such as mean value and standard deviation) were determined from measurements on another, similar Hungarian road bridge. This work was part of a study conducted by the Department of Structural Engineering at the Faculty of Civil Engineering, Budapest University of Technology and Economics [20]. The generation of the load scenarios and their application to the FE model were automated with a Python script that used the Pyaxisvm interface to couple Python with the AxisVM software. The script automatically applied the generated load scenarios to the FE model, ran the numerical analysis of the bridge for each scenario, and collected the structural results, such as longitudinal and transverse displacements, deflections, crack widths at the top and bottom extreme fibres, and axial stresses in the local longitudinal and local vertical directions, for a total of 1200 analysis runs.
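The scenario-generation step can be illustrated with a minimal sketch of Monte Carlo sampling with importance sampling. It is not the script used in the study: the function names, the single-vehicle load model, and the numerical values of the mean, standard deviation and shift are hypothetical, and the coupling to AxisVM is deliberately left out. Only the 1200 runs and the 148.55 m section length are taken from the text.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a normal distribution, evaluated element-wise."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))

def sample_load_scenarios(n_runs, mean_kn, std_kn, shift_kn, length_m, rng):
    """Draw vehicle loads and positions together with importance weights.

    Loads come from a proposal distribution whose mean is shifted upwards by
    `shift_kn`, so heavy vehicles are sampled more often. The weight p(x)/q(x)
    corrects for that shift when statistics are evaluated afterwards.
    """
    proposal_mean = mean_kn + shift_kn
    loads = rng.normal(proposal_mean, std_kn, size=n_runs)
    positions = rng.uniform(0.0, length_m, size=n_runs)
    weights = normal_pdf(loads, mean_kn, std_kn) / normal_pdf(loads, proposal_mean, std_kn)
    return loads, positions, weights

# Hypothetical traffic statistics (kN); 1200 runs and 148.55 m follow the text.
rng = np.random.default_rng(seed=0)
loads, positions, weights = sample_load_scenarios(
    n_runs=1200, mean_kn=300.0, std_kn=80.0, shift_kn=50.0, length_m=148.55, rng=rng
)

# Self-normalised estimate of the mean load under the original distribution.
weighted_mean = float(np.sum(weights * loads) / np.sum(weights))
```

Each sampled scenario would then be passed to the FE model for analysis; the weights are only needed if probabilities under the original traffic distribution are to be recovered from the results.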
The cornerstone of successful neural network training is effective data preprocessing, as the network's ability to learn hinges on the quality and consistency of the input data. Data preprocessing involved several critical steps, such as cleaning, filtering, transformation, and normalisation, ensuring that the input data was in a suitable format for model training [21]. In this study, the datasets were imported from the MS Excel files in which they had been stored into Pandas DataFrames, a table-like data structure in Python. The data was filtered to include only readings from the sensor locations and the critical nodes identified in the FE model. Strains were computed from stresses using Young’s modulus (assuming linear elastic behaviour). Displacements and strains in the longitudinal direction were selected as input variables for prediction, because these features were also measured by the real sensors. Crack widths, stresses, strains in the transverse direction and deflections served as target variables. With limits set on crack width, stress and deflection, they act as performance indicators for serviceability limit state damage, which can be reached much earlier than ultimate limit state damage. A crucial step was handling the zero values in the crack width data sets by replacing zero entries with the mean of the non-zero values of each feature, which prevented the network from being skewed by zero values. Finally, all features were normalised using the “MinMaxScaler” class from scikit-learn, a Python library for machine learning. This scaling technique transformed all features to a range between 0 and 1, ensuring that features with larger magnitudes did not dominate the learning process [21]. Once the data was fully processed, it was split into training, validation, and test sets with a ratio of 60:20:20. This preprocessing pipeline gave the neural network consistent, comparable inputs, supporting its ability to generalise and to predict the bridge’s structural performance accurately.
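The preprocessing steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the column names, the synthetic data and the value of Young's modulus are hypothetical, and the Excel import is replaced by an in-memory DataFrame. Scaling is applied before splitting, following the order given in the text.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

E_CONCRETE_MPA = 35_000.0  # hypothetical Young's modulus; not stated in the text

def preprocess(df, stress_cols, crack_cols):
    """Return a cleaned copy: stresses to strains, zero crack widths to non-zero mean."""
    out = df.copy()
    for col in stress_cols:
        # Hooke's law under the linear-elastic assumption: strain = stress / E
        out[col.replace("stress", "strain")] = out[col] / E_CONCRETE_MPA
    for col in crack_cols:
        nonzero = out.loc[out[col] != 0, col]
        if not nonzero.empty:
            out[col] = out[col].mask(out[col] == 0, nonzero.mean())
    return out

def scale_and_split(X, Y, seed=0):
    """Min-max scale inputs and targets, then split 60:20:20."""
    x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
    Xs, Ys = x_scaler.fit_transform(X), y_scaler.fit_transform(Y)
    # 60 % for training, then the remaining 40 % is halved into validation and test
    X_tr, X_rest, Y_tr, Y_rest = train_test_split(Xs, Ys, test_size=0.4, random_state=seed)
    X_val, X_te, Y_val, Y_te = train_test_split(X_rest, Y_rest, test_size=0.5, random_state=seed)
    return (X_tr, Y_tr), (X_val, Y_val), (X_te, Y_te), x_scaler, y_scaler

# Synthetic stand-in for one sensor location (column names are hypothetical).
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "disp_x": rng.normal(size=n),
    "stress_x": rng.normal(5.0, 1.0, size=n),
    "crack_top": rng.choice([0.0, 0.05, 0.1], size=n),
})
clean = preprocess(df, stress_cols=["stress_x"], crack_cols=["crack_top"])
train, val, test, x_scaler, y_scaler = scale_and_split(
    clean[["disp_x", "strain_x"]], clean[["crack_top"]]
)
```

The fitted scalers are returned because the same transformation has to be reused later, when real sensor readings are fed to the trained network and the predictions are scaled back.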
Neural networks have become a powerful tool in structural engineering because they can model complex, nonlinear relationships between structural parameters without the need for explicit mathematical formulations of the underlying physical processes. This makes them practical for using monitoring data to predict bridge structural features such as deflection, crack width, and stresses, which are affected by numerous factors including material properties, geometry, loading conditions, and environmental influences. For example, neural networks implemented in MATLAB have accurately predicted beam deflection behaviour [22], and they have been used for real-time crack detection and localisation with a minimal number of strain gauge sensors [23]. Specific approaches, such as deep learning and transfer learning, have further improved the computational efficiency and predictive accuracy of neural networks in structural analysis and monitoring [24]. Deep learning enables the automatic extraction of hierarchical features and robust modelling of complex structural data, as demonstrated in research works by LeCun et al. [25] and Schmidhuber [26]. Meanwhile, transfer learning allows the adaptation of pre-trained models to new structural monitoring tasks with limited data, reducing training time and cost. Detailed surveys by Pan and Yang [27] and Zhuang et al. [28] highlight how these transfer learning strategies can improve damage detection and overall predictive performance in engineering applications.
Among the different types of neural networks, feedforward neural networks (FNNs) are particularly well suited to tasks in structural engineering. Their simple architecture, in which data flows directly from input to output, allows them to effectively approximate complex functions and capture nonlinear relationships among input parameters. Studies have demonstrated FNNs’ capability in modelling the load-deflection behaviour of concrete slabs, providing accurate predictions of structural responses under different loading conditions [29], and in predicting crack propagation in concrete structures [30]. Consequently, FNNs offer a practical solution for predicting the structural performance of reinforced concrete bridges.
This study proposed a feedforward neural network to predict multiple structural features from the monitoring data available for the given bridge structure. It consisted of an input layer, three hidden layers, and one output layer. The size of the input layer was determined by the number of selected input features. The characteristics of the trained ANN are summarised in Table 1.
Characteristics of the feedforward neural network
| Category | Details |
|---|---|
| Input Features | Displacement and strains on longitudinal direction. |
| Target Outputs | Crack width data (top and bottom extreme fibres), strains (on the transverse direction), stresses (on the transverse and longitudinal directions), and displacements (on the transverse and vertical directions). |
| Preprocessing | – Zero values replaced with non-zero means or 1e-3. |
| Data Split | – Training: 60%; validation: 20%; test: 20%. |
| Network Architecture | – Input Layer: the same size as input features. |
| Optimization | – Optimizer: Adam. |
| Loss Function | Mean Squared Error (MSE). |
| Metrics | Mean Absolute Error (MAE). |
| Training Parameters | – Batch Size: 32. |
| Regularization | – L2 Regularization (1e-4). |
| Evaluation | Final test performance: MSE and MAE metrics. |
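
A network with the characteristics listed in Table 1 can be sketched as shown below. The text does not state which library was used to build the FNN, so scikit-learn's `MLPRegressor` is used here purely as a stand-in. The hidden-layer widths, the ReLU activation, the iteration cap and the synthetic data are assumptions; the three hidden layers, the Adam optimizer, the batch size of 32, the L2 strength of 1e-4 and the squared-error loss follow the table.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def build_fnn(hidden_layers=(64, 32, 16), seed=0):
    """Feedforward regressor with three hidden layers (widths are hypothetical)."""
    return MLPRegressor(
        hidden_layer_sizes=hidden_layers,
        activation="relu",   # assumption: the activation is not given in the text
        solver="adam",       # Adam optimizer, as in Table 1
        alpha=1e-4,          # L2 regularization strength, as in Table 1
        batch_size=32,       # as in Table 1
        max_iter=200,        # assumption: cap on training epochs
        random_state=seed,
    )

# Synthetic multi-output regression problem with inputs and targets in [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
Y = np.column_stack([0.5 * (X[:, 0] + X[:, 1]), X[:, 2] * X[:, 3]])

model = build_fnn()
model.fit(X[:240], Y[:240])        # MLPRegressor minimises the squared error
pred = model.predict(X[240:])
mae = float(np.mean(np.abs(pred - Y[240:])))
```

Because the output layer has one unit per target column, a single network predicts all target features at once, which is the multi-output behaviour the study relies on.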
The size of the training dataset is critical for the accuracy and generalization of neural network models in structural health monitoring. Sufficient data was essential when predicting complex parameters such as deflection, crack width, and stresses. However, selecting an optimal sample size required balancing data availability, computational resources, and the model’s generalization ability. This study evaluated neural network performance using sample sizes of 200, 500, 700, 1000, and 1200. These subsets were drawn from simulated data corresponding to sensor locations (virtual sensors) and calculated structural features (deflections, crack widths, stresses) in critical structural points identified through FEM analysis. The network was designed to predict the mentioned structural features, based on the determined virtual sensor data.
Our work first aimed to understand how the training dataset size affected learning and prediction accuracy. As the sample size increased, the MSE (mean squared error) and MAE (mean absolute error) decreased considerably, so the accuracy of the feedforward neural network model generally improved (Fig. 2). More data gave a more comprehensive representation of the underlying data distribution, which helped the model generalize better and reduced overfitting (i.e., the model capturing noise as well as the underlying patterns, which limits its performance on new data). However, in this case, accuracy decreased slightly when the sample size exceeded 1000, potentially because the model’s capacity was too limited to capture the added complexity. This suggests that, beyond 1000 samples, the complexity of the FNN model would have to be increased, by adding more layers and neurons or by increasing the number of epochs, to utilize the larger dataset effectively. R2 (the coefficient of determination) also rose towards 1.0 as the sample size grew, indicating a tighter fit between the model’s predictions and the actual data. For this specific feedforward neural network, the optimal sample size was therefore 1000 samples, which balanced accuracy against computational cost.
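A sample-size study of this kind can be organised as a simple sweep in which every model is scored on the same held-out set. The sketch below shows the protocol and the three metrics only. To keep it short and fast, an ordinary least-squares model replaces the FNN, and the data are synthetic; both are assumptions made for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, MAE and coefficient of determination for a single target."""
    err = y_true - y_pred
    ss_res = float(np.sum(err**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "mse": float(np.mean(err**2)),
        "mae": float(np.mean(np.abs(err))),
        "r2": 1.0 - ss_res / ss_tot,
    }

def fit_least_squares(X, y):
    """Stand-in model: linear least squares with an intercept term."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic data: 1200 samples available for training, 300 held out for testing.
rng = np.random.default_rng(0)
true_coef = np.array([1.0, -0.5, 0.8, 0.3, -1.2, 0.6])
X_all = rng.uniform(size=(1500, 6))
y_all = X_all @ true_coef + 0.05 * rng.normal(size=1500)
X_test, y_test = X_all[1200:], y_all[1200:]

results = {}
for n in (200, 500, 700, 1000, 1200):   # sample sizes evaluated in the study
    coef = fit_least_squares(X_all[:n], y_all[:n])
    results[n] = regression_metrics(y_test, predict(coef, X_test))
```

Keeping the test set fixed across all sample sizes matters: otherwise differences in the metrics could come from the evaluation data rather than from the amount of training data.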

Effect of sample size on FNN performance metric: mean square error (MSE), mean absolute error (MAE) and coefficient of determination (R2)
Three model versions were developed using simulated data to evaluate how the number of data collection points affected our neural network’s performance. The first version used data from 17 locations corresponding to actual sensor placements. The second version extended this to 25 locations by including additional critical points identified from the FEM analysis, while the third version further increased the number to 40 locations. The number of data collection points directly determined the dimensionality of the input and target matrices. For example, when training the model with 17 locations, each feature in the input and target matrices contained 17 variables. Accordingly, the input matrix (X) included 17 variables for longitudinal displacements and 17 variables for longitudinal strains. Similarly, the target matrix (Y) contained 17 variables for each corresponding feature. As more locations were added, the matrices became higher-dimensional, increasing the model's complexity.
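The way the number of locations drives the matrix dimensions can be made concrete with a short sketch. The feature names are hypothetical labels, the list of seven target features is one reading of Table 1, and selecting the first columns as the smaller location sets is a simplification made only for this illustration.

```python
import numpy as np

# Two input features and seven target features per location (reading of Table 1).
INPUT_FEATURES = ("disp_long", "strain_long")
TARGET_FEATURES = (
    "crack_top", "crack_bottom", "strain_trans",
    "stress_trans", "stress_long", "disp_trans", "disp_vert",
)

def assemble_matrices(results, n_locations):
    """Stack one block of `n_locations` columns per feature into X and Y.

    `results` maps a feature name to an array of shape (n_runs, n_total_locations).
    """
    X = np.hstack([results[f][:, :n_locations] for f in INPUT_FEATURES])
    Y = np.hstack([results[f][:, :n_locations] for f in TARGET_FEATURES])
    return X, Y

# Placeholder FE results: 1200 runs at 40 locations for every feature.
n_runs, n_total = 1200, 40
results = {f: np.zeros((n_runs, n_total)) for f in INPUT_FEATURES + TARGET_FEATURES}

shapes = {n: tuple(m.shape for m in assemble_matrices(results, n)) for n in (17, 25, 40)}
```

With 17 locations the input matrix has 2 × 17 = 34 columns; with 40 locations it has 80, and the target matrix grows in the same proportion.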
Figure 3 illustrates the training and validation loss curves for each model. Training loss measures how well the model fits the data it was trained on, while validation loss evaluates the model’s performance on unseen data, serving as an indicator of generalization. All versions showed a quick decrease in both losses during the initial epochs before stabilizing. The model with 17 locations achieved the lowest final loss values, indicating a well-balanced fit with minimal overfitting. In contrast, models with 25 and 40 locations maintained low losses but exhibited slightly higher residual errors, reflecting the increased complexity in capturing more detailed structural behaviour.

Training and validation loss for the ANN model for 17 locations (A), 25 locations (B), and 40 locations (C)
Figure 4 presents scatter plots comparing predicted versus actual values (coming from the FE model and used to train the FNN), for FNN models using 17, 25, and 40 data collection points. All plots show a strong diagonal alignment, indicating high predictive accuracy. The 17-location model has the tightest clustering, while the 25-location model shows slightly larger deviations. Though all models maintain robust predictive performance, the 40-location model exhibits a broader spread at the extremes, suggesting increased challenges with higher dimensionality.

Predicted vs. actual values for the FNN model for 17 locations (A), 25 locations (B), and 40 locations (C)
Figure 5 compares performance metrics for models built using 17, 25, and 40 data collection points. The MSE and RMSE (root mean squared error) increase as more data collection points are added, while the MAE rises from 17 to 25 locations and then falls slightly at 40. Specifically, the 17-location model achieved MSE = 0.048, RMSE = 0.219, and MAE = 0.062, while the 25-location model showed MSE = 0.093, RMSE = 0.305, and MAE = 0.080, and the 40-location model reached MSE = 0.135, RMSE = 0.367, and MAE = 0.077. In parallel, the R2 values, which indicate the proportion of variance explained, slightly decrease from 0.9988 (17 locations) to 0.9980 (25 locations) and 0.9977 (40 locations). This trend highlights a trade-off between incorporating more data and maintaining prediction accuracy.
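The reported values can be checked for internal consistency with a few lines of arithmetic, since the RMSE is by definition the square root of the MSE and can never be smaller than the MAE. The sketch below uses only the numbers quoted above; the dictionary layout is an illustrative choice.

```python
import math

# Metrics as reported in the text, keyed by the number of locations.
reported = {
    17: {"mse": 0.048, "rmse": 0.219, "mae": 0.062, "r2": 0.9988},
    25: {"mse": 0.093, "rmse": 0.305, "mae": 0.080, "r2": 0.9980},
    40: {"mse": 0.135, "rmse": 0.367, "mae": 0.077, "r2": 0.9977},
}

# RMSE recomputed from the MSE, rounded to the reported precision.
recomputed_rmse = {n: round(math.sqrt(m["mse"]), 3) for n, m in reported.items()}

# RMSE >= MAE must hold for any set of errors.
rmse_bounds_mae = all(m["rmse"] >= m["mae"] for m in reported.values())
```

The recomputed RMSE values agree with the reported ones to three decimals for all three model versions.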

Normalised metrics comparison across model versions
Similarly, Fig. 6 illustrates the performance versus complexity trade-off by plotting RMSE and R2 values against the number of data points. The RMSE (blue line) increases with more locations, while the R2 value (green line) slightly decreases, reinforcing the observation that model complexity impacts performance. Despite these variations, the consistently high R2 values across all versions highlight the models’ overall effectiveness in predicting structural behaviour, emphasizing their potential for practical applications in structural health monitoring.

Performance vs. complexity trade-off
In summary, increasing the number of data collection locations enriched the dataset but also increased model complexity, which can negatively impact performance. While the 17-location model achieved the best predictive accuracy with minimal errors and the highest R2, the 25-location model offered a favourable balance. It maintained high performance while incorporating additional structural information, thereby effectively generalising structural behaviour. Consequently, the 25-location model was selected for further studies as it best balanced accuracy and complexity.
After the FNN model had been trained on data from the digital twin, it was used to make predictions from sensor measurements taken on the physical bridge. First, however, those readings had to be prepared. The workflow involved establishing baseline sensor readings and adjusting them with the FE model to compensate for the effect of the bridge’s self-weight: because the sensors were installed after the bridge was built, the real sensor measurements do not include the effect of self-weight, whereas the virtual sensors in the FE model do. The prepared readings were then preprocessed to ensure consistency and compatibility with the FNN model. The same MinMaxScaler from the sklearn library [31] that had been used when training the feedforward neural network was applied, scaling all features to the range between 0 and 1. The normalized dataset was subsequently used to predict crack widths, stresses, and deflections at the critical structural points, based on real sensor measurements. Predicted values were scaled back to their original ranges using inverse transformations, so that they can be interpreted as physical quantities. The steps of the code are illustrated in Fig. 7.

Code steps for forecasting structural features
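The forecasting steps can also be sketched in code. This is a simplified illustration rather than the code behind Fig. 7: the function and variable names are hypothetical, adding the FE self-weight response to the baseline-corrected readings is one interpretation of the compensation step, and a trivial placeholder model stands in for the trained FNN so that the example stays self-contained.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

class RowMeanModel:
    """Placeholder for the trained FNN: repeats the row mean for each target."""
    def __init__(self, n_targets):
        self.n_targets = n_targets

    def predict(self, X):
        return np.repeat(X.mean(axis=1, keepdims=True), self.n_targets, axis=1)

def predict_from_sensors(raw, baseline, self_weight, x_scaler, y_scaler, model):
    """Prepare real readings, predict, and return values in physical units."""
    # 1. Change relative to the baseline reading of each sensor.
    # 2. Add the self-weight response from the FE model, which the real
    #    sensors (installed after construction) never recorded.
    adjusted = (raw - baseline) + self_weight
    scaled = x_scaler.transform(adjusted)             # same scaler as in training
    pred_scaled = model.predict(scaled)
    return y_scaler.inverse_transform(pred_scaled)    # back to original ranges

# Scalers fitted on (synthetic) training data from the digital twin.
X_train = np.column_stack([np.linspace(0.0, 10.0, 50)] * 3)
Y_train = np.column_stack([np.linspace(0.0, 0.3, 50), np.linspace(0.0, 40.0, 50)])
x_scaler = MinMaxScaler().fit(X_train)
y_scaler = MinMaxScaler().fit(Y_train)

baseline = np.array([1.0, 1.0, 1.0])
self_weight = np.array([2.0, 2.0, 2.0])
raw = baseline + np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])

pred = predict_from_sensors(raw, baseline, self_weight, x_scaler, y_scaler, RowMeanModel(2))
```

Reusing the scalers fitted during training, rather than refitting them on the field data, keeps the real readings on the same scale as the virtual sensor data the network learned from.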
Predictions were evaluated against predefined thresholds to assess structural damage. Deflection values exceeding the L/400 deflection limit [32] given in the Hungarian annex were categorized as damage due to excessive deflection, while crack widths above 0.2 mm [33], the limit for prestressed members with bonded tendons in exposure class XC1 (dry or permanently wet environments), were identified as damage caused by cracking. In future work, concrete compressive stresses will also be included as a damage indicator. This approach enables continuous monitoring and identification of potential damage patterns.
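The threshold check itself is simple and can be sketched as follows. The L/400 and 0.2 mm limits come from the text; the function names, the label strings and the 50 m span used in the example are hypothetical.

```python
CRACK_LIMIT_MM = 0.2   # crack width limit quoted in the text

def deflection_limit_mm(span_m):
    """L/400 deflection limit, with the span given in metres."""
    return span_m * 1000.0 / 400.0

def classify(deflection_mm, crack_mm, span_m):
    """Return the list of serviceability limits exceeded by a prediction."""
    flags = []
    if abs(deflection_mm) > deflection_limit_mm(span_m):
        flags.append("excessive deflection")
    if crack_mm > CRACK_LIMIT_MM:
        flags.append("cracking")
    return flags

# Hypothetical predictions for a 50 m span (limit: 125 mm).
ok = classify(deflection_mm=30.0, crack_mm=0.05, span_m=50.0)
damaged = classify(deflection_mm=130.0, crack_mm=0.25, span_m=50.0)
```

An empty list means that neither limit is exceeded at that location.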
For demonstration purposes, predictions were made using the pre-trained FNN algorithm and real sensor readings to test the FNN model. In this study, we used data collected over one hour on a summer day at 5:00 PM, coinciding with peak rush hour traffic. This specific time frame was selected to capture the bridge's response under high traffic loads, providing a representative assessment of its performance during conditions of maximum utilisation. The predictions were conducted for 25 specific locations on the bridge (Fig. 8) that represent the locations of the sensors and other critical locations presented in Table 2, each identified by a distinct node number extracted from the FE model.

Locations for making predictions on Soroksári bridge
Selected nodes
| Node | Type of sensor | Location |
|---|---|---|
| 183 | × | Bottom of the left box girder at support 15/1 |
| 355 | Displacement | at support 13/4 |
| 356 | Displacement | at support 13/3 |
| 357 | Displacement | at support 13/2 |
| 358 | Displacement | at support 13/1 |
| 360 | × | at support 14/3 |
| 361 | Displacement | at support 15/4 |
| 362 | Displacement | at support 15/3 |
| 363 | Displacement | at support 16/4 |
| 364 | Displacement | at support 16/3 |
| 365 | × | at support 14/2 |
| 368 | × | at support 15/1 |
| 369 | Displacement | at support 16/2 |
| 370 | Displacement | at support 16/1 |
| 375 | × | Bottom slab of the box girder near Support 14/3 |
| 377 | × | Bottom slab of the box girder near support 14/2 |
| 1817 | Strain | Bottom slab of the right-side box girder in the first quarter of the bridge |
| 3851 | × | Bottom of the transverse wall at the middle cross-section on the left-side box girder |
| 5151 | Strain | Bottom slab of the right-side box girder in the third quarter of the bridge |
| 7438 | × | Middle top of the transverse wall at Pier 14 |
| 8279 | Strain | Top slab in the right-side box girder near Pier 14 (right side) |
| 8438 | Strain | Right-side internal wall of the right box girder at the middle cross-section of the bridge |
| 9280 | Strain | Top slab of the right-side box girder at Pier 14 (left side) |
| 9287 | Strain | Left-side internal wall of the right box girder at the middle cross-section of the bridge |
| 9407 | Strain | Top slab of the right-side box girder in the second quarter of the bridge |
The calculated utilisations (predicted value/limit value × 100) at these locations were evaluated in the serviceability limit state (crack width and deflection) against the previously defined limit criteria to assess the structural performance. The calculated utilisations shown in Fig. 9 indicate that the bridge was satisfactory in the serviceability limit state, demonstrating adequate performance under the conditions acting at the time of measurement.

Top crack width utilisations (A), bottom crack width utilisations (B) and deflection utilisations (C).
The digital twin developed for the Soroksári Bridge showed high accuracy in predicting structural performance, though several limitations must be considered. Simplifications in the FE model, such as assumptions about material properties, boundary conditions, and loading scenarios, introduced an error of approximately 5%, which was carried into the FNN training process. The ANN itself achieved low error metrics, corresponding to a deviation of 2–5%. Taken together, these uncertainties mean that the digital twin was reliable within a margin of error of 7–10% under normal conditions.
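The combined margin can be reproduced with a short calculation. The sketch assumes that the FE and ANN errors simply add, which is a conservative reading of the figures above; a multiplicative combination of the corresponding accuracies is shown alongside for comparison. Both combination rules are assumptions, since the text does not state how the margin was derived.

```python
fe_error = 0.05             # approximate FE model error
ann_error = (0.02, 0.05)    # ANN deviation range

# Additive combination: the two errors are summed directly.
additive_margin = [fe_error + e for e in ann_error]

# Multiplicative combination of the accuracies (1 - error), for comparison.
combined_accuracy = [(1.0 - fe_error) * (1.0 - e) for e in ann_error]
```

The additive rule gives a margin of 7% to 10%, and the multiplicative rule gives an overall accuracy of roughly 93% down to 90%, so both readings lead to the same range.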
Our novel framework, which integrates the digital twin concept, FNN models, and real sensor data, offers a scalable and practical approach to structural health monitoring of reinforced concrete bridges. Simulated data were collected at sensor locations and critical structural points, using Monte Carlo simulations to generate random loading scenarios. The ANN was then trained with inputs such as longitudinal displacements and strains to predict key performance indicators like stresses, crack widths, and deflections. With the FE model achieving around 95% accuracy and the ANN model reaching 95–98%, the overall accuracy of the digital twin was between 90% and 93%.
However, it is important to acknowledge that this study focused on a single case, the Soroksári bridge, which has a specific geometry, sensor layout, and material specifications. While the methodology has strong potential, its adaptability to other bridge types is not straightforward. Transferring the framework to structures with different configurations, materials, boundary conditions, or operational environments would require substantial reconfiguration, including the development of a new FE model and retraining the neural network with representative sensor data. Although the core principles remain applicable, future work will be needed to assess the method's generalizability and to identify the adjustments required for other use cases.
Overall, this integrated approach reduces reliance on costly manual inspections and provides a robust tool for real-time structural assessment. It lays a solid foundation for future advancements in SHM and infrastructure management.