An Automatic Damage Detection Method Based on Adaptive Theory-Assisted Reinforcement Learning

Chengwen Zhang, Qing Chun, Yijie Lin

Engineering ›› 2025, Vol. 50 ›› Issue (7): 196–211. DOI: 10.1016/j.eng.2025.03.026
Research article

Abstract

Current damage detection methods based on model updating and sensitivity Jacobian matrices show a low convergence ratio and computational efficiency for online calculations. The aim of this paper is to construct a real-time automated damage detection method by developing a theory-assisted adaptive multiagent twin delayed deep deterministic (TA2-MATD3) policy gradient algorithm. First, the theoretical framework of reinforcement-learning-driven damage detection is established. To address the disadvantages of the traditional multiagent twin delayed deep deterministic (MATD3) method, a theory-assisted mechanism and an adaptive experience replay mechanism are introduced. Moreover, a historical residential house built in 1889 was taken as an example, using its 12-month structural health monitoring data. TA2-MATD3 was compared with existing damage detection methods in terms of convergence ratio, online computing efficiency, and damage detection accuracy. The results show that the computational efficiency of TA2-MATD3 is approximately 117–160 times that of the traditional methods. The convergence ratio of damage detection on the training set is approximately 97%, and that on the test set is in the range of 86.2%–91.9%. In addition, the main instances of apparent damage found in the field survey were identified by TA2-MATD3. The results indicate that the proposed method can significantly improve online computing efficiency and damage detection accuracy. This research provides novel perspectives on the use of reinforcement learning methods for damage detection in online structural health monitoring.

Graphical abstract

Keywords

Reinforcement learning / Theory-assisted / Damage detection / Newton’s method / Model updating / Architectural heritage

Cite this article

Chengwen Zhang, Qing Chun, Yijie Lin. An Automatic Damage Detection Method Based on Adaptive Theory-Assisted Reinforcement Learning. Engineering, 2025, 50(7): 196-211 DOI:10.1016/j.eng.2025.03.026


1. Introduction

Structural damage detection is an important issue in civil engineering research [1], [2], [3]. Extensive research has been conducted on damage detection, such as apparent damage detection based on computer vision and deep learning methods [4,5], damage state assessment via nondestructive detection methods such as ultrasound technology [6], and damage detection based on finite element model (FEM) updating. Currently, online structural health monitoring (SHM) systems generally adopt the FEM-based damage detection method. It has been scientifically proven that the FEM obtained using the element-type updating method is physically meaningful [7]. In particular, sensitivity-based updating methods, such as Newton’s method [10], are widely used in practical engineering [8,9]. Although the sensitivity-based FEM updating theory is relatively mature [11], it suffers from two major limitations: high computational cost and ill-conditioned matrix solutions. Li and Law [12] attempted to overcome the problem of the ill-conditioned matrix solution by adopting the adaptive Tikhonov regularization method based on the L-curve method. However, if the calculation dimension becomes very large or the condition number of the sensitivity matrix is inevitably large, the convergence of the FEM update is extremely sensitive to the initial adaptive coefficient in the L-curve method [13]. In addition, many researchers have effectively studied the updating indices [14], updating accuracy [15,16], sensitivity function weighting method [17], and updating convergence speed [18]. Nevertheless, in current online SHM systems, traditional FEM updating algorithms often face difficulties in determining universal or adaptive hyperparameters when analyzing large amounts of monitoring data. This means there is a high likelihood that the FEM updating will fail to converge. Moreover, the computational efficiency of traditional FEM updating algorithms still does not meet the requirements of real-time analysis of large amounts of monitoring data.

In recent years, intelligent optimization algorithms [19] and artificial neural networks [20] have been widely used in the field of FEM updating. Using these methods, the online computing time during the operation stage can be converted into offline training time in the preparation stage, ensuring high computational efficiency for online analysis in SHM systems [21]. This means that a surrogate model is trained in the offline stage [22,23] to replace the sensitivity matrix solution, and the surrogate model is then applied during the online stage for testing and further learning. For example, Levin and Lieven [24] used radial basis neural networks to fit the nonlinear mapping relationship between feature quantities and FEM design parameters. However, due to the sparsity of sensor placements, the prior knowledge that can be learned is always limited. Therefore, it remains difficult to solve the convergence problem mentioned above by directly using the traditional supervised learning architecture for FEM updating [25].

Transfer learning methods [26], dilated causal convolutional networks [27], or Bayesian neural networks [28], [29], [30], [31] can be used to overcome this problem. For example, Seventekidis et al. [32,33] combined a deep convolutional neural network with a layered multi-damage identification method and trained it on vibration data generated by the FEM. Zhang et al. [34] improved Bayesian-based FEM updating using transfer learning, with a focus on overcoming modeling uncertainties. Wang et al. [35] proposed a new probabilistic data-driven damage identification method under the sparse Bayesian learning framework. However, although the above methods can make the best use of existing monitoring data, it is often difficult to find enough prior knowledge or similar transfer cases for online SHM.

An approach that combines data-driven and physical model-driven methods can be used to address this problem [36,37]. A typical method called physics-informed neural networks [38] integrates the theoretical framework of FEM updating into the learning process of neural networks. The learning process is guided by the physical model [39] and can achieve accelerated convergence and higher accuracy [40,41]. For example, Yu and Liu [42] proposed a physically-guided generative adversarial network. Xu and Noh [43] introduced a new framework, called physics-based multi-source domain adversarial networks (PhyMDAN), for the diagnosis of structural damage states, which transfers models from other buildings to target buildings.

However, the above methods often involve end-to-end learning. For example, a network is trained to directly fit the mapping relationship between modal errors and damage vectors. The analytical form of this mapping relationship is the fitting of a large sensitivity matrix in the time domain, which demands a strong fitting ability from the neural network. The use of such neural networks often requires expensive hardware equipment. In addition, the knowledge transfer problem of traditional neural networks has always been a topic worth exploring. Reinforcement learning can address problems caused by different states and noise interference, overcoming the above-described difficulties. In this approach, agents are trained to continuously interact with the environment and accumulate professional knowledge.

Therefore, the motivation of this paper is to develop an automatic damage detection method based on adaptive theory-assisted reinforcement learning, of which the training objective is to learn a dynamic strategy that reduces the current modal errors by adjusting the damage increment. The proposed method seeks to achieve two improvements on the end-to-end problem. First, the states during the learning process and the sensitivity-based model updating theory are incorporated into the learning content to enrich the observation features. Second, the end-to-end learning is decomposed into multiple substeps, and the learning content of each substep is further allocated to multiple agents. The innovations and contributions of this paper are as follows: ① To the best of the authors’ knowledge, very few studies have explored damage detection methods based on FEM updating with a reinforcement learning architecture; ② the network complexity has been reduced, lowering the demand for high-performance computing power while ensuring the convergence of damage detection; and ③ the assisted knowledge of sensitivity-based model updating theory has been introduced into reinforcement learning to improve the robustness and speed of multiagent optimization.

The rest of this paper is organized as follows. In Section 2, a damage detection framework based on the multiagent twin delayed deep deterministic (MATD3) policy gradient algorithm is established. To address the limitations of classical MATD3, a theory-assisted adaptive MATD3 (TA2-MATD3) method is proposed. In Section 3, a century-old residential house is selected as a case study, and its 12-month SHM data are collected. The TA2-MATD3 method is applied for damage detection using the long-term monitoring data. In Section 4, the performance of TA2-MATD3 is validated by comparing its results with those of other existing algorithms. Finally, Section 5 concludes the paper.

2. Real-time damage detection method based on TA2-MATD3

The error vector between the calculated and tested modal parameters at time T is denoted as WT(γT), where γT represents the damage vector at time T. Newton’s method and the least-squares method can be used to calculate the current damage increment vector ΔγT,p+1 via Eq. (1) [12,13], where p denotes the pth iteration step of Newton’s method. More details about the theoretical background can be found in Section S1 in Appendix A.

$\Delta \boldsymbol{\gamma}_{T, p+1}=\left[\boldsymbol{F}_{T}^{\mathrm{H}}\left(\boldsymbol{\gamma}_{T, p}\right) \boldsymbol{F}_{T}\left(\boldsymbol{\gamma}_{T, p}\right)\right]^{-1} \boldsymbol{F}_{T}^{\mathrm{H}}\left(\boldsymbol{\gamma}_{T, p}\right) \boldsymbol{W}_{T}\left(\boldsymbol{\gamma}_{T, p}\right)$

where WT(γT,p) is the error vector at the pth iteration step at time T; γT,p is the damage vector at the pth iteration step at time T; FT(γT,p) is the sensitivity matrix of WT(γT,p) with respect to γT,p at the pth iteration step at time T; and FTH(γT,p) represents the transpose of FT(γT,p). FT(γT,p) can be calculated using the following equation [12,13].

$\boldsymbol{F}_{T}\left(\boldsymbol{\gamma}_{T, p}\right)=\begin{bmatrix}\boldsymbol{F}_{1} \\ \boldsymbol{F}_{2} \\ \vdots \\ \boldsymbol{F}_{l} \\ \vdots \\ \boldsymbol{F}_{N_{\mathrm{dof}}}\end{bmatrix}, \quad \boldsymbol{F}_{l}=\begin{bmatrix}\frac{\partial Y_{l, 1}}{\partial \gamma_{T, p}^{1}} & \frac{\partial Y_{l, 1}}{\partial \gamma_{T, p}^{2}} & \cdots & \frac{\partial Y_{l, 1}}{\partial \gamma_{T, p}^{n_{\mathrm{E}}}} \\ \frac{\partial Y_{l, 2}}{\partial \gamma_{T, p}^{1}} & \frac{\partial Y_{l, 2}}{\partial \gamma_{T, p}^{2}} & \cdots & \frac{\partial Y_{l, 2}}{\partial \gamma_{T, p}^{n_{\mathrm{E}}}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial Y_{l, n_{\mathrm{m}}}}{\partial \gamma_{T, p}^{1}} & \frac{\partial Y_{l, n_{\mathrm{m}}}}{\partial \gamma_{T, p}^{2}} & \cdots & \frac{\partial Y_{l, n_{\mathrm{m}}}}{\partial \gamma_{T, p}^{n_{\mathrm{E}}}}\end{bmatrix}$

where l denotes the lth degree of freedom (DOF); Fl is the sensitivity matrix of the lth DOF; l ranges from 1 to Ndof, where Ndof represents the number of DOFs; nm is the total order of the modal parameters; Yl,nm is the nmth-order modal parameter of the lth DOF, and other variables (such as Yl,1 and Yl,2) have similar meanings; nE is the total number of components to be detected; and γT,p^nE denotes the damage vector of the elements in the nEth component, with other variables (such as γT,p^1 and γT,p^2) having similar meanings. The partial derivatives (e.g., ∂Yl,nm/∂γT,p^nE) are generally evaluated using the finite difference method. This means that at least nE finite element analyses are required to solve FT(γT,p), which incurs a significant computational cost. To simplify the calculation, components with lower sensitivity can be removed from the target components for damage detection. However, the condition number of FT(γT,p) is still extremely large, which can lead to matrix ill-conditioning problems.
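The finite-difference sensitivity assembly and the least-squares Newton step of Eq. (1) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `modal_fn` is a hypothetical stand-in for one FEM modal analysis, and `lstsq` is used in place of the explicit normal-equation inverse for numerical stability (the two are equivalent for a well-conditioned system).

```python
import numpy as np

def sensitivity_fd(modal_fn, gamma, eps=1e-4):
    """Finite-difference sensitivity matrix F = dY/dgamma.

    modal_fn : callable mapping a damage vector (n_E,) to modal
               parameters (n_dof * n_m,); a stand-in for one FEM run.
    Each column costs one extra FEM evaluation, which is why the paper
    notes that at least n_E analyses are needed per iteration.
    """
    y0 = modal_fn(gamma)
    F = np.empty((y0.size, gamma.size))
    for e in range(gamma.size):
        g = gamma.copy()
        g[e] += eps                      # perturb one component's damage
        F[:, e] = (modal_fn(g) - y0) / eps
    return F

def newton_increment(F, W):
    """Least-squares Newton step of Eq. (1):
    dgamma = (F^H F)^(-1) F^H W, computed via lstsq for stability."""
    return np.linalg.lstsq(F, W, rcond=None)[0]
```

For an ill-conditioned F, `lstsq` (or Tikhonov regularization, as in Ref. [12]) is preferable to forming the normal equations explicitly, since squaring F also squares its condition number.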

To address these problems, a reinforcement learning framework [44] named MATD3 [45,46] is used. However, traditional MATD3 faces the following two issues when used for damage detection: ① The complexity of the learning objectives is high. In the initial stage, it is difficult to effectively explore actions that can achieve damage detection. Moreover, an insufficient number of sensors leads to information sparsity. ② The efficiency of experience replay is poor. New experience sets are rarely selected, resulting in excessively delayed network updates.

Therefore, TA2-MATD3 was proposed to address these two problems by introducing a theory-assisted mechanism and an adaptive experience replay mechanism.

2.1. Algorithm framework: Damage detection based on TA2-MATD3

Let t and t + 1 represent the tth and (t + 1)th iteration steps in reinforcement learning, respectively. WT,ti(γT,ti) denotes the global modal information error vector observed by the ith agent αi (i ranges from 1 to ka, where ka is the number of agents) at step t and time T. γT,ti is the damage vector of the components in the region detected by agent αi at step t and time T. In the following text, the subscript ti indicates that the variable pertains to agent αi at step t. Then, ST,ti = [WT,ti(γT,ti), γT,ti] represents the state observed by agent αi at step t. The algorithm framework of the reinforcement-learning-based damage detection method is shown in Fig. 1.

Let aT,ti = ΔγT,ti, where ΔγT,ti represents the damage increment vector of the components; aT,ti is the action made by agent αi at step t. πT,ti(ST,ti, θT,ti), named the strategy of αi, is defined as the mapping from the observed state ST,ti to the action aT,ti. πT,ti(ST,ti, θT,ti) can be fitted using a deep neural network called the local student actor network (SAN), where θT,ti are the parameters of the local SAN to be trained. To improve convergence [47], a target SAN π′T,ti(ST,(t+1)i, θ′T,ti) is introduced, which takes the observed state at t + 1, ST,(t+1)i, as input to predict the next action; θ′T,ti are the parameters of the target SAN to be determined at step t. By introducing the local exploration noise at t, ζT,ti, and the target exploration noise at t + 1, ζT,(t+1)i, aT,ti = πT,ti(ST,ti, θT,ti) + ζT,ti and a′T,(t+1)i = π′T,ti(ST,(t+1)i, θ′T,ti) + ζT,(t+1)i can be obtained. a′T,(t+1)i is the action given by the target SAN of αi at t + 1.

The environment for interacting with agents is defined as follows: After ka agents execute joint action at t (aT,t), γT,t+1, which means the joint damage vector at t + 1 and time T, can be obtained according to γT,t+1=γT,t+aT,t. γT,t is the joint damage vector at t and time T. The FEM is called to perform modal analysis based on γT,t+1. The joint modal error at t + 1 (WT,t+1γT,t+1) can be solved at this time, and thus, the state ST,(t+1)i=[WT,(t+1)iγT,(t+1)i,γT,(t+1)i] at t + 1 can be determined, here γT,(t+1)i is the damage vector of the components in the region detected by agent αi at t + 1 and time T. WT,(t+1)i means the global modal information error vector from agent αi at t + 1 and time T. Let rT,ti=lg(WT,tiγT,ti2/WT,(t+1)iγT,(t+1)i2) indicate the reward feedback from agent αi at t + 1 and time T from the environment after the above process has been executed. The symbol ·2 represents the 2-norm. Thus, after interacting with the environment, an experience set Zt=ST,ti,aT,ti,rT,ti,ST,(t+1)i is obtained by ka agents and is stored in the experience replay pool Θ.
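The reward defined above rewards any action that shrinks the modal error norm. A minimal sketch of this reward computation (using the paper's lg, i.e., base-10 logarithm):

```python
import numpy as np

def reward(W_t, W_t1):
    """Reward of agent i at step t+1 (Section 2.1):
    r = lg(||W_t||_2 / ||W_{t+1}||_2).
    Positive when the joint action reduced the modal error norm,
    negative when it increased it, and zero when unchanged."""
    return np.log10(np.linalg.norm(W_t) / np.linalg.norm(W_t1))
```

For example, an action that cuts the modal error norm by a factor of ten earns a reward of exactly 1.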

Then, the Q value is defined to measure the long-term benefit of an action. To avoid overestimating Q [45], four critic networks are introduced for each agent. Two of them are local critic networks, denoted as Q1T,ti(ST,ti, aT,t, ψ1T,ti) and Q2T,ti(ST,ti, aT,t, ψ2T,ti), where ψ1T,ti and ψ2T,ti are their parameters. The other two are target critic networks, denoted as Q′1T,(t+1)i(ST,(t+1)i, a′T,t+1, ψ′1T,ti) and Q′2T,(t+1)i(ST,(t+1)i, a′T,t+1, ψ′2T,ti), where ψ′1T,ti and ψ′2T,ti are their parameters, and a′T,t+1 is the joint action given by the target SANs at t + 1. These networks predict the long-term benefit Q of an action by taking states and actions as inputs. The relationship between the reward rT,ti and the Q value can be described by the Bellman equation, as shown in Eq. (2).

$Q_{T, t i}=r_{T, t i}+\lambda \min \left(Q_{1 T,(t+1) i}^{\prime}, Q_{2 T,(t+1) i}^{\prime}\right)$

where λ represents the attenuation of the effect of action aT,ti on subsequent processes, and QT,ti is the Q value of αi at t and time T.
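The clipped double-Q target of Eq. (2) can be written in one line; taking the smaller of the two target-critic estimates is the standard TD3 device for curbing overestimation. A minimal sketch, with the paper's discount rate λ = 0.95 (Section 3.3) as the default:

```python
def td3_target(r, q1_next, q2_next, lam=0.95):
    """Bellman target of Eq. (2): clipped double Q.
    r        : reward r_{T,ti}
    q1_next  : Q'_1 estimate of the next state-action pair
    q2_next  : Q'_2 estimate of the next state-action pair
    lam      : discount (attenuation) factor lambda"""
    return r + lam * min(q1_next, q2_next)
```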

θT,ti, θ′T,ti, ψ1T,ti, ψ2T,ti, ψ′1T,ti, and ψ′2T,ti in πT,ti(ST,ti, θT,ti), π′T,ti(ST,(t+1)i, θ′T,ti), Q1T,ti(ST,ti, aT,t, ψ1T,ti), Q2T,ti(ST,ti, aT,t, ψ2T,ti), Q′1T,(t+1)i(ST,(t+1)i, a′T,t+1, ψ′1T,ti), and Q′2T,(t+1)i(ST,(t+1)i, a′T,t+1, ψ′2T,ti) are finally determined through training.

2.2. Algorithm implementation: Theory-assisted mechanism

The TA2-MATD3 algorithm flow chart can be found in Section S2 in Appendix A and can be described as follows.

Step 1: When t = 0 and T = 0, randomly initialize the parameters θT,ti, ψ1T,ti, and ψ2T,ti. These parameters are copied to θ′T,ti, ψ′1T,ti, and ψ′2T,ti, respectively. Initialize the experience replay pool Θ.

Step 2: The calculation process of using the FEM to solve the damage increment vector (i.e., the action) via Eq. (1) is named the teacher actor generator (TAG). Randomly generate learning tasks with random target damage distributions DTarget and initial damage distributions Dini. At step t, each agent outputs the action aT,ti = πT,ti(ST,ti, θT,ti) + ζT,ti. Based on this action, γT,t+1 = γT,t + aT,t can be obtained, which interacts with the environment defined in the previous section to obtain rT,ti and ST,(t+1)i and forms the experience set Zt = {ST,ti, aT,ti, rT,ti, ST,(t+1)i}. Moreover, additional actions aT,ti,G are generated by the TAG at step t to form a new experience set Z′t = {ST,ti, aT,ti, rT,ti, ST,(t+1)i, aT,ti,G}, which is stored in the experience replay pool. Repeat Step 2 multiple times to ensure that Θ has enough training samples.

Step 3: Let the number of experience sets Z′t = {ST,ti, aT,ti, rT,ti, ST,(t+1)i, aT,ti,G} selected randomly from Θ be M. Unlike in traditional MATD3, the update objective of the local SAN is no longer to obtain the action that maximizes the Q value. Rather, the new update objective is to minimize the error between the action outputs of the local SAN and the TAG. Eq. (3) can be used to update θT,ti.

$L_{\pi_{T, t i}}=\frac{\vartheta^{t-t_{0}}}{M} \sum_{j}\left\|\boldsymbol{a}_{T, j i, \mathrm{G}}-\boldsymbol{a}_{T, j i}\left(\boldsymbol{\theta}_{T, t i}\right)\right\|^{2}$

where ϑ^(t−t0) indicates that the gradient contribution of the TAG decays as the number of iterations increases; t0 is the iteration step at which the decay begins; j and (j + 1) represent the jth and (j + 1)th sampled iteration steps, respectively; LπT,ti is the objective function of the local SAN; aT,ji,G is the action generated by the TAG at j; and aT,ji is the action made by agent αi at j.
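The theory-assisted loss of Eq. (3) is essentially a decayed imitation (behavior-cloning) loss toward the TAG. A minimal numpy sketch; the decay base `vartheta` and schedule below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def theory_assisted_loss(a_san, a_tag, t, t0, vartheta=0.99):
    """Eq. (3): mean squared error between SAN actions and TAG actions,
    weighted by vartheta**(t - t0) so the teacher's gradient
    contribution fades as training progresses.

    a_san, a_tag : arrays of shape (M, action_dim), the M sampled
                   SAN actions and the matching TAG actions."""
    a_san, a_tag = np.asarray(a_san), np.asarray(a_tag)
    M = a_san.shape[0]                       # number of sampled experience sets
    return (vartheta ** (t - t0)) / M * np.sum((a_tag - a_san) ** 2)
```

In a full implementation this scalar would be the loss backpropagated through the local SAN's parameters θT,ti; here it is only evaluated numerically.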

Using the target SAN, the action a′T,(j+1)i given by the target SAN of αi at j + 1 can be solved using a′T,(j+1)i = π′T,ti(ST,(j+1)i, θ′T,ti) + ζT,(t+1)i, where ST,(j+1)i is the observed state at j + 1. According to Eq. (2), the loss function for updating the local critic networks, LQT,ti, can be obtained, as shown in Eq. (4).

$\begin{aligned} L_{Q_{T, t i}}= & \frac{1}{M} \sum_{k_{\mathrm{c}}=1}^{2} \sum_{j}\left(y_{T, j i}-Q_{k_{\mathrm{c}} T, t i}\left(\boldsymbol{S}_{T, j i}, \boldsymbol{a}_{T, j}, \boldsymbol{\psi}_{k_{\mathrm{c}} T, t i}\right)\right)^{2} \\ y_{T, j i}= & r_{T, j i}+\lambda \min \left(Q_{1 T,(t+1) i}^{\prime}\left(\boldsymbol{S}_{T,(j+1) i}, \boldsymbol{a}_{T, j+1}^{\prime}, \boldsymbol{\psi}_{1 T, t i}^{\prime}\right),\right. \\ & \left.Q_{2 T,(t+1) i}^{\prime}\left(\boldsymbol{S}_{T,(j+1) i}, \boldsymbol{a}_{T, j+1}^{\prime}, \boldsymbol{\psi}_{2 T, t i}^{\prime}\right)\right) \end{aligned}$

where kc is the serial number of the local critic networks, and yT,ji is the target Q value predicted by the target critic networks. Eq. (4) enables the obtained local critic networks to accurately evaluate the long-term benefits of actions on the overall process. QkcT,ti denotes the kcth local critic network; ST,ji is the observed state at j; aT,j is the joint action at j; ψkcT,ti is the parameter of the kcth local critic network at t; rT,ji is the reward at j; and a′T,j+1 is the joint action given by the target SANs at j + 1.

Step 4: After several learning tasks for the virtual damage distribution DTarget are completed, a set of sensor test data is randomly selected to form a real task. For this real task, both the local SAN and the TAG are used for prediction, and the damage detection result is denoted as DT,real. WT,t(γT,t) is the joint modal error at t and time T. If WT,t(γT,t) does not converge, DT,real is determined by interpolating the real damage distributions around that time. Random oscillations near the damage distribution DT,real are applied to generate new target damage distributions (DT,real + ξDTarget), where ξ denotes a noise distribution. Then, Steps 2 and 3 are repeated according to DT,real + ξDTarget. After performing the above process several times, the output action of the local SAN is close to that of the TAG, and the trained parameters of the SAN at this point are denoted as θpretrain.

Step 5: On the basis of θpretrain, Eq. (5) is used to describe the probability function Pse for selecting the TAG. It indicates that, during the training process, the actions output by the SAN gradually become dominant.

$P_{\mathrm{se}}=\max \left(P_{\min }, \tanh \left(\frac{\chi}{T}\right)\right)$

where Pmin is the minimum probability of selecting the TAG, and χ is the control coefficient used to determine the decay rate.
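A minimal sketch of this teacher-selection schedule. The exact functional form is reconstructed here under the assumption that Pse decays with T toward Pmin (matching the stated behavior that SAN actions gradually dominate); the values of `p_min` and `chi` are illustrative, not from the paper:

```python
import math

def p_select_tag(T, p_min=0.05, chi=200.0):
    """Probability of drawing the TAG action at monitoring step T >= 1
    (Eq. (5) as reconstructed): decays toward p_min as T grows, so the
    student actor network's own actions gradually take over."""
    return max(p_min, math.tanh(chi / T))
```

Early in training (small T) the teacher is selected almost surely; late in training the floor `p_min` keeps a small residual teacher presence so the theory-assisted samples never vanish from the replay pool.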

In the M sampled experience sets, if the TAG is not adopted, the experience set is Zt = {ST,ti, aT,ti, rT,ti, ST,(t+1)i}, and the local networks are updated using Eqs. (4), (6), and (7). If the TAG is adopted, the experience set is Z′t = {ST,ti, aT,ti, rT,ti, ST,(t+1)i, aT,ti,G}, and the local networks are updated using Eqs. (3), (4), and (7). Let the number of Z′t be nM1 and the number of Zt be nM2 (M = nM1 + nM2).

$L_{\pi_{T, t i}}=-\frac{1}{n_{M 2}} \sum_{j} Q_{1 T, t i}\left(\boldsymbol{S}_{T, j i}, \pi_{T, t i}\left(\boldsymbol{S}_{T, j i}, \boldsymbol{\theta}_{T, t i}\right), \boldsymbol{\psi}_{1 T, t i}\right)$

Eq. (6) is the loss function used in traditional MATD3 to update the local SAN. Finally, the soft update coefficient τ is used to update the target SAN and target critic networks according to Eq. (7).

$\begin{aligned} \boldsymbol{\theta}_{T,(t+1) i}^{\prime} & =(1-\tau) \boldsymbol{\theta}_{T, t i}^{\prime}+\tau \boldsymbol{\theta}_{T, t i} \\ \boldsymbol{\psi}_{k_{\mathrm{c}} T,(t+1) i}^{\prime} & =(1-\tau) \boldsymbol{\psi}_{k_{\mathrm{c}} T, t i}^{\prime}+\tau \boldsymbol{\psi}_{k_{\mathrm{c}} T, t i}, \quad k_{\mathrm{c}}=1,2 \end{aligned}$

where θ′T,(t+1)i is the parameter of the target SAN at t + 1, and ψ′kcT,ti and ψ′kcT,(t+1)i denote the parameters of the kcth target critic network at t and t + 1, respectively.
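The soft (Polyak) update of Eq. (7) is a convex blend of target and local parameters. A minimal sketch, with parameters represented as plain lists of floats for illustration and the paper's τ = 0.9 (Section 3.3) as the default:

```python
def soft_update(target_params, local_params, tau=0.9):
    """Eq. (7): theta' <- (1 - tau) * theta' + tau * theta,
    applied elementwise to every target-network parameter."""
    return [(1 - tau) * pt + tau * pl
            for pt, pl in zip(target_params, local_params)]
```

The same rule with θpretrain in place of the local parameters gives the backtracking update of Eq. (8) in Step 6.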

Step 6: Repeat Steps 4 and 5 until all of the input real data are trained. During the learning process, the following evaluation module is introduced to assess the quality of the actions output by the current local SAN: several trained real-data tasks and virtual damage tasks are randomly selected, and the current local SAN is used to conduct damage detection for each task. The evaluation indicators include the reward mean (RM), the error between the identified damage and the preset damage in the sampled virtual damage tasks (EDV), the ratios of modal error convergence in the sampled real (RMER) and virtual (RMEV) damage tasks, and the ratio of damage error convergence in the sampled virtual damage tasks (RDEV). If the decrease in the maximum RMEA (the average of the RMER and RMEV) is less than 15%, θpretrain is updated by θT,ti from the local SAN. Otherwise, Eq. (8) is used to update θT,ti. This mechanism is named the backtracking mechanism.

$\boldsymbol{\theta}_{T, t i}=(1-\tau) \boldsymbol{\theta}_{T, t i}+\tau \boldsymbol{\theta}_{\text{pretrain}}$

The advantages of the loss functions used in this paper can be further discussed. First, Eqs. (4) and (6) are classic loss functions in reinforcement learning, which aim to increase the Q value of the actions from the local SAN while ensuring the accuracy of the long-term benefits predicted by the critic networks. Under these two premises, combined with the theory-assisted mechanism proposed above, the agents can achieve self-exploration and self-training on damage detection tasks. Eq. (3) is the core term of the theory-assisted mechanism proposed in this paper. This loss aims to address the “exploration dimension dilemma” faced by typical reinforcement learning algorithms. In fact, with limited exploration steps, if the learning objectives are complex, it is difficult for the agents to explore good strategies. This makes the trained agents “lazy”; that is, the agents focus only on avoiding the introduction of larger errors rather than minimizing the current errors as much as possible. The essence of Eq. (3) is to train the actions of the agents to approximate those of the TAG, so that the agents obtain better initial search values in subsequent training, which improves the exploration speed and convergence. Eq. (3) can also ensure that there are enough good action samples in the experience pool. In addition, Eq. (3), combined with the sampling mode of Step 5 and the backtracking mechanism proposed in Step 6, can gradually transfer the knowledge of model updating theory to the SAN.

2.3. Adaptive experience replay mechanism

In Steps 3 and 5, a random selection operator is used to select experience sets from Θ. In traditional MATD3, equal-probability sampling is used, which may have poor experience replay efficiency, as discussed above. Therefore, the following experience replay mechanism is developed in this paper: a probability array P is introduced, with dimensions consistent with the number of samples in Θ. The current sample size in Θ is denoted as Nrb, and the selected batch size Nbs is a fixed value. The whole sampling process is divided into two stages according to the following condition: after the number of samplings with batch size Nbs reaches Npick, the probability of each sample in Θ having been chosen at least once is greater than 95%. To meet this requirement, an approximate relationship between Nbs, Npick, and Nrb is derived, as given in Eq. (9). Based on extensive numerical experiments, it is recommended to set Npick to 100 and Nrb = 2Nbs. In the first stage, equal selection probabilities are used for sampling to enable the agents to explore the entire space as much as possible.

$\sum_{q=0}^{N_{\mathrm{rb}}+N_{\mathrm{pick}}-N_{\mathrm{bs}}}(-1)^{q} C_{N_{\mathrm{rb}}+N_{\mathrm{pick}}}^{q}\left(\frac{C_{N_{\mathrm{rb}}+N_{\mathrm{pick}}-q}^{N_{\mathrm{bs}}}}{C_{N_{\mathrm{rb}}+N_{\mathrm{pick}}}^{N_{\mathrm{bs}}}}\right)^{N_{\mathrm{pick}}}>95 \%$

where C denotes the combinatorial (binomial) coefficient, and q is the summation index.
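This coverage condition can be checked numerically. The sketch below implements the inclusion-exclusion form as reconstructed here (probability that every one of the Nrb + Npick experience sets is drawn at least once across Npick equal-probability batches of size Nbs); it is an illustration of the condition, not the paper's code:

```python
from math import comb

def coverage_probability(n_rb, n_bs, n_pick):
    """Inclusion-exclusion estimate of Eq. (9)'s left-hand side:
    P(all n_rb + n_pick samples chosen at least once in n_pick
    batches of size n_bs, sampled uniformly without replacement
    within each batch)."""
    n = n_rb + n_pick                       # total pool size considered
    total = 0.0
    for q in range(0, n - n_bs + 1):
        # probability that q fixed samples are all missed by one batch
        p_miss = comb(n - q, n_bs) / comb(n, n_bs)
        total += (-1) ** q * comb(n, q) * p_miss ** n_pick
    return total
```

With the recommended settings (Npick = 100, Nrb = 2Nbs, e.g. Nbs = 32), the coverage probability comfortably exceeds the 95% threshold.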

In the second stage, the last element in P is set to one, which means that the latest experience set will be selected in the next sampling. The selection probability for the remaining elements can be calculated using Eq. (10).

$\bar{r}=\frac{1}{k_{\mathrm{a}}} \sum_{i=1}^{k_{\mathrm{a}}} \boldsymbol{r}_{i}, \quad \hat{r}=\frac{\bar{r}-\min \bar{r}}{\max \bar{r}-\min \bar{r}}-\kappa_{1}, \quad \boldsymbol{P}=\frac{\left(\left|\hat{r}\right| \tanh \left(\frac{N_{\mathrm{rb}}}{N_{\mathrm{bs}}}-1\right)+10^{-5}\right)^{\kappa_{2}}}{\sum_{t}\left(\left|\hat{r}\right| \tanh \left(\frac{N_{\mathrm{rb}}}{N_{\mathrm{bs}}}-1\right)+10^{-5}\right)^{\kappa_{2}}}$

where ri is the reward array for αi; r̄ is the average reward over the multiagent; and r̂ is the normalized average reward. The denominator represents summation over the dimension of t. The coefficient κ1 (0 < κ1 < 1) is set to control the proportion of selected experience sets with negative rewards. The positive integer κ2 controls the rate at which the sampling tends toward equal probability. By following Eq. (10), the agents can focus more on samples with larger absolute reward values in the early stages of training and learn samples with various rewards equally in the later stages of training.
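A sketch of the second-stage selection probabilities following the structure of Eq. (10). The absolute value on the normalized reward and the values of κ1 and κ2 below are assumptions for illustration; the caller additionally forces the newest experience set to be selected (its entry in P is set to one) as described above:

```python
import numpy as np

def adaptive_probabilities(rewards, n_rb, n_bs, kappa1=0.3, kappa2=2):
    """Second-stage replay probabilities in the spirit of Eq. (10).

    rewards : array of shape (k_a, n_samples), one reward history
              per agent (k_a agents, one column per experience set)."""
    r_bar = rewards.mean(axis=0)                       # multiagent average
    span = r_bar.max() - r_bar.min() + 1e-12           # guard against zero span
    r_hat = (r_bar - r_bar.min()) / span - kappa1      # normalized, shifted by kappa1
    w = (np.abs(r_hat) * np.tanh(n_rb / n_bs - 1) + 1e-5) ** kappa2
    return w / w.sum()                                 # normalize to a distribution
```

Larger κ2 sharpens the distribution around high-|reward| samples; as training proceeds, reducing the effective κ2 pushes the selection back toward equal probability, matching the behavior described in the text.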

3. Results

3.1. Research case

Based on the proposed method architecture, the damage detection process can be designed as shown in Fig. 2. In the project preparation stage, humans participate in monitoring system construction, FEM construction, and the training of the initial intelligent agents. After the initial intelligent agents are obtained, automatic damage detection can be performed for real online projects, along with self-training of the intelligent agents.

To validate the performance of the proposed algorithm, a one-year SHM program was implemented on the 133-year-old Chinese residential building named Zhuzi Shecang, which is located in Wufu Town, Wuyishan City, Fujian Province. Zhuzi Shecang is 17.6 m wide, with a total depth of 23 m, and its greatest height is approximately 6.5 m. A Gaussian distribution was used to simulate the initial material randomness of the timber, with an average density of 527.2 kg·m−3 and a coefficient of variation of 14.7%. The average elastic modulus was 9564 MPa, and the coefficient of variation was 13.8%. Using the pymapdl 0.63.1 Python package (ANSYS Incorporated, USA), the FEM of Zhuzi Shecang was established with ANSYS 19.0 (ANSYS Incorporated, USA), as shown in Fig. 3. BEAM189 elements were used to simulate the mechanical behavior of the timber, involving a total of 2091 elements and 3090 nodes. There are 697 components in total, and based on the initial damage distribution detected onsite and the damage sensitivity, 180 main components were selected for damage detection. COMBIN39 spring elements were used to simulate the soil loosening and the connection behavior between timber components. According to an onsite soil survey report, the vertical spring stiffness was set to 1.5 × 10^4 kN·m−3. For the connections between timber components, the in-plane rotational stiffness was taken as 148.68 kN·m·rad−1 [48], and the remaining DOFs were fixed.

3.2. Monitoring results

The optimal sensor placement for this residential building was determined in a previous study [13], and the sensor layout scheme is shown in Fig. 3. The test involves 15 measurement points (including one reference point; each point has X and Y channels). The testing period lasted one year, with six tests per day, each lasting 15 min, at a sampling frequency of 100 Hz. Preprocessing operations were conducted on the collected raw data, including removal of the direct current component, least-squares removal of trend terms, Gaussian denoising, and application of the Savitzky–Golay method. The covariance-driven stochastic subspace method based on stabilization diagrams was used to identify the modal parameters of the structure. According to the results of previous studies [49], [50], [51], [52], [53], the modal parameter results may be affected by temperature. Therefore, the preliminarily identified modal parameters were corrected based on the annual temperature values, as shown in Fig. 4.
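Two of the preprocessing steps above (DC removal and least-squares detrending) can be sketched with numpy alone; this is a minimal illustration of those steps, not the paper's pipeline. Gaussian denoising and Savitzky–Golay smoothing (e.g. `scipy.signal.savgol_filter`) would follow in a full implementation and are omitted here:

```python
import numpy as np

def preprocess(signal, fs=100.0):
    """Remove the DC offset and the least-squares linear trend from
    one channel of acceleration data sampled at fs Hz (100 Hz in the
    monitoring program described above)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                          # remove direct current component
    t = np.arange(x.size) / fs
    slope, intercept = np.polyfit(t, x, 1)    # least-squares linear trend
    return x - (slope * t + intercept)
```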

3.3. Algorithm hyperparameters and dataset partitioning

The algorithm was developed and compiled in Python 3.8 using Jupyter Notebook (NumFOCUS, USA). The residual neural network shown in Fig. 5 was constructed for the SAN. For the critic networks, similar networks were used, with the tanh and subsequent operators removed from the output layer. The SAN consists of five layers of residual blocks, while the critic networks consist of eight layers of residual blocks. Each residual block is a single-hidden-layer structure. The number of neurons in the first and last residual blocks is 256, and the number of neurons in the remaining hidden layers is 128. With the exception of the output layer, the leaky rectified linear unit (ReLU) is used as the activation function for all layers. Each residual block uses the layer normalization regularization technique [54]. The batch size is set to 32. The maximum number of explorations in each episode is 30. The number of agents is three. The exploration noise range is 3 × 10−4–1 × 10−3, and a sine attenuation function is used. The Adam optimizer is used for training, and the L2 regularization (ridge regularization) coefficient is 1 × 10−3. The discount rate is 0.95, the soft update coefficient is 0.9, the learning rate of the SAN is 3 × 10−4, and the learning rate of the critic networks is 1 × 10−4. Moreover, the warm-up mechanism [55] and the exponential attenuation mechanism [56], [57], [58] for the learning rate were introduced. Details about the hyperparameters can be found in Section S3 in Appendix A. The algorithm is implemented according to Steps 1–6 described in Section 2.2.

The first training stage, as described in Step 2, includes 500 virtual damage distributions DTarget. The generation mode of DTarget is to define severe damage (mean value of 0.35, standard deviation of 3%), moderate damage (mean value of 0.2, standard deviation of 1.67%), minor damage (mean value of 0.12, standard deviation of 0.3%), and undamaged components using a Gaussian distribution over all components [59]. Dini adopts a uniform distribution within the interval [0, 10%]. It should be noted that this damage distribution mode was determined through onsite testing and based on the results reported in Ref. [59]. When this method is adopted for other cases, nondestructive testing equipment, such as stress wave instruments and ultrasonic instruments, can be used during the initial construction of the SHM system to determine the initial damage distribution. On the other hand, these virtual damage tasks are introduced to train the initial agents to obtain better network parameter values, as discussed in Section 2.2. After the SHM system is built and real test data are obtained, the methods for generating real and virtual tasks described below are used to continue training the agents, gradually reducing the impact of the limitations of these virtual damage distributions.
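The DTarget generation mode above can be sketched as follows. The per-class (mean, standard deviation) pairs come from the text; the class proportions in `p` are illustrative assumptions, since the paper does not state them here:

```python
import numpy as np

def generate_damage_distribution(n_components, rng=None):
    """Draw one virtual target damage distribution D_Target: each
    component is assigned a severity class, then its damage ratio is
    sampled from that class's Gaussian and clipped to [0, 1]."""
    rng = np.random.default_rng(rng)
    classes = {                    # (mean, standard deviation)
        "severe":   (0.35, 0.03),
        "moderate": (0.20, 0.0167),
        "minor":    (0.12, 0.003),
        "none":     (0.00, 0.0),
    }
    names = list(classes)
    # class proportions are illustrative, not from the paper
    picks = rng.choice(names, size=n_components, p=[0.1, 0.2, 0.3, 0.4])
    d = np.array([rng.normal(*classes[c]) for c in picks])
    return np.clip(d, 0.0, 1.0)
```

An analogous one-liner, `rng.uniform(0.0, 0.10, n_components)`, would give the uniform initial distribution Dini on [0, 10%].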

In Section 3.2, a total of 2157 modal parameters were identified, which means that the maximum number of DT,real is 2157. The total DT,real set was divided into three subsets with sample sizes of 400, 1400, and 357. The subset of 400 samples was used for training the local SAN to approximate the TAG. For the subset of 1400 samples, the probability of using the TAG decreased. For these two subsets, based on Step 4, five groups of DT,real + ξDTarget were generated for each DT,real. The subset of 357 samples was not used for network training and was reserved for algorithm performance testing in Section 4. Here, the mean of ξDTarget is zero, and the standard deviation is 5%. Dini took the damage distribution identified at the previous time point. Each task is trained five times with slightly varying initial values. For reinforcement learning, the final input samples are the experience sets from tasks with different DT,real values at various time points. The sample information for the local SAN is shown in Table 1.
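The task augmentation of Step 4, as described here, might be sketched as follows; clipping the perturbed damage values to [0, 1] is an added assumption.

```python
import numpy as np

def augment_real_task(d_real, n_groups=5, noise_std=0.05, rng=None):
    """Generate `n_groups` perturbed targets D_T,real + xi*D_Target.

    The perturbation term is zero-mean Gaussian with a 5% standard
    deviation, as stated in the text; clipping the result to [0, 1]
    is an added assumption.
    """
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.normal(0.0, noise_std, size=(n_groups,) + d_real.shape)
    return np.clip(d_real + noise, 0.0, 1.0)

rng = np.random.default_rng(1)
d_real = rng.uniform(0.0, 0.3, 700)   # a hypothetical identified distribution
groups = augment_real_task(d_real, rng=rng)
```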

3.4. Training loss and evaluation indicators

The loss curve of the training process is shown in Fig. 6. It can be observed from Fig. 6(a) that the loss curve oscillates significantly in the early stage and gradually converges in the later stage. Two effects may give rise to the large oscillation amplitude in the early stage. First, there are many training tasks with different damage distributions in the early stage, and the quality of the actions output by the local SAN cannot yet approximate that of the TAG. Second, the backtracking mechanism in Step 6 using θpre-train causes significant loss oscillations. Fig. 6(a) shows that in the later stage of training, the performance of the local SAN gradually approaches that of the TAG. Finally, due to the attenuation coefficient in Eq. (3), the training objective gradually focuses on Eq. (6).

In Fig. 6(b), in the early stage of training, the task objective is mainly to learn the actions of the TAG. Therefore, at this time, owing to the low performance of the local SAN, even negative Q values may occur. After a certain period of training, the agent gradually obtains positive Q values and tends toward stability.

As shown in Fig. 6(c), it is difficult for the agent to determine the long-term benefits of a given action in the early stages of training. As training progresses, this problem is gradually ameliorated. After the SAN participates in training, due to either the poor performance of the SAN or the different decisions made by the SAN and the TAG for the same states, the critic network may provide erroneous Q values. This is also corroborated by the phenomena observed during the same period in Fig. 6(b). In the later stage of training, the loss of the critic network gradually converges, indicating that the critic network can accurately judge the long-term benefits of a given action.

As mentioned in Step 6, a periodic evaluation module was used. Once there are enough training tasks in the experience pool, an evaluation task is performed after every four learning tasks are completed. In each evaluation task, 20 real tasks were randomly selected, and 20 virtual tasks were generated.
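The evaluation schedule described above can be sketched as follows; `learn` and `evaluate` are placeholder callables standing in for the paper's actual learning-task runner and evaluation module.

```python
import random

def training_with_evaluation(real_pool, learn, evaluate,
                             eval_period=4, n_eval=20, seed=0):
    """Interleave learning and periodic evaluation: after every
    `eval_period` completed learning tasks, evaluate on `n_eval`
    randomly selected real tasks plus `n_eval` freshly generated
    virtual tasks. `learn` and `evaluate` are placeholders."""
    rng = random.Random(seed)
    scores = []
    for step, task in enumerate(real_pool, start=1):
        learn(task)
        if step % eval_period == 0:
            real_sample = rng.sample(real_pool, n_eval)
            virtual_sample = [("virtual", step, i) for i in range(n_eval)]
            scores.append(evaluate(real_sample + virtual_sample))
    return scores

# Usage with dummy components: 40 learning tasks -> 10 evaluations.
scores = training_with_evaluation(
    real_pool=list(range(40)),
    learn=lambda task: None,
    evaluate=lambda tasks: len(tasks),
)
```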

A total of 2718 evaluations were conducted, and the results are shown in Fig. 7. Fig. 7(a) shows the RM for three agents and 40 tasks. A higher RM corresponds to better overall performance on these 40 tasks. It can be observed that the RM gradually increased over the first 600 evaluations. According to the analysis of the loss curve in Fig. 6, at this stage, the local SAN tried to approximate the TAG. As the performance of the SAN gradually approaches that of the TAG, the RM gradually increases, but the mean value of the RM tends toward a constant. This is because the TAG faces certain bottlenecks due to matrix ill-conditioning. Between approximately the 600th and 1000th evaluations, the training process begins to be dominated by the SAN. According to the analysis of Fig. 6, the action quality of the SAN decreases during this period, resulting in a decrease in the RM. Between approximately the 1000th and 1800th evaluations, the performance of the SAN gradually improves. After approximately 1800 evaluations, the RM exceeds that of the TAG-dominated period, indicating that the performance of the SAN may be better than that of the TAG at this time.

Fig. 7(b) shows the standardized results of the EDV, with smaller values corresponding to better agent performance. Therefore, the overall trend is similar to that in Fig. 7(a) after reflection about the X axis. For example, significant errors occur roughly at the stage where the SAN initially participates in training, which is consistent with the conclusions discussed above.

As shown in Fig. 7(c), the calculated RMER values lie within the range [0, 100%]. The overall trend is roughly similar to that of Fig. 7(a) because both indicators describe the quality of the agent's actions. A high RM value indicates that the agent has taken good actions in these tasks, leading to easier convergence of the modal errors. In some cases, there may be inconsistencies between the RM and RMER trends. This is because the RMER counts the number of tasks that converge successfully, whereas the RM focuses more on the quality of each action itself. Therefore, it is possible that in a certain evaluation, several tasks are completed exceptionally well, while other tasks fail to converge with modal errors very close to the convergence threshold; in this case, the RM will be high and the RMER will be low. In the later stage of training, the RMER reached approximately 90%.

Similar conclusions appear in the RMEV and RDEV results shown in Fig. 7(d). It is observed that the RMER is generally greater than the RMEV. This is because, when real tasks are predicted, the damage distribution of the previous real task is used as the initial value, which is more conducive to optimization. In contrast, the target damage distributions of virtual tasks may be more complex, and their initial values may be harder to optimize. Fig. 7(d) also shows that the RMEV is generally higher than the RDEV. This is because the damage increases of components with excessively low or high damage sensitivity may be misestimated in an attempt to reduce local modal errors.

4. Discussion

In this section, the performance of the following four models is compared: ① damage detection based on Newton's method, hereafter referred to as the Newton method; ② damage detection based on Newton's method and refined using the L-curve method, hereafter referred to as the L-curve method; ③ the proposed TA2-MATD3 algorithm; and ④ the model-agnostic meta-learning (MAML) algorithm, a well-known algorithm for few-shot regression [60,61]. The MAML network in this paper adopts a multilayer perceptron (MLP) based on residual blocks, the same as that in TA2-MATD3. A detailed description of MAML can be found in a previous report [60]. To ensure comparability, all shared hyperparameters are kept consistent.

Two datasets are used for the comparative testing. Dataset A contains 1000 virtual tasks and 500 real tasks randomly selected from the training set. Dataset B contains 500 newly generated virtual tasks and the 357 real tasks held in reserve when the training set was divided.

4.1. Calculation errors

Six evaluation metrics commonly used in regression problems are employed to measure the errors of the four algorithms, namely, the mean squared error (MSE), mean squared log error (MSLE), median absolute error (MDAE), mean absolute error (MAE), explained variance score (EVS), and coefficient of determination (R2); these metrics are defined in Eq. (11). The calculation results for the 1500 virtual tasks are shown in Table 2.

$\begin{array}{l} \mathrm{MSE}=\frac{1}{n_{\mathrm{sz}}} \sum_{i_{\mathrm{s}}=1}^{n_{\mathrm{sz}}}\left(D_{i_{\mathrm{s}}}-\hat{D}_{i_{\mathrm{s}}}\right)^{2} \\ \mathrm{MSLE}=\frac{1}{n_{\mathrm{sz}}} \sum_{i_{\mathrm{s}}=1}^{n_{\mathrm{sz}}}\left(\ln \left(D_{i_{\mathrm{s}}}+1\right)-\ln \left(\hat{D}_{i_{\mathrm{s}}}+1\right)\right)^{2} \\ \mathrm{MDAE}=\operatorname{median}(|\boldsymbol{D}-\hat{\boldsymbol{D}}|) \\ \mathrm{MAE}=\frac{1}{n_{\mathrm{sz}}} \sum_{i_{\mathrm{s}}=1}^{n_{\mathrm{sz}}}\left|D_{i_{\mathrm{s}}}-\hat{D}_{i_{\mathrm{s}}}\right| \\ \mathrm{EVS}=1-\frac{\operatorname{var}(\boldsymbol{D}-\hat{\boldsymbol{D}})}{\operatorname{var}(\boldsymbol{D})} \\ R^{2}=1-\frac{\sum_{i_{\mathrm{s}}=1}^{n_{\mathrm{sz}}}\left(D_{i_{\mathrm{s}}}-\hat{D}_{i_{\mathrm{s}}}\right)^{2}}{\sum_{i_{\mathrm{s}}=1}^{n_{\mathrm{sz}}}\left(D_{i_{\mathrm{s}}}-\bar{D}\right)^{2}} \\ \bar{D}=\frac{1}{n_{\mathrm{sz}}} \sum_{i_{\mathrm{s}}=1}^{n_{\mathrm{sz}}}\left(D_{i_{\mathrm{s}}}\right) \end{array}$

where $n_{\mathrm{sz}}$ is the sample size, $\boldsymbol{D}$ is the preset damage vector, and $\hat{\boldsymbol{D}}$ is the predicted damage vector; $D_{i_{\mathrm{s}}}$ and $\hat{D}_{i_{\mathrm{s}}}$ are the $i_{\mathrm{s}}$th elements of $\boldsymbol{D}$ and $\hat{\boldsymbol{D}}$, respectively, with $i_{\mathrm{s}}$ the sample index; $\operatorname{var}(\cdot)$ is the variance function; and $\bar{D}$ is the mean damage.
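The metrics of Eq. (11) translate directly into NumPy; the preset and predicted damage vectors below are hypothetical values for a worked example.

```python
import numpy as np

def regression_metrics(d_true, d_pred):
    """MSE, MSLE, MDAE, MAE, EVS, and R2 as defined in Eq. (11)."""
    err = d_true - d_pred
    return {
        "MSE":  float(np.mean(err ** 2)),
        "MSLE": float(np.mean((np.log1p(d_true) - np.log1p(d_pred)) ** 2)),
        "MDAE": float(np.median(np.abs(err))),
        "MAE":  float(np.mean(np.abs(err))),
        "EVS":  float(1.0 - np.var(err) / np.var(d_true)),
        "R2":   float(1.0 - np.sum(err ** 2)
                      / np.sum((d_true - d_true.mean()) ** 2)),
    }

# Hypothetical preset and predicted damage vectors.
d = np.array([0.35, 0.20, 0.12, 0.00, 0.05])
d_hat = np.array([0.33, 0.22, 0.10, 0.02, 0.04])
m = regression_metrics(d, d_hat)
```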

Table 2 shows that TA2-MATD3 achieves the best performance for all six metrics. The second-best performance is achieved by MAML. The worst comprehensive performance is achieved by the Newton method.

The results for the convergence ratio and average error are shown in Fig. 8. For the virtual (real) tasks in Dataset A, the convergence ratio of TA2-MATD3 is 97.1% (97.0%), that of the L-curve method is 79.7% (82.2%), that of the Newton method is 72.5% (77.4%), and that of MAML is 91.1% (91.4%). These results show that, on the training set, the TA2-MATD3 algorithm can achieve damage detection because the network has been fully trained on these samples. The small number of failure cases may be due to the following reasons: ① the damage location has low sensitivity; ② these cases may have been sampled less frequently during the training process and thus received insufficient training; and ③ these tasks may have come close to the error threshold yet failed to reach it. In addition, Figs. 8(a)–(h) also demonstrate that TA2-MATD3 can fully utilize the experience learned from completed training cases and achieves similar convergence ratios for both real and virtual tasks. The tasks associated with the two non-deep-learning algorithms are relatively independent, so experience cannot be shared between different tasks. MAML introduces meta-learning mechanisms, resulting in good performance across different training tasks. In addition, in Fig. 7, the first 600 evaluations can be considered dominated by the Newton method, and the evaluations after the 2000th can be considered dominated by TA2-MATD3. The convergence ratio in Fig. 8 is always higher than that in Fig. 7. This may be due to the following reasons: First, the number of samples used in the evaluation module is limited, so the selected tasks may not be as representative as those used during testing. Second, during training and evaluation, a local SAN is used to approximate the Newton method, which may itself introduce certain errors. Third, for TA2-MATD3, some samples that do not converge successfully in the early stage may achieve convergence in subsequent training.

The above conclusion can also be seen in Figs. 8(i)–(p). For Dataset B, the convergence ratio of TA2-MATD3 reached 86.2% on virtual tasks and 91.9% on real tasks, indicating that TA2-MATD3 can achieve good accuracy even for tasks on which it has never been trained. The results of the two non-deep-learning algorithms are independent of whether the tasks have been trained. Overall, the convergence ratio on real tasks is approximately 3%–11% higher than that on virtual tasks because of their better initial values. The performance of MAML on the test set is inferior to that on the training set, mainly due to the insufficient knowledge transfer ability of the MAML network. In terms of average error, TA2-MATD3 outperforms the other three algorithms. In summary, the TA2-MATD3 algorithm shows the best accuracy. Because both MAML and TA2-MATD3 are deep learning algorithms and the calculation accuracy of TA2-MATD3 is much higher than that of MAML in this case, the following sections mainly compare the other aspects of the performance of TA2-MATD3 and the two non-deep-learning algorithms.

4.2. Computational efficiency

The test computing device has a 16-core Intel® Core™ i7-14650HX central processing unit (CPU) (Intel, USA) and an NVIDIA GeForce RTX 4060 Laptop graphics processing unit (GPU) (NVIDIA, USA). In terms of calculation time, the average time for conducting damage detection is approximately 61 min for the Newton method, 79 min for the L-curve method, and 45 s for TA2-MATD3. For the successfully converged cases, the average time of the Newton method is approximately 35 min, that of the L-curve method is approximately 48 min, and that of TA2-MATD3 is 18 s. Thus, TA2-MATD3 is much more efficient than the two non-deep-learning comparison algorithms.

4.3. Damage detection on virtual tasks

To further illustrate the damage detection capability of the three algorithms, a typical virtual task was selected, as shown in Fig. 9. The modal errors of both the TA2-MATD3 and L-curve algorithms converged on this task, whereas the Newton method did not reach the convergence threshold. Figs. 9(a)–(d) show the damage detection results for selected components with different preset degrees of damage. For severely damaged components, due to their high damage fractions, the detection results have a significant impact on the modal errors. Therefore, all algorithms can roughly locate these components. However, the Newton method did not successfully locate damage on components 86, 608, and 669, and the damage values at the successfully identified damage locations were generally lower than the preset values. The same phenomenon is also observed for the moderately and slightly damaged components. For lightly damaged components, the damage values obtained by the Newton method are mostly less than 5%, meaning that the identified damage is approximately equivalent to identification noise. This indicates that the success rate of locating damage in lightly damaged components is low. Although both the TA2-MATD3 method and the L-curve method achieve modal error convergence, the damage values identified by TA2-MATD3 are closer to the preset values. In addition, Fig. 9(d) shows that for components without preset damage, the damage noise identified by TA2-MATD3 is also relatively small. This is also supported by the results presented in Fig. 8.

Another special virtual task should also be discussed: using the algorithms to detect damage for a structure with no damage and a zero initial value. Since there is no damage and the initial value is zero, the modal error is zero. According to Eq. (1), if WTγT,p is 0, the output damage increment is obviously 0. Therefore, regardless of whether the Newton method or the L-curve method is used, the damage increment is 0 in this case. For the TA2-MATD3 method, when the network input is 0, the output is controlled by the bias terms, regularization terms, and activation functions in the neural network, and is independent of the learned input-layer weights. The damage increment range of the network output is 0.3%–2.3%. This indicates that when the overall damage of the structure is small, TA2-MATD3 may exhibit certain errors. However, owing to the long service time of heritage buildings, significant damage within the structure often cannot be ignored. It should also be pointed out that this case differs from that of Fig. 9(d). The case of Fig. 9(d) has significant global damage, resulting in a certain modal error. Therefore, even at positions without damage, the Newton method and the L-curve method may misjudge the damage.
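The zero-input behaviour noted above can be demonstrated with a toy network: when the input is all zeros, the first matrix multiplication vanishes, so the output is set by the bias terms (propagated through the activations and any later layers) rather than by the input-layer weights. The network below is purely illustrative.

```python
import numpy as np

def tiny_mlp(x, weights, biases, slope=0.01):
    """Two-layer MLP with a leaky-ReLU hidden activation (illustrative)."""
    h = x @ weights[0] + biases[0]
    h = np.where(h > 0, h, slope * h)
    return h @ weights[1] + biases[1]

rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 8)), rng.standard_normal((8, 2))]
b = [rng.standard_normal(8), rng.standard_normal(2)]

out_zero = tiny_mlp(np.zeros(4), W, b)

# Replacing the input-layer weights changes nothing for a zero input:
W_swapped = [rng.standard_normal((4, 8)), W[1]]
assert np.allclose(out_zero, tiny_mlp(np.zeros(4), W_swapped, b))
```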

4.4. Detection of onsite apparent damage

To further demonstrate the reliability of the algorithm, the latest real monitoring results were selected for performance testing of the three algorithms. At the time of the latest data collection, onsite apparent damage inspections were conducted, and a total of 49 instances of main apparent damage, including decay, cracking, and deformation, were qualitatively identified, as shown in Fig. 10.

The main apparent damage is defined as follows. For the main apparent damages identified by onsite investigations, the criteria of GB/T 50165–2020: Technical Standard for Maintenance and Strengthening of Historic Timber Building [62] were applied, as shown in Table 3. In terms of calculation, considering factors such as the algorithm errors discussed in Section 4.3, components with damage coefficients greater than 10% were considered components with main damage.

The TA2-MATD3 algorithm marked all instances of main apparent damage, while in two instances it marked damage greater than 10% on components without apparent damage. It should be noted that such damage marks cannot be considered misjudgments, because they may be caused by nonapparent damage, such as internal voids. Therefore, the following analysis focuses on the 49 components with main apparent damage. The other two algorithms missed some damaged components: components 100 and 444 were omitted by the L-curve method, and components 72, 74, 142, and 309 were omitted by the Newton method. Based on the above analysis, TA2-MATD3 has a good ability to detect the apparent damages discovered on site.

5. Conclusions

In this paper, a real-time damage detection method based on adaptive theory-assisted multiagent reinforcement learning was proposed. Taking a century-old residential building as an example, an SHM system was constructed to obtain one year of modal information. The proposed TA2-MATD3 algorithm was then compared with other algorithms, and its superiority was demonstrated based on indicators such as computational efficiency, convergence ratio, mean modal error, and damage detection capability. The developed method can provide online real-time damage assessment results for the preventive conservation of architectural heritage. The following conclusions can be drawn from this work.

(1) The proposed TA2-MATD3 algorithm can be divided into four stages. In the initial stage, the TAG is used to generate virtual damage distributions to fill the experience replay pool. In the second stage, the parameters of the SAN are updated by minimizing the error between the output action of the SAN and that of the TAG. The updating method of the critic model follows that of the typical MATD3 algorithm. In the third stage, the probability of the TAG being selected is gradually reduced, increasing the participation level of the SAN. In this stage, the SAN is trained with both the goal of minimizing action errors and that of maximizing long-term benefits. In the fourth stage, the TAG no longer participates in the action output, and the weight of the action-error-minimization term in the cost function decreases. At the same time, the best network parameters in the training history are cached; if the corresponding evaluation indicators of the SAN decrease significantly, the network is backtracked to the cached state.
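A minimal sketch of the stage scheduling and backtracking described in conclusion (1); the stage thresholds, the linear hand-over schedule, and the backtracking tolerance are assumptions for illustration.

```python
def tag_probability(episode, warmup=500, handover=1500):
    """Probability of the TAG (rather than the SAN) producing the action
    across the four stages: 1.0 while the pool is filled and the SAN
    imitates the TAG, a linear hand-over, then 0.0 once the SAN acts
    alone. Thresholds and the linear schedule are assumptions."""
    if episode < warmup:
        return 1.0
    if episode < handover:
        return 1.0 - (episode - warmup) / (handover - warmup)
    return 0.0

class Backtracker:
    """Cache the best parameters seen so far and restore them when the
    evaluation indicator drops by more than `tolerance` (the stage-four
    backtracking mechanism; the tolerance value is an assumption)."""
    def __init__(self, tolerance=0.1):
        self.best_score = float("-inf")
        self.best_params = None
        self.tolerance = tolerance

    def step(self, params, score):
        if score > self.best_score:
            self.best_score, self.best_params = score, dict(params)
        elif self.best_score - score > self.tolerance:
            return dict(self.best_params)   # backtrack to the cached state
        return params
```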

(2) In terms of algorithm training, the loss curves and evaluation indicators show that the overall training results conform to the above four stages. Before the SAN participates in training, a certain degree of performance improvement in the evaluation indicators is observed, indicating that at this stage, the performance of the SAN gradually approaches that of the TAG. During the initial period of SAN participation in training, the performance briefly declined across various indices, owing to the misjudgment of the SAN on many newly input tasks. With the assistance of the backtracking mechanism and the theory-assisted mechanism, the various indicators gradually improve in the later stage of training and eventually surpass those of the period when the TAG is dominant. In the end, the ratios of modal error convergence and damage error convergence remained stable at over 80%.

(3) The proposed TA2-MATD3 algorithm was compared with the L-curve method, the Newton method, and MAML in terms of performance metrics, evaluated on both virtual and real tasks across the training and testing datasets. In terms of the convergence ratio, TA2-MATD3 achieves a stable convergence ratio of approximately 97% on the training set, which is much higher than the 79.7%–82.2% of the L-curve method, the 72.5%–77.4% of the Newton method, and the 91.1%–91.4% of MAML. On the test set, the convergence ratio of TA2-MATD3 is stable at 86.2%–91.9%, whereas that of the L-curve method is stable at 73.4%–79.3%, that of the Newton method is stable at 65.4%–77.0%, and that of MAML is stable at 72.0%–80.1%. In terms of computational efficiency, for the cases of successful convergence, the computational efficiency of the proposed TA2-MATD3 method is approximately 117 times that of the Newton method and approximately 160 times that of the L-curve method. In addition, real-time damage detection was also performed using the latest monitoring data and the apparent damages measured on site. The results revealed that all instances of main apparent damage were identified by TA2-MATD3, whereas the L-curve method missed two damages and the Newton method did not achieve algorithm convergence. Therefore, it is concluded that TA2-MATD3 has learned the monitoring data well and can achieve better real-time damage detection than the other algorithms.

The main limitations of this paper are as follows. First, compared with large-scale civil engineering structures, the scale of the research case in this paper is relatively small. For large civil engineering structures, the dimensions of the states and actions for which each agent is responsible increase significantly. Second, the environment of this case is simple, with low fluctuations in external factors and small changes in the states, making it easier to learn. By contrast, structures in service often exhibit complex noise conditions, environmental change modes, and strong uncertainties. These complex external conditions pose significant challenges for generating accurate experience pools.

However, the proposed TA2-MATD3 can be adapted to address these two issues through the following approaches. First, the number of agents in TA2-MATD3 can be increased to reduce the dimensionality that each agent is required to observe. To address the resulting increase in computing power demand, distributed and parallel training methods can be adopted. In addition, the MLP used in this paper can easily be replaced with self-attention-based transformer or convolutional neural network architectures to cope with larger-scale learning tasks. To address the second issue, which relates to the experience pool calculation, a multiscale model updating method that considers temperature and humidity changes may be used. By combining nonlinear damage mechanics that consider environmental effects with nonlinear modal analysis methods that consider parameter uncertainty, a more accurate experience pool can be obtained for the agents to learn from, thereby overcoming the challenges posed by complex external conditions.

CRediT authorship contribution statement

Chengwen Zhang: Writing – original draft, Validation, Software, Methodology, Conceptualization. Qing Chun: Writing – review & editing, Supervision, Resources, Project administration, Funding acquisition. Yijie Lin: Visualization, Investigation, Data curation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The work in this paper was supported by the National Key Research and Development Program of China (2023YFF0906100), the National Natural Science Foundation of China (52408008), and the Key Research and Development Program of Jiangsu Province (BE2022833).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.eng.2025.03.026.


[52]

Han QH, Ma Q, Liu M, Xu J. Damage diagnosis of space grid structure based on natural frequency clustering analysis under varying temperature effects. J East China Jiaotong Univ 2021; 38:9-17.

[53]

Anastasopoulos D, Reynders EPB. Modal strain monitoring of the old Nieuwebrugstraat Bridge: local damage versus temperature effects. Eng Struct 2023; 296:116854.

[54]

Lei Ba J, Kiros JR, Hinton GE. Layer normalization.2016. arXiv: 1607.06450.

[55]

Gotmare A, Shirish Keskar N, Xiong C, Socher R. A closer look at deep learning heuristics: learning rate restarts, warmup and distillation.2018. arXiv: 1810.13243.

[56]

Co-Reyes JD, Miao Y, Peng D, Real E, Levine S, Le QV, et al. Evolving reinforcement learning algorithms.2021. arXiv: 2101.03958.

[57]

Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, et al. Playing Atari with deep reinforcement learning.2013. arXiv: 1312.5602.

[58]

Bhatt A, Palenicek D, Belousov B, Argus M, Amiranashvili A, Brox T, et al. CrossQ: batch normalization in deep reinforcement learning for greater sample efficiency and simplicity.2019. arXiv: 1902.05605.

[59]

Zhang C, Chun Q, Lin Y, Han Y, Jia X. Quantitative assessment method of structural safety for complex timber structures with decay diseases. J Build Eng 2021; 44:103355.

[60]

Zhang C, Chun Q, Sun A, Lin Y, Wang H. Improved meta-learning neural network for the prediction of the historical reinforced concrete bond–slip model using few test specimens. Int J Concr Struct Mater 2022; 16(1):41.

[61]

Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. PMLR 2017; 70:1126-1135.

[62]

Ministry of Housing and Urban–Rural Development of the People's Republic of China, State Administration for Market Regulation. GB/T 50165–2020: Technical standard for maintenance and strengthening of historic timber building.Chinese standard. Beijing: China Architecture & Building Press; 2020. Chinese.
