Nonlinear analyses hold tremendous significance throughout the entire lifespans of civil structures. In recent years, interest in leveraging deep learning (DL) to address the efficiency limitations of traditional structural analysis methods has increased. However, full-range nonlinear analyses of different structures remain underresearched because of a lack of appropriate data representations and the failure to consider both internal structural information and external load conditions. A heterogeneous graph (HetG) representation scheme that can digitalize arbitrary structural systems with high fidelity is proposed in this study. Furthermore, a composite feature learning framework is developed to enable efficient full-range nonlinear analyses. This framework comprises two main components: ① a heterogeneous graph neural network (GNN)-based module that encodes static features into embeddings with full structural semantics and ② a sequence-to-sequence (Seq2Seq) module that predicts history-dependent responses using structural embeddings and external stimuli in an end-to-end manner. A computational model named structural analysis based on a graph neural network-nonlinear (StructGNN-N) is implemented based on the proposed methodology and is validated through numerical experiments involving real-world concrete structures. The results show that StructGNN-N successfully reproduces the full-range nonlinear responses of all nodes in the entire structure and exhibits excellent generalizability across structures with diverse topological designs and member configurations. Notably, the developed model achieves a computational efficiency level that is 1000 times greater than that of the traditional elastoplastic time history analysis approach using the finite-element (FE) method. A parametric analysis and ablation studies demonstrate the effectiveness of the StructGNN-N architecture.
Due to its superior accuracy and computational efficiency, the proposed method holds great potential for use in engineering applications, especially in the context of digital twins. This approach provides an inspiring path for simulating diverse engineering structures with accurate and comprehensive mechanical information in real time.
Nonlinear structural analyses form a fundamental aspect of structural engineering throughout the entire lifespan of a structure, from elastoplastic time history analysis during the design phase to reliability estimation of damaged structures during maintenance. This technique plays a pivotal role in evaluating the safety and functionality of buildings and infrastructures while also linking physical structures to their digitalized representations [1,2]. The conventional methods, which are primarily based on the finite-element (FE) framework, face challenges such as high modeling complexity, low computational efficiency, and convergence issues [3]. These limitations make the traditional methods increasingly inadequate for modern engineering applications [4], especially within the digital twin context [5], which demands real-time updates and simulations.
In recent years, the structural analysis technique leveraging new-generation artificial intelligence (AI), represented by deep learning (DL) [6], [7], [8], has gained significant attention due to its extraordinary performance [4], greatly reducing the required analysis time while maintaining superior accuracy. Furthermore, AI-based structural analyses exhibit impressive adaptability, enabling a single well-trained model to handle diverse scenarios without repetitive model construction processes [4,9]. Notably, with the evolution of the AI for Science (AI4S) paradigm, DL models can efficiently uncover the intrinsic physical properties within complex mechanical problems, even when the underlying mechanisms are unclear [10,11]. These strengths have motivated numerous scholars to integrate DL into nonlinear structural analysis methods.
The current research can be categorized into two levels: analyses of individual structural components and analyses of structural systems. At the structural component level, Tao et al. [12] employed deep neural networks (DNNs) for the constitutive modeling of composites, achieving considerable accuracy and demonstrating the effectiveness of neural networks in mechanical analysis scenarios. Rahman et al. [13] compared various machine learning models for estimating the shear strength of steel-fiber reinforced concrete (RC) beams and suggested suitable formulations for structural engineers. Abambres and Lantsoght [14] built a neural network to predict the shear capacities of one-way slabs under concentrated loads, demonstrating its powerful ability to simulate complex mechanical behaviors. Feng et al. [15] used a data-driven approach to train models for predicting the plastic hinge lengths of RC columns, achieving significantly higher prediction accuracy than that of the state-of-the-art empirical relationship-based methods. To conduct structural system analyses, Huang et al. [16] developed a DL-based model for predicting the seismic responses of a subway station, demonstrating that DL methods can be used in basic nonlinear analyses. Ning et al. [17] combined long short-term memory (LSTM), convolutional neural networks (CNNs), and WaveNet to analyze the seismic responses of a steel building frame and bridge structures; this was a successful attempt to integrate the characteristics of DL methods with nonlinear analyses of structural systems. However, the applicability of this technique is relatively restricted to simplified structural systems because of the limitations of the underlying data scheme. Zhou et al. [18] employed a physics-informed DL model for predicting the real-time responses of a high-rise building and achieved high accuracy by incorporating physical information into the model. Fei et al. [19] proposed a data-driven prediction method for evaluating shear wall structures, which were simplified to multiple-degree-of-freedom (MDOF) systems.
Despite these substantial research advancements, a significant gap remains between academic studies and the practical application of the state-of-the-art AI models for analyzing real engineering structures. The primary limitation is the inability of such models to compute full-range responses for an entire structure. The current methodologies predominantly focus on surrogate modeling, typically predicting specific mechanical indices at limited points of interest, such as the displacement history of the top floor of a building [4]. In contrast, the classic FE method can obtain the holistic mechanical fields of structures. This limitation stems from the lack of appropriate data schemes that are capable of digitalizing arbitrary structural systems with high fidelity. Notably, system-level analysis studies lag behind component-level analysis studies because of their intricate nature, which is characterized by diverse spatial topologies and complex component configurations. Conventional linear data structures (e.g., vectors, sequences, and grids) fall short in terms of capturing the comprehensive features of engineering structures. Consequently, the existing models either address a specific structural system without structural information [18] or oversimplify regular structures into finite features, thereby failing to provide global mechanistic response predictions for the entire target structure. In previous research [20], the authors proposed a physics-informed graph neural network (GNN) for elastic structural analyses, which introduced graph data—a type of non-Euclidean data structure—to parameterize frame structures. This innovation could be extended to represent arbitrary structural systems. However, the proposed model was unable to process nonlinear stimuli and was restricted to linear elasticity. 
A full-range nonlinear analysis of a structural system necessitates a methodology that considers both the inherent features of the underlying structure and the complete time history of the associated external loading cases.
To address the aforementioned gap, a heterogeneous graph (HetG) learning-based computational model, structural analysis based on a graph neural network-nonlinear (StructGNN-N), is proposed in this study to conduct full-range nonlinear analyses of structural systems. The structure of this study is as follows. Section 2 reviews the related work. Section 3 presents the methodology, including a universal graph data representation scheme for structural systems and the corresponding computational framework. Section 4 details the implementation of the proposed methodology, including a GNN-transformer model and a novel strategy for alleviating the data requirements of training. Section 5 validates the model, while Section 6 discusses the results of an ablation study and explores potential applications involving digital twins. Finally, Section 7 concludes our research.
2. Related work
Heterogeneous graph neural networks (HGNNs). Research on HGNNs has gained significant traction over the past decade because of their effectiveness in managing HetG data [21,22]. Unlike traditional methods that encode information derived from uniform node and edge types, HetG learning techniques have been designed to preserve the relational diversity present in real-world data, encompassing various entities and connections. For example, Hu et al. [23] proposed the heterogeneous graph transformer (HGT) and employed an attention mechanism to facilitate message passing among different types of nodes. Mo et al. [24] introduced a novel framework, the heterogeneous edge-enhanced graph attention network (HEAT), that effectively models the heterogeneous dynamics and interactions of multiagent data using directed edge-feature graphs and adaptive map selection. To mitigate the performance degradation that occurs in deeper HGNNs, metapaths have been incorporated into models to connect distant nodes, leading to improved models such as HGNN-attribute enhancement and structure-aware attention (AESA) [25] and metapath context convolution-based heterogeneous graph (MECCH) [26]. The versatility of HGNNs has led to their widespread application in areas such as social networks and recommendation systems. Cai et al. [27] utilized an HetG encoder network, heterogeneous graph contrastive learning (HGCL), for learning the representations of users and microvideos, along with a type-crossing objective function that integrates embeddings from diverse node types. Another HGCL framework presented by Chen et al. [28] enhances the adaptive contrastive learning process conducted between user-item interactions and auxiliary heterogeneous relational views, thereby more effectively capturing diverse user preferences and item characteristics. Ma et al. [29] designed the heterogeneous generative pre-trained transformer (HetGPT), which introduces a multiview neighborhood aggregation mechanism to capture the complex neighborhood structures that are inherent in HetGs and improves applications involving complex patterns such as online classification and social recommendation. The research and applications related to HGNNs inspire the foundational framework presented in this paper, which seeks to integrate an approach for managing different types of nodes into the representation learning processes of structural systems, thereby enabling the heterogeneous components within these systems to be more accurately modeled.
GNN-based mechanical analysis. In recent years, intuitive methods involving graph data and GNNs, which represent physics problems with node and edge configurations, have been applied in the field of solid mechanics, including metamaterials and composite materials [30]. Black and Najafi [31] employed a multifidelity GNN as a supervised framework for solving two-dimensional (2D) elastoplastic problems concerning metamaterials. Li et al. [32] proposed a machine learning approach based on a GNN to simulate the dynamic responses of copper bars and demonstrated the general applicability of their GNN in structural response prediction tasks. Xue et al. [33] designed a graph-based network to solve dynamic problems related to mechanical metamaterials; their approach can incorporate defects and spatial inhomogeneities. Yacouti and Shakiba [34] combined a CNN and a GNN to predict the mechanical fields contained in composite microstructures and discussed the advantages of each network component. Hendriks et al. [35] designed a similarity-equivariant GNN for the homogenization of metamaterials, which innovatively introduces physical symmetries to predict mechanical properties. Further studies involving GNN applications have addressed various problems, including mesh-based simulations of continuums [36], dominant failure mode searches for structures [37], and dynamic response predictions for continuous deformable bodies [38], offering new perspectives regarding structural mechanics. Inspired by the related studies and the similarities between structural systems and graph data, the previous research published by the authors in Ref. [20] was among the first works to propose the use of homogeneous GNNs for representing and computationally analyzing structural systems, and it effectively addressed elastic analyses of frame structures. Zhang et al. [39,40] represented architectural and structural layouts using graph data and developed a computational and optimization framework for RC systems by integrating GNNs and generative adversarial networks (GANs). Following the same scheme, Zhao et al. [41] discussed the possibility of extracting topological information from the layouts of shear wall systems, and Fei et al. [42] improved the computational efficiency achieved in the design and construction stages with knowledge-enhanced GNNs. In addition, Zhang et al. [43] used a specifically tailored GNN with adaptive capabilities to conduct seismic safety assessments across various scenarios involving coupled train-bridge systems. These studies have preliminarily demonstrated the potential of GNNs for simulating the elastic behaviors of structural systems via message passing mechanisms; however, few studies have focused on elastoplastic analyses, which are pervasive in engineering applications. Furthermore, the existing studies have typically concentrated on a single structural system or a specific category, simplifying the internal entities to a single type and overlooking the mechanical differences among various component types. As a result, the employed models are predominantly homogeneous GNNs. To produce more accurate representations of general structural systems composed of multiple entity types, it is necessary to incorporate an HetG for data representation and computation purposes.
3. Methodology
3.1. Graph representations of arbitrary structural systems
Before delving into DL modeling, it is crucial to establish high-fidelity representation schemes for diverse structural systems on the data side. Our previous research [20] laid the groundwork by transforming bar systems, such as frame structures with beams and columns, into graph data structures [44,45] (Fig. 1(a)), thus effectively mapping joints to nodes and structural bars to edges. However, many engineering structures encompass plate and shell components, such as shear wall buildings and nuclear containments. To accommodate this broader spectrum of scenarios, we propose a novel approach that uses HetG [22] representations for arbitrary structural systems.
In the new scheme, different structural components and joints (i.e., the intersections of components) correspond to different types of nodes. The features of the structural components and joints (e.g., their section geometries, material properties, and joint coordinates), which are denoted as static features [46], are organized within the node features. Since the descriptions of different structural components vary, a node in the HetG can carry heterogeneous content. Furthermore, the connectivity among distinct graph nodes inherently represents the topological relationships contained within the structural system.
Mathematically, consider a structural system comprising $M_0$ structural joints and $k$ types of structural components (with corresponding counts $M_1, M_2, \ldots, M_k$). The node feature ($\boldsymbol{\nu}_{i}$) of a node i that belongs to a certain node type is as follows:

$\boldsymbol{\nu}_{i} \in \mathbb{R}^{d_{i}}$

where $d_i$ denotes the feature dimensionality of node i. The topology of the structural system is represented by an adjacency matrix ($\boldsymbol{A}$) with a size of N × N (N is the total number of nodes):

$\boldsymbol{A}=\left[a_{i j}\right]_{N \times N}$

where each element $a_{ij}$ denotes the connectivity between nodes i and j: if $a_{ij} = 1$, then nodes i and j are connected; otherwise, $a_{ij} = 0$. Notably, $\boldsymbol{A}$ is symmetric.
On this basis, we can digitalize any structure system using an HetG:
$G=\{V, E, \boldsymbol{A}, \psi\}$
where G is the graph data, V is the set of nodes, and E is the set of edges; ψ is the node-type mapping, which assigns each node to the set of nodes of the same category. Conversely, given reasonable HetG data, we can uniquely determine the corresponding structural system. In this manner, a bijective mapping relationship between an arbitrary structural system and the HetG is established.
A simplified frame-shear wall structure is taken as an example (Fig. 1(b)). Its HetG representation encompasses 30 nodes categorized into six types, with four columns, eight beams, two braces, two shear walls, two slabs, and 12 joints (including four column feet). Each node type is visually differentiated using distinct colors. Moreover, assuming that the node features include joint coordinates, employing the breadth-first search algorithm facilitates a straightforward process of reconstructing the initial structure.
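To make the representation concrete, the sketch below digitalizes an even smaller example, a one-bay planar frame, as an HetG. The node ordering, feature values, and numpy-based container are illustrative assumptions for this paper's scheme, not its actual implementation.

```python
import numpy as np

# A tiny one-bay frame: 4 joints, 2 columns, 1 beam -> 7 graph nodes.
node_types = ["joint"] * 4 + ["column"] * 2 + ["beam"]
N = len(node_types)

# Heterogeneous node features: joints carry coordinates (d_i = 3), while
# members carry hypothetical section/material descriptors (d_i = 2), so the
# feature dimensionality d_i differs across node types.
features = {
    0: np.array([0.0, 0.0, 0.0]), 1: np.array([4.0, 0.0, 0.0]),
    2: np.array([0.0, 0.0, 3.0]), 3: np.array([4.0, 0.0, 3.0]),
    4: np.array([0.09, 30e9]),  # column: area (m^2), elastic modulus (Pa)
    5: np.array([0.09, 30e9]),  # column
    6: np.array([0.06, 30e9]),  # beam
}

# Adjacency matrix A: each member node connects to the joints it spans,
# which encodes the structural topology.
A = np.zeros((N, N), dtype=int)
for member, joints in {4: (0, 2), 5: (1, 3), 6: (2, 3)}.items():
    for j in joints:
        A[member, j] = A[j, member] = 1
```

Given the node types, features, and A, the structure can be rebuilt uniquely (e.g., by a breadth-first traversal starting from any joint), mirroring the bijective mapping discussed above.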
Building upon the new scheme, we propose the following DL modeling pipeline (Fig. 2) to address the problem of full-range nonlinear structural analyses. Utilizing the HetG representation, an HGNN framework is employed to encode the static features of each component within the structural system, serving as structural embeddings. The input stimulus sequences, denoted as dynamic features [46,47], are subsequently integrated with the structural embeddings and channeled through a sequence processing network to generate full-range responses.
3.2. Structural embedding with an HGNN
The raw features of a structural system are recorded in the HetG representation but are not suitable for direct computation because the features of distinct structural components are isolated and topological information is not integrated. To learn the collaborative mechanism used by the structural components within the system, the HGNN is utilized for processing various types of graph nodes.
GNNs belong to a category of DL models that are designed to capture the nonlinear relationships embedded in graph data [44]. Following the methodology introduced in Ref. [20], the learning process of the HGNN can adopt the message passing mechanism:

$\boldsymbol{h}_{i}^{(0)}=\mathcal{T}_{\psi\left(\boldsymbol{v}_{i}\right)}\left(\boldsymbol{v}_{i}\right)$

$\boldsymbol{m}_{i j}^{(l)}=\mathcal{M}^{(l)}\left(\boldsymbol{h}_{i}^{(l-1)}, \boldsymbol{h}_{j}^{(l-1)}\right)$

$\boldsymbol{A}_{i}^{(l)}=\mathcal{G}^{(l)}\left(\left\{\boldsymbol{m}_{i j}^{(l)}: a_{i j}=1\right\}\right)$

$\boldsymbol{h}_{i}^{(l)}=\mathcal{U}^{(l)}\left(\boldsymbol{h}_{i}^{(l-1)}, \boldsymbol{A}_{i}^{(l)}\right)$

where $\mathcal{T}_{\psi\left(\boldsymbol{v}_{i}\right)}$ is an operator that maps different types of nodes to a common latent space; $\mathcal{M}^{(l)}$, $\mathcal{G}^{(l)}$, and $\mathcal{U}^{(l)}$ are the learnable functions denoting the node message interconnection, aggregation, and update operations in the lth layer, respectively; $\boldsymbol{v}_{j}$ is the node feature of node j; $\boldsymbol{m}_{i j}^{(l)}$ is the message from node j to node i in the lth layer; $\boldsymbol{h}_{i}^{(l)}$ and $\boldsymbol{A}_{i}^{(l)}$ are the state information and aggregated information of node i in the lth layer, respectively; and $\boldsymbol{h}_{i}^{(l-1)}$ and $\boldsymbol{h}_{j}^{(l-1)}$ represent the state information of nodes i and j in the (l − 1)th layer, respectively. Through the iterative message passing scheme, the HGNN can fuse topological and structural component configuration information into the node representations, yielding structural embeddings with rich mechanistic semantics.
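The message passing scheme described above can be sketched numerically as follows. The per-type projections T and the tanh-based message and update maps are hypothetical stand-ins for the learnable operators $\mathcal{T}_{\psi}$, $\mathcal{M}$, $\mathcal{G}$, and $\mathcal{U}$, implemented here with random numpy weights rather than trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HetG: two "joint" nodes with 3 raw features, two "beam" nodes with 2.
types = ["joint", "joint", "beam", "beam"]
raw = [rng.normal(size=3), rng.normal(size=3),
       rng.normal(size=2), rng.normal(size=2)]
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]])

d = 8  # common latent dimension
# T_psi: one projection per node type, mapping raw features to the latent space.
T = {"joint": rng.normal(size=(d, 3)), "beam": rng.normal(size=(d, 2))}
h = np.stack([T[types[i]] @ raw[i] for i in range(4)])  # h^(0)

W_msg = rng.normal(size=(d, d))      # stands in for the message function M
W_upd = rng.normal(size=(d, 2 * d))  # stands in for the update function U
for _ in range(2):                   # two message-passing layers
    m = np.tanh(h @ W_msg.T)         # per-node messages
    agg = A @ m                      # aggregation G: sum over neighbors
    h = np.tanh(np.concatenate([h, agg], axis=1) @ W_upd.T)  # state update
```

After the loop, each row of h is a structural embedding that mixes a node's own features with those of its topological neighborhood.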
3.3. Structural response generation
The primary objective of the full-range nonlinear analysis process is to predict response sequences under various loading protocols. In principle, a time series-based DL model can address tasks involving sequence data [48,49]. However, classic computational mechanics reveals a pronounced history-dependent effect in structural nonlinear analysis scenarios, necessitating intricate evolution rules to formulate hysteretic behaviors. For example, steel plate shear walls, which are among the most common lateral force-resisting components in engineering structures, exhibit significant cyclic hardening and breathing effects [47]. Consequently, distinct loading paths can yield various energy dissipation performances [50].
To accommodate the demand for history-dependent sequential prediction, the sequence-to-sequence (Seq2Seq) [51] architecture integrated with an attention mechanism [52] (Fig. S1 in Appendix A) is recommended. The Seq2Seq architecture consists of an encoder module and a decoder module. The encoder converts the loading path into dense tensors, which are referred to as contexts. The decoder autoregressively generates future responses using the historical responses and the derived contexts. The integration of the attention mechanism can significantly augment the ability of the constructed model to extract global historical information, thus mitigating the memory decay issue that is inherent in conventional recurrent neural networks. In addition, to fuse the information gleaned from the HGNN, the structural embeddings are concatenated with excitations at every time step. The mathematical formulation of the dynamic feature processor is as follows:
where s, r, and u represent the stimulus sequence, the structural response sequence, and the processed sequence in the middle layer, respectively; τ denotes the current time step during the prediction process (T steps in total); $\boldsymbol{s}_{i \leq \tau}$ denotes the stimulus information from the start to the current step τ; $\boldsymbol{r}_{\tau}$ represents the structural responses at time step τ, and $\boldsymbol{r}_{i \leq \tau-1}$ denotes the structural responses from the start to the previous time step τ − 1; $\boldsymbol{c}_{\tau}$ represents the contexts acquired from the encoder at time step τ; and $\boldsymbol{u}_{\tau}^{\text{self}}$ and $\boldsymbol{u}_{\tau}^{\text{cross}}$ represent the processed information at time step τ from the self- and cross-attention layers, respectively. Furthermore, $\mathcal{A}$ signifies the attention operator that utilizes a query tensor and a key tensor; $\mathcal{F}_{\text{enc}}$, $\mathcal{F}_{\text{dec}}^{\text{self}}$, and $\mathcal{F}_{\text{dec}}^{\text{cross}}$ signify the feedforward neural networks (FNNs) of the encoder and decoder in the self- and cross-attention mechanisms, respectively; the value functions $\mathcal{V}_{\text{enc}}$ and $\mathcal{V}_{\text{dec}}$ are the multilayer perceptrons (MLPs) that transform the inputs into a form with the proper dimensions in the encoder and decoder, respectively; $\mathcal{F}_{\text{out}}$ is the output layer; and (·||·) denotes the tensor concatenation operation.
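A minimal sketch of the history-dependent decoding loop described above: the encoder maps each (stimulus || structural embedding) step to a context, and the decoder generates responses autoregressively with attention restricted to past contexts. All dimensions, weights, and the single-head dot-product attention are illustrative assumptions, not the actual Seq2Seq configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
L, d_s, d_h = 16, 1, 4          # sequence length, stimulus dim, embedding dim
s = rng.normal(size=(L, d_s))   # external stimulus sequence
h = rng.normal(size=d_h)        # structural embedding from the HGNN module

W_enc = rng.normal(size=(d_s + d_h, 4))
W_out = rng.normal(size=(4 + 1, 1))

# Encoder: per-step contexts from (stimulus || structural embedding).
ctx = np.tanh(np.concatenate([s, np.tile(h, (L, 1))], axis=1) @ W_enc)

# Decoder: autoregressive generation; attention only looks at past contexts,
# and each step also consumes the previously generated response.
r = np.zeros((L, 1))
for tau in range(L):
    scores = np.exp(ctx[: tau + 1] @ ctx[tau])    # causal attention scores
    c = (scores / scores.sum()) @ ctx[: tau + 1]  # attended context c_tau
    prev = r[tau - 1] if tau > 0 else np.zeros(1) # r_{tau-1}
    r[tau] = np.concatenate([c, prev]) @ W_out
```

Because the attended context at step τ aggregates the entire history rather than only the most recent hidden state, the memory-decay issue of plain recurrent decoders is mitigated, which is the motivation for the attention mechanism here.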
The fusion of the HGNN embedding model with the response generation module forms an end-to-end data-driven framework for predicting the full-range responses of structural systems. This framework operates on HetG representations, effectively digitizing diverse structural systems and comprehensively accommodating both intrinsic structural configurations and random external excitations.
4. Implementation
Following the proposed pipeline, we implement a computational model called StructGNN-N. An illustration of the model is shown in Fig. 3. More details are explained below.
4.1. HGIN: A heterogeneous graph isomorphism network

In a structural system, many joints share identical local topological structures but exhibit different mechanical behaviors during the analysis process. To capture the topological information contained in a graph and differentiate between various node features [20], a graph isomorphism network (GIN) is established [53]. This network uses MLPs to approximate the functions $\mathcal{M}$, $\mathcal{G}$, and $\mathcal{U}$ based on the theory of the graph isomorphism problem, ensuring the injectiveness of the feature mapping performed during the message passing process. Furthermore, we introduce the joint representation learning paradigm [54] to combine heterogeneous node types and features into a common latent space, as shown in Fig. 3(a). Specifically, additional MLP layers with sum pooling are employed to accommodate the various dimensions of the nodes, enabling the integration of heterogeneous information derived from neighborhoods. Accordingly, we construct a modified GIN model named the heterogeneous GIN (HGIN):
$\boldsymbol{h}_{i}^{(l)}=\operatorname{MLP}_{\boldsymbol{v}_{i}}^{(l)}\left(\left(1+\epsilon^{(l)}\right) \boldsymbol{h}_{i}^{(l-1)}+\sum_{j: a_{i j}=1} \operatorname{MLP}_{\boldsymbol{v}_{j}}^{(l)}\left(\boldsymbol{h}_{j}^{(l-1)}\right)\right)$

where $\epsilon^{(l)}$ is a learnable parameter for adjusting the weight of the node feature $\boldsymbol{v}_{i}$ (if $\epsilon^{(l)} = 0$, the information derived from $\boldsymbol{v}_{i}$ is weighted the same as that of its neighbor nodes); $\operatorname{MLP}_{\boldsymbol{v}_{i}}^{(l)}$ and $\operatorname{MLP}_{\boldsymbol{v}_{j}}^{(l)}$ are the MLP layers for the corresponding node i and node j in the lth layer. For different types of nodes, different MLP layers are trained to aggregate information and unify the various dimensions of the initial node embeddings in the hidden layers.
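Assuming the HGIN update takes the GIN-style form with type-specific MLPs, as described above, one layer might look like the following sketch; each "MLP" is reduced to a single random linear map with a tanh activation, which is an illustrative simplification.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
types = ["joint", "joint", "beam"]
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
h = rng.normal(size=(3, d))  # states h^(l-1), already in a common latent space

# One "MLP" per node type, reduced here to a random linear map + tanh.
weights = {"joint": rng.normal(size=(d, d)), "beam": rng.normal(size=(d, d))}
mlp = lambda t, x: np.tanh(weights[t] @ x)

def hgin_layer(h, A, types, eps=0.1):
    """h_i = MLP_{v_i}((1 + eps) * h_i + sum over neighbors j of MLP_{v_j}(h_j))."""
    out = np.zeros_like(h)
    for i in range(len(h)):
        agg = sum(mlp(types[j], h[j]) for j in np.nonzero(A[i])[0])
        out[i] = mlp(types[i], (1.0 + eps) * h[i] + agg)
    return out

h1 = hgin_layer(h, A, types)
```

Using a different MLP per node type is what lets heterogeneous neighborhoods (joints, beams, walls, and so on) be summed into one latent space without discarding their mechanical distinctions.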
4.2. Mechformer: A transformer for mechanical computations
For the dynamic feature module, we introduce the mechanical transformer (Mechformer) model (Fig. 3(b)) proposed by the authors in Ref. [46] (Fig. S2 in Appendix A). The Mechformer was derived from the transformer architecture [52] and belongs to the Seq2Seq model family.
To extract the essential information from a sequence, a multi-head attention mechanism discovers the correlations between any two positions within the sequence. In the vanilla transformer [52], the attention results obtained for the query (Q), key (K), and value (V) tensors are calculated as follows:

$\operatorname{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V})=\operatorname{softmax}\left(\frac{\mathbf{Q} \mathbf{K}^{\mathrm{T}}}{\sqrt{d}}\right) \mathbf{V}$
However, the temporal and spatial costs $O\left(L^{2} d\right)$ scale with the square of the sequence length L and with the model dimension d, which imposes a high memory demand, especially for the <s, r> sequences in nonlinear analyses, which involve large L values. To achieve improved efficiency, the fast attention via positive orthogonal random features (FAVOR+) algorithm [55] is employed in the Mechformer model to replace the standard attention module (Fig. S3 in Appendix A). By approximating the nonlinear softmax activation function in Eq. (14) with a linear combination calculated by a kernel function, the complexity level is reduced to O(Lmd):

$\operatorname{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) \approx \hat{\mathbf{D}}^{-1}\left(\mathbf{Q}^{\prime}\left(\left(\boldsymbol{K}^{\prime}\right)^{\mathrm{T}} \mathbf{V}\right)\right), \quad \hat{\mathbf{D}}=\operatorname{diag}\left(\mathbf{Q}^{\prime}\left(\left(\boldsymbol{K}^{\prime}\right)^{\mathrm{T}} \mathbf{1}_{L}\right)\right)$

where $\phi: \mathbb{R}^{d} \rightarrow \mathbb{R}_{+}^{m}$ is a random feature mapping kernel function applied row-wise such that $\mathbf{Q}^{\prime}=\phi(\mathbf{Q})$ and $\boldsymbol{K}^{\prime}=\phi(\boldsymbol{K})$ are the mapped query and key in the hidden dimension; m denotes the hidden dimensionality, with $m \ll L$; and $\mathbf{1}_{L}$ is the all-ones vector of length L. Furthermore, considering the causality observed in structural analyses, the future information contained in the sequences is masked using the prefix summation algorithm, which corresponds to computing the lower-triangular part of the attention matrix.
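The prefix-summation trick for causal linear attention can be sketched as follows. The simple positive feature map phi below (elu + 1) is a stand-in for the FAVOR+ random-feature kernel; the linear cost comes from maintaining running sums instead of materializing the full L × L attention matrix.

```python
import numpy as np

def phi(x):
    # Simple positive feature map (elu + 1), a stand-in for the FAVOR+
    # random-feature kernel that maps queries/keys into R_+^m.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    """Causal attention in O(L*m*d) via running prefix sums (no L x L matrix)."""
    L, m = Q.shape
    d = V.shape[1]
    S = np.zeros((m, d))   # running sum of outer(phi(k_i), v_i) for i <= t
    z = np.zeros(m)        # running sum of phi(k_i) for i <= t
    out = np.empty((L, d))
    for t in range(L):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z)  # normalized, history-only attention
    return out

rng = np.random.default_rng(4)
Q, K, V = (rng.normal(size=(32, 4)) for _ in range(3))
out = causal_linear_attention(Q, K, V)
```

Because the running sums only ever contain steps i ≤ t, causality is enforced by construction, which is exactly the lower-triangular masking described above.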
To integrate the structural information $\boldsymbol{h}_{i}$ derived from the HGIN with the jth input stimulus sequence $\boldsymbol{s}^{j}$ of the Mechformer, the structural embedding of node i is first replicated to match the sequence length L of $\boldsymbol{s}^{j}$, yielding the replicated embeddings $\boldsymbol{H}_{i}$, which are then concatenated with the stimulus at every step:

$\boldsymbol{H}_{i}=\left[\boldsymbol{h}_{i} ; \boldsymbol{h}_{i} ; \ldots ; \boldsymbol{h}_{i}\right], \quad \boldsymbol{H}_{i}^{j}=\left(\boldsymbol{H}_{i} \| \boldsymbol{s}^{j}\right)$

where $\boldsymbol{H}_{i}^{j}$ denotes the concatenated information of node i and the stimulus sequence $\boldsymbol{s}^{j}$; and $\boldsymbol{s}_{1}^{j}, \boldsymbol{s}_{2}^{j}, \ldots, \boldsymbol{s}_{L}^{j}$ represent the information of $\boldsymbol{s}^{j}$ corresponding to steps $1, 2, \ldots, L$.
4.3. Masked response training strategy
We employ a data-driven inference paradigm, training the model based on the nonlinear response sequences of structural systems subjected to different input stimulus sequences. Specifically, for a given structural system with N nodes, the structural responses are denoted as $\left\{\boldsymbol{r}_{1}^{j}, \ldots, \boldsymbol{r}_{N}^{j}\right\}$, where $\boldsymbol{r}_{i}^{j}$ is the structural response sequence corresponding to node i produced for the input stimuli $\boldsymbol{s}^{j}$. The dataset for this structural system can be represented as shown below:

$D=\left\{\left\langle\boldsymbol{s}^{j},\left\{\boldsymbol{r}_{1}^{j}, \ldots, \boldsymbol{r}_{N}^{j}\right\}\right\rangle\right\}_{j=1}^{K}$
where K denotes the total number of stimulus sequences (e.g., various seismic waves). In practice, experimental data for structural systems are extremely scarce, whereas generating data using an FE analysis is a computationally intensive procedure. Consequently, fully exploiting the information derived from existing data is highly important. To augment the data utilization process, we propose a masked response training strategy.
The data of a structural system are split into N groups, each in the form of <structural node embedding, node stimulus sequence, node response sequence>:

$\left\{\left\langle\boldsymbol{h}_{i}, \boldsymbol{s}^{j}, \boldsymbol{r}_{i}^{j}\right\rangle \mid i=1, \ldots, N ; j=1, \ldots, K\right\}$
Typically, the number of nodes N is significantly larger than the number of structural system samples. Inspired by this observation, we break the structure-level data down into node-level data to facilitate the comprehensive extraction of the implicit mechanisms within the data, thereby significantly enlarging the trainable data size and alleviating the data scarcity burden. Additionally, by assigning only a subset of the nodes (and their connectivity) to the training set, our strategy enables the responses of the whole system to be predicted from local inputs. This provides a new solution for smart virtual sensing [56].
Programmatically, we design a masking technique to unify the computation process. The mask $\boldsymbol{M}^{\mathrm{TR}}$ is a matrix with dimensions of n × N that performs multiplication for random sampling tasks:

$\boldsymbol{M}^{\mathrm{TR}} \boldsymbol{r}^{j}=\left[\boldsymbol{r}_{i_{1}}^{j}, \boldsymbol{r}_{i_{2}}^{j}, \ldots, \boldsymbol{r}_{i_{n}}^{j}\right]^{\mathrm{T}}$

where $i_1, i_2, \ldots, i_n$ are the selected indexes, which correspond to the selected nodes $i_1, i_2, \ldots, i_n$; $\boldsymbol{r}_{i_{n}}^{j}$ is the row of $\boldsymbol{r}^{j}$ (the jth structural response sequence) corresponding to index $i_n$; $m_{ij}$ is an element of $\boldsymbol{M}^{\mathrm{TR}}$ (i = 1, …, n; j = 1, …, N); and each row of $\boldsymbol{M}^{\mathrm{TR}}$ is a unit vector. By modifying the mask matrix, we can determine the size and distribution of the training set. The mask matrix $\boldsymbol{M}^{\mathrm{TR}}$ is also applied to the structural embedding $\boldsymbol{H}^{j}$:

$\boldsymbol{M}^{\mathrm{TR}} \boldsymbol{H}^{j}=\left[\boldsymbol{H}_{i_{1}}^{j}, \boldsymbol{H}_{i_{2}}^{j}, \ldots, \boldsymbol{H}_{i_{n}}^{j}\right]^{\mathrm{T}}$
where $ \boldsymbol{H}_{i}^{j} $ represents the concatenated information of node i (i = 1, …, N) and the stimulus sequence sj, and $ \boldsymbol{H}_{i_{1}}^{j}, \boldsymbol{H}_{i_{2}}^{j}, \ldots, \boldsymbol{H}_{i_{n}}^{j} $ are the concatenated information of the selected nodes i1, i2, …, in, respectively.
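The masking technique can be sketched as follows, with M_TR built from one-hot rows so that left-multiplication selects the sampled nodes' response sequences and embeddings; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, L = 6, 3, 8                    # total nodes, sampled nodes, seq length
idx = rng.choice(N, size=n, replace=False)

M_TR = np.zeros((n, N))              # mask matrix: each row is a unit vector
M_TR[np.arange(n), idx] = 1.0

R = rng.normal(size=(N, L))          # per-node response sequences r^j
H = rng.normal(size=(N, 5))          # per-node concatenated embeddings H^j

R_batch = M_TR @ R                   # selects the n sampled response rows
H_batch = M_TR @ H                   # same mask applied to the embeddings
```

Regenerating idx each step reproduces the minibatch shuffling over node-level samples described in Section 5.2.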
5. Validation
5.1. Data preparation
5.1.1. Basic information of the structural system
For demonstration purposes, a real-world 9-floor office building is chosen. As shown in Fig. 4, this building is an L-shaped RC structure with an outline of 52.8 m (length) × 43.2 m (width). The height of each floor ranges from 4.8 to 5.4 m.
The first three natural frequencies of the structure are 0.97, 1.55, and 2.19 Hz. The main structural components include RC columns, slabs, and beams. The sections of these components vary in their sizes and reinforcement ratios. The corresponding graph data representation contains 4303 nodes of four types and 6948 edges. Some of the node features of the joints and the beam components are listed in Tables 1 and 2.
5.1.2. Data preparation for stimulus and response sequences
We aim to train a model that can conduct a nonlinear analysis on a structural system subjected to seismic waves. The stimulus sequences are the ground acceleration records of different earthquakes, which act upon the structure in the horizontal direction. Many ground motion records are sampled from the Pacific Earthquake Engineering Research (PEER) database [57], and a total of 70 records that match the structural frequency characteristics of the target building are selected, covering a relatively wide range of seismic parameters. A portion of the ground motion records is shown in Table 3.
A numerical model of the structure is established using the fiber beam model [58,59] to obtain <s, r> sequence pairs. The time interval is set as 0.02 s, ensuring that the sequence lengths range from 1000 to 2500. The simulation of each ground motion record in the FE analysis takes approximately 1 to 3 h. The time series of the displacement of each node is extracted to form a dataset. Fifty pairs of <s, r> sequences are randomly selected for training, 10 pairs are used for validation, and the rest are employed for testing. The computing length of the model is set to 2048. Therefore, sequences with shorter lengths are padded with the end values at their tails, while the longer sequences are truncated. The data generation process runs on 10 computers in parallel and requires approximately 64 h of computations.
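The length-fitting rule above can be sketched as follows, assuming end-value padding for short sequences and truncation for long ones; fit_length is a hypothetical helper name.

```python
import numpy as np

def fit_length(seq, target=2048):
    """Pad a sequence with its end value, or truncate it, so that it matches
    the model's fixed computing length (2048 steps in this study)."""
    if seq.shape[0] >= target:
        return seq[:target]
    pad = np.repeat(seq[-1:], target - seq.shape[0], axis=0)
    return np.concatenate([seq, pad], axis=0)

short = np.arange(10.0).reshape(-1, 1)  # length-10 sequence
long = np.zeros((3000, 1))              # length-3000 sequence
```

End-value padding (rather than zero padding) keeps the tail of a response sequence physically plausible, since a structure settles at its residual displacement after the excitation ends.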
5.2. Model configuration
A three-layer HGIN is employed for StructGNN-N. The hidden dimensions of the HGIN and the Mechformer module are 32. The dimension of the kernel function employed in FAVOR+ is 64. The encoder and decoder modules in the Mechformer have three blocks. A two-head attention mechanism is employed, with a hidden size of 128 in the linear layer. Layer normalization [60] is added. In total, the model has 102 093 parameters.
For optimization purposes, we use the adaptive moment estimation (Adam) [61] optimizer with an initial learning rate of 0.005. The model is trained for 1000 epochs, and the learning rate decays 50% after every 20% of the training epochs. The mean squared error (MSE) loss is employed to measure the training results:
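The stated schedule (initial rate 0.005, halved after every 20% of the 1000 training epochs) can be evaluated in closed form; in PyTorch it corresponds to wrapping the Adam optimizer with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)`. The helper below is an illustrative sketch, not the authors' code:

```python
def lr_at_epoch(epoch, base_lr=0.005, total_epochs=1000, decay=0.5, stages=5):
    """Stepwise decay: the learning rate is multiplied by `decay` after
    every 20% (total_epochs // stages epochs) of training."""
    step_size = total_epochs // stages   # 200 epochs per stage
    return base_lr * decay ** (epoch // step_size)
```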
$ \text{Loss} = \text{mean}\left(\left\|\mathcal{R}_{\text{pred}} - \mathcal{R}_{\text{label}}\right\|_{2}\right) $
where $ \mathcal{R}_{\text{pred}} $ and $ \mathcal{R}_{\text{label}} $ represent the results predicted by the model and the labeled reference results, respectively, and $ \|\cdot\|_{2} $ is the L2 norm. For validation and testing, the loss is transformed into a relative accuracy term (Accu):

$ \text{Accu} = 1 - \text{mean}\left(\frac{\left\|\mathcal{R}_{\text{pred}} - \mathcal{R}_{\text{label}}\right\|_{2}}{\left\|\mathcal{R}_{\text{label}}\right\|_{2} + \eta}\right) $

where $ \eta $ is a small value used to avoid division by zero.
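A minimal sketch of the two metrics, assuming a plain mean-squared-error form of the training loss and a norm-ratio form of the relative accuracy; the exact normalization is defined by the paper's Eq. (27), so `relative_accuracy` below is an assumption:

```python
import math

def mse_loss(pred, label):
    # Mean squared error over all prediction steps
    # (the common implementation of the training loss above).
    return sum((p - l) ** 2 for p, l in zip(pred, label)) / len(pred)

def relative_accuracy(pred, label, eta=1e-8):
    # Assumed form: 1 minus the L2 error normalized by the label norm,
    # with eta guarding against division by zero for all-zero labels.
    num = math.sqrt(sum((p - l) ** 2 for p, l in zip(pred, label)))
    den = math.sqrt(sum(l ** 2 for l in label)) + eta
    return 1.0 - num / den
```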
The initial graph data are directly fed into the HGIN module. After the resulting embeddings are concatenated with the training stimulus sequences, the combined features of 256 randomly chosen nodes form a batch via a random mask matrix MTR, thus modeling the shuffling process of minibatch training. The model is trained on the PyTorch [62,63] platform using an NVIDIA RTX 4080 graphics processing unit (GPU) with 16 GB of memory. The training and validation processes for the demonstrated structural system require approximately 3 h.
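The random mask matrix MTR can be sketched as a boolean selector over the node embeddings; `sample_mask` and its arguments are illustrative names, with the node count taken from the graph in Section 5.1.1:

```python
import random

def sample_mask(num_nodes, batch_size=256, seed=0):
    """Sketch of the random mask M_TR: a boolean vector selecting a
    batch_size-node minibatch out of the num_nodes node embeddings."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(num_nodes), batch_size))
    return [i in chosen for i in range(num_nodes)]

mask = sample_mask(4303)   # the demonstration graph has 4303 nodes
```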
5.3. Model performance
The trained model achieves a relative accuracy of 97.5% on the test dataset. The predictions produced under different seismic waves are compared with the results of the FE analysis, as illustrated in Fig. 5, where $ \mathcal{R} $ denotes the resulting displacement and NRD denotes the displacement $ \mathcal{R}_{\text{NRD}} $ normalized with respect to the ground displacement $ \mathcal{R}_{\mathrm{GD}} $.
StructGNN-N predicts the full-range displacement sequences with considerable accuracy and successfully distinguishes between the responses at different positions rather than producing indistinct results. The model exhibits excellent generalizability across various seismic waves, enabling the identification of potential weak points from comprehensive response information rather than from a handful of static mechanical indices. Additionally, the relative lateral displacement and interstory drift values at the peak points are extracted and compared with the results acquired from the FE models. As shown in Fig. 6, the results computed by StructGNN-N closely align with those of the FE analysis. The Mechformer module captures the cyclic loading characteristics of the seismic analysis task, accurately delineating the different loading and unloading phases. Minor discrepancies appear during the initial and final stages of the seismic time history analysis, where the structural response amplitudes are small; however, at the critical peak points that can lead to structural failure, such as the maximum interstory drift, the model predicts responses with greater accuracy. This behavior is attributed to the attention mechanism, which, through extensive data training, focuses on the historical influences of the peak values.
Overall, the numerical experiment validates the effectiveness of the StructGNN-N model in terms of reproducing history-dependent hysteresis loops as well as its excellent ability to capture structural characteristics. Notably, the trained model takes several seconds to predict responses for the whole structure, significantly improving upon the temporal efficiency of the traditional FE methods.
5.4. Structural generalization
For generalization purposes, the StructGNN-N model is trained on a dataset including 942 real-world engineering projects to predict seismic responses across different structures subjected to various ground accelerations. Owing to computational resource and time limitations, 10 ground motion records are selected from the PEER dataset for each structural system. The lateral displacements and interstory drifts at the point with the maximum structural response are extracted and compared with the FE analysis results, achieving an average accuracy of 95.4%. Representative test cases are detailed below.
(1) Case 1: a structure with a regular plan layout. Case 1 involves an eight-story RC structure with a regular and symmetrical plan layout, as depicted in Fig. 7. A full-range nonlinear analysis is conducted under seismic loading conditions, yielding accurate lateral displacement and interstory drift predictions.
(2) Case 2: a structure with an irregular plan layout. Case 2 features a four-story RC structure characterized by an asymmetrical and irregular layout, thus diverging from the training dataset. The computational results shown in Fig. 8 accurately capture the asymmetry of the response as well as the maximum lateral displacement and interstory drift of the structure.
6. Discussion
6.1. Parameter analysis
The StructGNN-N model can infer information about an entire structural system on the basis of data acquired from selected points within the structure. A critical issue is determining how many selected points are adequate for predicting the responses of the entire structure. Answering this question also indicates how to strategically place measurement devices at optimal positions, obtain the mechanical responses at those locations, and employ StructGNN-N to assess the overall structural performance. In practical engineering applications, economic considerations and technical limitations often restrict the number and locations of sensors. StructGNN-N can be trained on a subset of data derived from these limited points and then applied to predict responses at other locations where direct measurements are unavailable, aligning with the concept of virtual sensing [56].
To address this issue, we investigate the effects of the selection and distribution of the training dataset consisting of n nodes on the resulting model performance. Various percentages of training data (unmasking ratio: n/N) and different selection approaches, which correspond to different settings for the measurement instruments, are considered. ① RD: The training samples are randomly distributed in the graph, corresponding to randomly placed measurement instruments within the structural system. ② UD: The training samples are uniformly distributed in the graph, corresponding to placing the measurement instruments within the structural system by following a specific pattern. ③ CL: The samples are concentrated in the lower part of the structural system. ④ CU: The samples are concentrated in the upper part of the structural system. In accordance with the corresponding selection approaches, different masks are applied to the node embeddings, as shown in Fig. 9(a).
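The four selection strategies can be sketched as mask generators over the node set; the bottom-to-top node ordering assumed for CL and CU, and the function names, are illustrative:

```python
import random

def build_mask(num_nodes, ratio, strategy="RD", seed=0):
    """Generate an unmasking vector for a given unmasking ratio n/N.

    RD: randomly distributed samples; UD: uniformly distributed samples;
    CL: samples concentrated in the lower part (assumed to be the first
    node indices); CU: samples concentrated in the upper part (last indices).
    """
    n = int(num_nodes * ratio)
    if strategy == "RD":
        idx = random.Random(seed).sample(range(num_nodes), n)
    elif strategy == "UD":
        stride = num_nodes / n
        idx = [int(i * stride) for i in range(n)]
    elif strategy == "CL":
        idx = range(n)
    elif strategy == "CU":
        idx = range(num_nodes - n, num_nodes)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    chosen = set(idx)
    return [i in chosen for i in range(num_nodes)]
```

For example, `build_mask(4303, 0.2, "UD")` mimics placing instruments at every fifth node of the demonstration graph.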
The experiment is carried out on models with different configurations: S32 is the StructGNN-N model described in the previous section, and S16 and S64 are lighter and heavier versions, respectively, with different hidden dimensions. The total numbers of parameters required for these models are 26 493, 102 093, and 400 749. To ensure a fair comparison, all the models are fully trained for 1000 epochs and tested on the same dataset while using Eq. (27) as the metric. The results are presented in Fig. 9. The RD and UD strategies yield similar results with the same model configuration, which aligns with our intuition when the structural system is sufficiently large. The key difference between the two strategies is that the RD strategy results in greater variability, whereas the UD strategy generally results in an upward trend in performance as the unmasking ratio increases. On the other hand, the CL and CU strategies yield less accurate results in general. The performance of these methods significantly improves as the unmasking ratio increases, but increasing the proportion of selected positions to over 50% is often impractical in real-world applications. Notably, even with only a 10%-20% unmasking ratio (which is often the case in engineering applications), the RD and UD strategies can achieve approximately 90% accuracy, which is sufficient for obtaining a reasonable assessment of a structural system.
6.2. Ablation study
StructGNN-N is composed of two main parts: an HGIN for encoding structural information and the Mechformer module for sequence processing. In this section, we carry out an ablation study on these two components.
To encode the structural information, we adopt a DNN, a 2D CNN [64], and a vanilla gated recurrent unit (GRU) [65] for comparison purposes. The node features contained in the graph data of the examined structural systems are modified to adapt to the input (Fig. S4 in Appendix A). For sequence processing, we compare the Mechformer with the GRU and the vanilla transformer [52]. The unmasking ratio is 20%, and the same RD strategy is applied. All the models are trained for 1000 epochs and tested on the same dataset. The results are shown in Table 4.
Table 4 shows that StructGNN-N (HGIN + Mechformer) yields the optimal results, and the Mechformer achieves the highest accuracy in most combinations with the different encoding modules. The results reveal that the encoding of structural information is crucial for predicting the nonlinear responses of structural systems. The DNN encodes the features of each component individually and completely neglects their relationships, which limits its ability to generalize to the test dataset. The CNN encodes the features of each component and its “adjacent” neighbors in a grid setting [66]. However, real-world structural systems are typically too complex to be arranged in a grid format, and the “adjacent” neighbors in the grid may not be adjacent in the actual structure. Additionally, determining the optimal kernel size and stride of the convolution is challenging. The GRU module processes the features of each component in sequential order, resulting in variable outcomes across runs and lower efficiency than that of the other modules. In contrast, the HGIN module fully utilizes the physical connections among the different components of structural systems, and the Mechformer is better adapted to mechanical computations than the vanilla transformer is, thereby attaining the highest accuracy.
6.3. Potential applications
We have demonstrated the effectiveness of StructGNN-N for conducting nonlinear analyses on structural systems. The training and evaluation processes of StructGNN-N align with the task of building digital twins of structural systems in engineering applications [67].
To construct a digital twin, information about the different components within the target structural system is collected throughout the entire life cycle of the structure via the virtual sensing technique. This physical information, including mechanical responses induced during the service life, is critically associated with safety assessments [68,69]. As shown in Fig. 10, a virtual model is updated to reflect the corresponding real entity on the basis of the collected information, and adjustments are made to manage the physical entity according to the discoveries made by the virtual model. This process dynamically maintains the circular flow of data between the virtual model and the physical entity. Owing to its powerful simulation capabilities, the fully trained StructGNN-N approach can be integrated into the digital twin paradigm with conventional simulation platforms to process real-world mechanical data and quickly obtain nonlinear analysis results. Furthermore, even when only limited data are acquired from the target structural system (often the case, since collecting comprehensive information during the service period is time-consuming and costly), the fully trained StructGNN-N model can provide a timely and reasonable assessment of the entire structure for the decision-making process.
7. Conclusions
In this paper, a novel method based on HetG DL is proposed to address the task of efficiently conducting nonlinear analyses on structural systems. The model trained based on the proposed method can be applied to different structures with diverse designs and predict the full-range history-dependent responses of these structures. The main contributions of this study are summarized as follows.
(1) We design a universal graph representation scheme for structural systems. HetG data comprehensively integrate information about different components within the target structural system and characterize the connectivity relationships among them.
(2) We propose a composite feature learning method to realize end-to-end nonlinear analyses of structural systems. An HGNN is introduced for encoding the internal structural information, and the Seq2Seq framework is recommended for predicting history-dependent full-range nonlinear responses.
(3) We implement HGIN and Mechformer models based on the proposed method, forming a structural computational model named StructGNN-N. This model distinguishes between nodes with similar local topologies and extracts the global historical dependencies underlying <s, r> sequences with linear complexity.
(4) We design a masked response-based training strategy for StructGNN-N. By masking the information of part of the target structural system, this from-part-to-whole paradigm enables significant data augmentations, greatly mitigating the data scarcity issue at the structural system level.
(5) Numerical experiments validate the effectiveness of the proposed method. The StructGNN-N model successfully generalizes to different seismic stimuli and structures, efficiently predicting the nonlinear response histories at all points within the structures.
(6) A parameter analysis and an ablation study demonstrate the validity of the architectural design of StructGNN-N. Its performance showcases its great potential for use in the context of digital twins, providing an inspiring path for simulating diverse engineering structures with accurate and comprehensive mechanical information in real time.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References

[1] Kurowski PM. Finite element analysis for design engineers. 3rd ed. Pittsburgh: SAE International; 2022.
[2] Szabó B, Babuška I. Finite element analysis: method, verification and validation. 2nd ed. Hoboken: John Wiley & Sons; 2021.
[3] Belytschko T, Liu WK, Moran B, Elkhodary K. Nonlinear finite elements for continua and structures. Hoboken: John Wiley & Sons; 2014.
[4] Wang C, Song L, Yuan Z, Fan J. State-of-the-art AI-based computational analysis in civil engineering. J Ind Inf Integr 2023;3:100470.
[5] Boje C, Guerriero A, Kubicki S, Rezgui Y. Towards a semantic construction digital twin: directions for future research. Autom Constr 2020;114:103179.
[6] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436-44.
[7] Bengio Y, Goodfellow I, Courville A. Deep learning. Cambridge: MIT Press; 2017.
[8] Bishop CM, Bishop H. Deep learning: foundations and concepts. Cham: Springer International Publishing; 2024.
[9] Siddique N, Adeli H. Computational intelligence: synergies of fuzzy logic, neural networks and evolutionary computing. Hoboken: John Wiley & Sons; 2013.
[10] Vadyala SR, Betgeri SN, Matthews JC, Matthews E. A review of physics-based machine learning in civil engineering. Result Eng 2022;13:100316.
[11] Tapeh ATG, Naser MZ. Artificial intelligence, machine learning, and deep learning in structural engineering: a scientometrics review of trends and best practices. Arch Comput Meth Eng 2023;30(1):115-59.
[12] Tao F, Liu X, Du H, Yu W. Finite element coupled positive definite deep neural networks mechanics system for constitutive modeling of composites. Comput Meth Appl Mech Eng 2022;391:114548.
[13] Rahman J, Ahmed KS, Khan NI, Islam K, Mangalathu S. Data-driven shear strength prediction of steel fiber reinforced concrete beams using machine learning approach. Eng Struct 2021;233:111743.
[14] Abambres M, Lantsoght EOL. Neural network-based formula for shear capacity prediction of one-way slabs under concentrated loads. Eng Struct 2020;211:110501.
[15] Feng DC, Cetiner B, Azadi Kakavand MR, Taciroglu E. Data-driven approach to predict the plastic hinge length of reinforced concrete columns and its application. J Struct Eng 2021;147(2):04020332.
[16] Huang P, Chen Z, Liu Z. Nonparametric probabilistic seismic demand model and fragility analysis of subway stations using deep learning techniques. Undergr Space 2023;11:63-80.
[17] Ning C, Xie Y, Sun L. LSTM, WaveNet, and 2D CNN for nonlinear time history prediction of seismic responses. Eng Struct 2023;286:116083.
Fei Y, Liao W, Zhao P, Lu X, Guan H. Hybrid surrogate model combining physics and data for seismic drift estimation of shear-wall structures. Earthq Eng Struct Dyn 2024;53(10):3093-112.
[20] Song LH, Wang C, Fan JS, Lu HM. Elastic structural analysis based on graph neural network without labeled data. Compos A Appl Sci Manuf 2023;38(10):1307-23.
[21] Wang X, Bo D, Shi C, Fan S, Ye Y, Yu PS. A survey on heterogeneous graph embedding: methods, techniques, applications and sources. IEEE Trans Big Data 2022;9(2):415-36.
[22] Zhao J, Wang X, Shi C, Hu B, Song G, Ye Y. Heterogeneous graph structure learning for graph neural networks. Proc Int AAAI Conf 2021;35(5):4697-705.
Fan S, Liu G, Li J. A heterogeneous graph neural network with attribute enhancement and structure-aware attention. IEEE Trans Comput Soc 2023;11(1):829-38.
Ma Y, Yan N, Li J, Mortazavi M, Chawla NV. HetGPT: harnessing the power of prompt tuning in pre-trained heterogeneous graph neural networks. In: Proceedings of the ACM on Web Conference 2024; 2024 May 13-17; Singapore. New York City: Association for Computing Machinery; 2024. p. 1015-23.
[30] Zhao Y, Li H, Zhou H, Attar HR, Pfaff T, Li N. A review of graph neural network applications in mechanics-related domains. Artif Intell Rev 2024;57(11):315.
[31] Black N, Najafi AR. Learning finite element convergence with the multi-fidelity graph neural network. Comput Meth Appl Mech Eng 2022;397:115120.
[32] Li Q, Wang Z, Li L, Hao H, Chen W, Shao Y. Machine learning prediction of structural dynamic responses using graph neural networks. Comput Struct 2023;289:107188.
[33] Xue T, Adriaenssens S, Mao S. Learning the nonlinear dynamics of mechanical metamaterials with graph networks. Int J Mech Sci 2023;238:107835.
[34] Yacouti M, Shakiba M. Integrated convolutional and graph neural networks for predicting mechanical fields in composite microstructures. Compos A Appl Sci Manuf 2025;190:108618.
[35] Hendriks F, Menkovski V, Doškář M, Geers MGD, Rokoš O. Similarity equivariant graph neural networks for homogenization of metamaterials. Compos A Appl Sci Manuf 2025;439:117867.
[36] Deshpande S, Bordas S, Lengiewicz J. MAgNET: a graph U-Net architecture for mesh-based simulations. 2022. arXiv:2211.00713.
[37] Tian Y, Guan X, Sun H, Bao Y. An adaptive structural dominant failure modes searching method based on graph neural network. Reliab Eng Syst Saf 2024;243:109841.
[38] Chen Q, Cao J, Lin W, Zhu S, Wang S. Predicting dynamic responses of continuous deformable bodies: a graph-based learning approach. Compos A Appl Sci Manuf 2024;420:116669.
[39] Zhang C, Tao MX, Wang C, Fan JS. End-to-end generation of structural topology for complex architectural layouts with graph neural networks. Compos A Appl Sci Manuf 2024;39(5):756-75.
[40] Zhang C, Tao M, Wang C, Yang C, Fan J. Differentiable automatic structural optimization using graph deep learning. Adv Eng Inform 2024;60:102363.
[41] Zhao P, Liao W, Huang Y, Lu X. Beam layout design of shear wall structures based on graph neural networks. Autom Constr 2024;158:105223.
[42] Fei Y, Liao W, Lu X, Guan H. Knowledge-enhanced graph neural networks for construction material quantity estimation of reinforced concrete buildings. Compos A Appl Sci Manuf 2024;39(4):518-38.
[43] Zhang P, Zhao H, Shao Z, Xie X, Hu H, Zeng Y, et al. Enhanced multi-scenario running safety assessment of railway bridges based on graph neural networks with self-evolutionary capability. Eng Struct 2024;319:118785.
[44] Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS. A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst 2021;32(1):4-24.
[45] Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, et al. Graph neural networks: a review of methods and applications. AI Open 2020;1:57-81.
[46] Wang C, Song L, Fan J. End-to-end structural analysis in civil engineering based on deep learning. Autom Constr 2022;138:104255.
[47] Wang C, Xu L, Fan J. A general deep learning framework for history-dependent response prediction based on UA-Seq2Seq model. Comput Meth Appl Mech Eng 2020;372:113357.
[48] Yousuf H, Lahzi M, Salloum SA, Shaalan K. A systematic review on sequence-to-sequence learning with neural network and its models. Int J Electr Comput Eng 2021;11(3):2315-26.
[49] Sherstinsky A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys D 2020;404:132306.
[50] Xu L, Nie X, Fan J, Tao M, Ding R. Cyclic hardening and softening behavior of the low yield point steel BLY160: experimental response and constitutive modeling. Int J Plast 2016;78:44-63.
[51] Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. 2014. arXiv:1409.3215.
[52] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. 2017. arXiv:1706.03762.
[53] Xu K, Hu W, Leskovec J, Jegelka S. How powerful are graph neural networks? 2018. arXiv:1810.00826.
[54] Baltrušaitis T, Ahuja C, Morency LP. Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell 2019;41(2):423-43.
[55] Choromanski K, Likhosherstov V, Dohan D, Song X, Gane A, Sarlos T, et al. Rethinking attention with performers. 2020. arXiv:2009.14794.
[56] Yoon S. Virtual sensing in intelligent buildings and digitalization. Autom Constr 2022;143:104578.
[57] Pacific Earthquake Engineering Research Center. PEER ground motion database [Internet]. Berkeley: University of California; 2024 Jan 1 [cited 2024 Feb 5]. Available from: https://ngawest2.berkeley.edu
[58] Javanbakht Z, Öchsner A. Advanced finite element simulation with MSC Marc. Cham: Springer International Publishing; 2017.
[59] Öchsner A, Öchsner M. A first introduction to the finite element analysis program MSC Marc/Mentat. Cham: Springer International Publishing; 2018.