Accurate origin-destination (OD) demand prediction is crucial for the efficient operation and management of urban rail transit (URT) systems, particularly during a pandemic. However, this task faces several challenges: limited real-time data availability, data sparsity, high dimensionality, and the impact of the pandemic itself. To address them, this study proposes a unified framework called the physics-guided adaptive graph spatial-temporal attention network (PAG-STAN) for metro OD demand prediction under pandemic conditions. Specifically, PAG-STAN introduces a real-time OD estimation module to estimate complete real-time OD demand matrices. Subsequently, a novel dynamic OD demand matrix compression module generates dense real-time OD demand matrices. Thereafter, PAG-STAN leverages various heterogeneous data to learn the evolutionary trend of future OD ridership during the pandemic. Finally, a masked physics-guided loss function (MPG-loss function) incorporates the physical quantity relationship between the OD demand and inbound flow into the loss function to enhance model interpretability. PAG-STAN demonstrated favorable performance on two real-world metro OD demand datasets under both pandemic and conventional scenarios, highlighting its robustness and sensitivity for metro OD demand prediction. A series of ablation studies were conducted to verify the indispensability of each module in PAG-STAN.
With the rapid development of the social economy, urban rail transit (URT) has undergone large-scale construction and entered operation in some countries to meet increasing travel demands. The accurate prediction of origin-destination (OD) demand is crucial for the efficient and safe operation of URT systems. This is because it can provide a detailed profile of where and when passengers enter and exit URT systems, which is critical for both operators and passengers. Accurate OD demand prediction helps operators adjust the subway operating scheme in a timely manner, thereby improving operation efficiency. Furthermore, it can assist passengers in scheduling their routes in advance, thereby reducing waiting times and improving travel experiences. However, compared with inflow/outflow prediction tasks, the OD demand prediction task is more challenging. This is because it not only forecasts the inflow/outflow of the target station but also requires consideration of the spatial-temporal distribution of passenger flow throughout the URT network. Moreover, the onset of a pandemic introduces substantial challenges to the task of metro OD demand prediction. After the onset of a pandemic, individuals are expected to weigh additional safety factors when traveling, leading to increased randomness and abrupt changes in OD demand [1], [2]. Consequently, URT system operators may struggle to adjust train operation schedules in a timely manner, resulting in either excessive or insufficient transportation capacity. Therefore, accurate metro OD demand forecasting, particularly during a pandemic, is a critical yet challenging task that may encounter the following issues:
(1) OD demand data real-time availability: In metro passenger flow prediction studies [3], [4], [5], real-time ridership data can be obtained through the automatic fare collection (AFC) system, enabling real-time passenger flow prediction. However, in metro OD demand prediction studies, each OD demand represents a travel behavior that can only be observed after a passenger has completed their journey [6], [7]. For instance, when predicting the OD demand during a specific time window, such as 08:00-08:30, the complete OD demand of 07:30-08:00 cannot be obtained because some passengers may not have completed their trips. Incomplete OD demand information reduces the accuracy of OD demand prediction. Therefore, obtaining nearly complete real-time OD demand information is crucial for predicting OD demand.
(2) OD demand data sparsity: Because different areas of a city serve different functions, OD demand is distributed very unevenly, and many OD pairs carry little or no demand. For instance, there is typically a high OD demand between residential areas and central business districts, whereas remote suburbs may exhibit low or no OD demand. Fig. 1 depicts the distribution of OD demand at different time intervals in a single day of the Nanning metro, with a large majority of instances featuring low or zero OD demands. These low or zero OD demands make the OD demand matrix highly sparse, creating challenges in effectively capturing the spatial-temporal distribution of OD demand during training [8], [9]. Moreover, Fig. 2 visualizes the OD demand matrix at different stages of the coronavirus disease 2019 (COVID-19) pandemic. It can be observed that the sparsity of the OD demand matrix increased significantly during COVID-19, owing to concerns that taking the subway would increase the risk of infection. Therefore, effectively addressing the sparsity of the OD demand matrix is critical for accurate metro OD demand prediction, particularly during a pandemic.
(3) OD demand data high-dimensionality: The dimension of the OD demand matrix is $N \times N$, which is considerably larger than that of the inflow/outflow matrix, where $N$ denotes the number of stations in the URT system [8], [10]. For instance, in the case of the Nanning metro, which comprises 62 metro stations, more than three thousand OD demands ($62 \times 62 = 3844$) must be predicted for the next time interval, resulting in high computational complexity. Therefore, the high-dimensionality issue of OD demand should be addressed.
(4) Impact of external factors during pandemics: The onset of a pandemic causes sudden and irregular changes in URT system OD demand, posing challenges to metro OD demand prediction. Considering COVID-19 as an example, Fig. 2 illustrates the pandemic’s impact on passenger flow. Before the outbreak, the OD demand was significantly higher than that during the pandemic, when it approached zero. This change was attributed to the increased transmission risk and infection probability associated with the use of urban public transportation during COVID-19. The irregular OD demand evolution during COVID-19 challenges the existing models, which are limited in capturing the impact of multiple factors on metro OD demand. Therefore, accurate modeling of OD demand evolution during pandemics requires a comprehensive understanding of the impact of external factors on demand distribution.
Recently, short-term OD traffic demand prediction has attracted increasing attention from researchers. Since Liu et al. [11] first proposed a deep learning-based model (contextualized spatial–temporal network (CSTN)) for taxi OD demand prediction, several scholars have conducted in-depth studies on OD demand prediction. For example, Noursalehi et al. [7] proposed a multiresolution spatial–temporal deep learning approach for metro OD demand prediction. Zou et al. [12] learned the spatiotemporal correlation of OD pairs from the origin and destination views separately to predict long-term OD demand. These methods address the gaps in traffic OD demand prediction; however, research on traffic OD demand prediction during pandemics is limited. Additionally, they appear to overlook a significant correlation between OD demand and inbound flow. Specifically, for each station, the inbound flow during a certain period equals the sum of the OD demand departing from the station during that period. Fig. 3 illustrates the quantity relationship between OD demand and inbound flow at a specific period: the demands of all OD pairs that share the same origin station sum to the inbound flow of that station in the current period, which is 41 in the example shown. Understanding the relationship between the OD demand and inbound flow provides valuable information for OD demand prediction. Therefore, considering this physical quantity relationship may enhance the model interpretability and prediction accuracy.
This study addresses short-term metro OD demand prediction during a pandemic by introducing the physics-guided adaptive graph spatial-temporal attention network (PAG-STAN). As mentioned above, raw real-time metro OD demand data suffer from real-time availability, sparsity, and high-dimensionality issues. Thus, PAG-STAN utilizes the real-time inbound flow and long-term historical distribution rates to approximate a complete real-time OD demand matrix. Subsequently, a novel dynamic OD demand matrix compression module is proposed that retains only the high-demand OD pairs to generate a dense OD demand matrix. The model applies an encoder-decoder architecture to capture the spatial-temporal dependencies of metro OD demand during the pandemic. In the encoder, an adaptive graph convolution long short-term memory (AGC-LSTM) and a multi-periodic cross-attention mechanism (MPC-ATTN) are introduced to capture the periodic spatial-temporal distribution information of the OD demand. In the decoder, several bidirectional LSTMs (BiLSTMs) are utilized to propagate useful contextual messages for decoding the global distribution information. A heterogeneous information fusion block (HIFB) incorporates heterogeneous data (pandemic-related data, date-attributed data, etc.) to study the impact of external factors on OD demand during a pandemic. Moreover, a new masked physics-guided (MPG)-loss function is proposed to embed the physics-quantity relationship between the OD demand and inbound flow into the loss function, thereby enhancing the training efficiency and model interpretability. Finally, the authors conducted extensive experiments on two real-world metro OD demand datasets under different scenarios, and the evaluation results showed that PAG-STAN outperformed existing methods for metro OD prediction under both pandemic and conventional scenarios. The contributions of this study are as follows:
• The authors propose a novel PAG-STAN to improve short-term metro OD demand prediction under pandemic conditions. It addresses four challenges of this task: real-time data availability, sparsity, high dimensionality, and the impact of the pandemic.
• PAG-STAN introduces an innovative dynamic OD demand matrix compression method to flexibly select the demand feature information for each origin station, generating a dense OD demand matrix. This approach preserves crucial OD distribution information as well as addresses the challenges of sparsity and high dimensionality.
• To accurately predict the metro OD demand under pandemic conditions, PAG-STAN leverages heterogeneous data (multi-periodic OD demand data, real-time inbound flow data, pandemic-related data, date attribute data, etc.) to fully explore the OD distribution information. Several modules in PAG-STAN, such as AGC-LSTM, MPC-ATTN, BiLSTMs, and HIFB, are introduced to better capture the spatial-temporal dependencies of metro OD demand under pandemic conditions.
• A novel MPG-loss function is introduced to enhance model training by incorporating physical quantity information between OD demand and inbound flow. This addition guides model training, maintaining accurate prediction capabilities while improving interpretability.
• Extensive experiments conducted on two real-world metro OD demand datasets in different scenarios show the superiority of PAG-STAN in metro OD demand prediction under both pandemic and conventional scenarios. In addition, several ablation studies are conducted to verify the indispensability of each module.
2. Literature review
Short-term traffic flow prediction of urban public transportation systems plays a critical role in improving service quality and operating efficiency [6]. Practically, this comprises two main parts: ridership prediction and OD demand prediction. The former provides operators with traffic flow information entering or leaving stations/regions, whereas the latter provides operators with spatiotemporal distribution information of traffic flow across the traffic network. The following subsections review the relevant studies on short-term ridership and OD demand prediction.
2.1. Short-term inflow/outflow prediction
Owing to its significant potential for application in intelligent transportation systems, short-term inflow/outflow prediction has been extensively studied using two main types of methods: traditional statistical methods and machine learning-based methods. In early studies, most scholars applied statistical approaches to traffic flow prediction. Van der Voort et al. [13] used Kohonen maps to classify traffic flows and adopted an autoregressive integrated moving average (ARIMA) to predict each class. Tan et al. [14] proposed an aggregation method based on ARIMA and exponential smoothing for short-term traffic flow prediction. Ni et al. [15] introduced a linear regression model using seasonal ARIMA to learn how subway passenger flows change during event occurrences. However, these statistical approaches treat traffic flow as a stationary time series, neglecting its volatility and complexity; thus, they cannot capture the complex spatiotemporal features of traffic flow.
The recent rapid development of machine learning has addressed some limitations of statistical models through methods such as support vector regression (SVR), K-nearest neighbor (KNN), and linear regression [16], [17], [18], [19]. For example, Li et al. [20] applied SVR with a Gaussian loss function to reduce random errors in urban traffic flow data sequences. Hong [21] utilized seasonal SVR and a chaotic simulated annealing algorithm to handle inter-urban traffic flows with a cyclic (seasonal) trend. Cai et al. [22] proposed an improved KNN model to enhance forecasting accuracy based on spatiotemporal correlations among spatiotemporal state matrices. Lin et al. [23] transformed a spatial time-delayed traffic series into traffic state vectors and applied a combination of SVR and KNN to predict the traffic flow. Avila and Mezić [24] utilized Koopman mode decomposition to analyze and forecast traffic dynamics, which can effectively distinguish any growing or decaying patterns and obtain a hierarchy of previously identified and never-before-identified temporal patterns. These machine learning-based methods consider the complex temporal dependencies of traffic flow, thus improving the prediction performance. However, these models cannot effectively capture spatial dependencies and can only predict the flow for individual stations, making them unsuitable for network-wide traffic flow prediction.
As an emerging branch of machine learning, deep learning-based methods have been widely applied in various fields [25], [26], [27], [28], motivating researchers to adopt deep learning to handle traffic flow prediction tasks. Some studies [29], [30] have applied recurrent neural networks (RNNs), such as LSTM and the gated recurrent unit (GRU), to model the temporal dependencies of traffic flow, whereas others [31], [32] have adopted convolutional neural networks (CNNs) to model the spatial dependencies of traffic flow. For instance, Li et al. [3] modeled traffic flow as a diffusion process on a directed graph and proposed a diffusion convolutional RNN (DCRNN) to incorporate spatial and temporal dependencies for traffic flow prediction. Zhang et al. [4] introduced a novel deep learning-based approach, spatial-temporal residual network (ST-ResNet), to predict citywide crowd flows and designed residual convolutional units to model spatial properties and temporal periodicity. Considering the irregular properties of URT passenger flow, Du et al. [33] modeled passenger flow into multichannel matrices and proposed a deep irregular convolutional residual network to learn the spatial-temporal feature representations. However, traffic flow prediction is not a simple time-series processing problem, because it involves complicated spatial traffic networks [34].
Because traffic networks are essentially graph structures with complex topological features, an increasing number of studies have adopted graph convolution networks (GCNs) to capture the spatial dependence of traffic flows. Yu et al. [35] regarded a traffic network as a general graph and developed a spatiotemporal graph convolutional network for traffic forecasting tasks. Zhao et al. [5] proposed a novel T-GCN that combined a GCN and a GRU to simultaneously capture spatial and temporal dependencies. However, because a single static graph cannot fully reflect complex and dynamic spatial dependencies, some researchers [36], [37], [38] extended GCNs to model multiple spatial features. Peng et al. [39] modeled a traffic network using dynamic traffic flow probability graphs and used GCNs to learn the spatial dependencies of the traffic flow. Wu et al. [40] developed a novel self-adaptive dependency matrix to automatically explore the hidden spatial dependencies of traffic flows. Wang et al. [41] constructed a hypergraph based on metro network topology and travel patterns to represent the high-order spatial relationships of passenger flow. Furthermore, owing to their superiority in modeling global temporal dependencies from multiple latent subspaces, transformer-based methods [42], [43], [44] have attracted considerable attention in traffic flow prediction tasks. For example, Yan et al. [45] developed a novel traffic transformer to model the dynamic and hierarchical spatiotemporal features of traffic flows. Xie et al. [46] considered the nonlinear spatiotemporal dependencies of crowd flow and proposed a novel multi-size patched spatiotemporal transformer network to incorporate the cross-space-time and cross-size contextual dependence of crowd data. Xu et al. [47] developed a novel metagraph transformer to address the challenge of modeling complex traffic flow dynamics. 
All the methods discussed above were designed to predict short-term inflow/outflow at specific regions/stations. They only reflect the entering/leaving information of each region/station and cannot fully describe the spatiotemporal distribution of traffic flow in a traffic network. To further study the spatial-temporal mobility patterns of traffic flow from region to region or from station to station, some scholars have recently conducted OD demand predictions.
2.2. Short-term OD demand prediction
OD demand refers to the traffic flow between an origin and a destination during a specific period, which directly reflects the spatial-temporal distribution of traffic flow between regions/stations. In 2019, Liu et al. [11] developed a contextualized spatiotemporal network to predict taxi OD demand between urban regions, which was the first attempt to address the problem of OD demand prediction. Since then, an increasing number of studies have proposed deep learning-based methods to address OD demand prediction. For example, Chu et al. [48] proposed a novel multiscale convolutional LSTM (ConvLSTM) to predict taxi demands at multiple scales. Hu et al. [49] modeled travel cost as a distribution and proposed a stochastic OD matrix forecasting problem. To address the problem of missing spatial OD flows, Yao et al. [50] proposed a spatial interaction graph convolutional network for imputing spatial OD data. Zou et al. [12] developed a graph deep learning (GDL) model, ST-GDL, to explore the complex and dynamic spatial-temporal correlations of time-varying OD information. Ke et al. [51] regarded OD pairs as nodes to capture various hidden spatiotemporal relationships between OD pairs, and developed a spatial-temporal encoder-decoder residual multi-graph convolutional network (ST-ED-RMGC) to predict taxi OD demand. Huang et al. [52] calculated pre-predicted results to provide reference information on future demand for multi-step prediction, and incorporated the pre-predicted results and historical demand into a bidirectional attention mechanism to explore the spatiotemporal correlations of ride-hailing demand among past, present, and future information. Subsequently, they exploited the GAN structure to overcome the high sparsity of ride-hailing demand [53].
All the methods discussed above were proposed for taxi or ride-hailing demand prediction tasks, which do not need to consider the real-time availability of OD demand data. However, when conducting metro OD demand prediction, real-time OD demand can only be obtained after travel has finished, which poses significant challenges for the prediction. Recently, scholars have attempted to address this problem in metro OD demand predictions. For example, to overcome the problem of partial observability of OD flow information, Jiang et al. [6] proposed a novel OD flow reconstruction mechanism to deduce the complete OD flow. Zhang et al. [8] focused on OD data availability in OD demand prediction, and introduced a channel-wise attentive-split CNN to address these limitations. Noursalehi et al. [7] proposed a novel multiresolution spatial-temporal neural network to overcome the real-time data availability issue, which utilizes the exit-based real-time OD demand to predict the future complete OD demand data. Considering the issues of high dimensionality and incomplete real-time OD information, Zhu et al. [10] developed a two-stage OD demand prediction method to predict the inbound flow and separation rate, and the future OD demand was predicted by multiplying them. Liu et al. [9] solved the issue of real-time availability of OD demand data by jointly exploiting multiple historical datasets.
Table 1 [6], [7], [8], [9], [10], [11], [12], [48], [49], [50], [51], [52], [53] summarizes the primary methods used for taxi and metro OD demand predictions. Although these methods have achieved acceptable prediction performances, they have several limitations. First, none of them fully considers the issues raised above, such as real-time availability, sparsity, high-dimensionality, and the ridership quantity relationship. Second, all of these methods focus solely on conventional scenarios, disregarding the need to predict OD demand under disrupted scenarios (e.g., a pandemic). Disrupted scenarios may cause sudden and irregular changes in the OD demand distribution, leading to capacity problems, such as insufficient or wasted capacity. Therefore, operators must pay more attention to OD demand predictions in disrupted scenarios.
This study focuses on the OD demand prediction of URT systems during pandemics. It aims to propose a unified framework that can effectively address the issues raised above, as well as serve as a reference for OD demand prediction under disrupted scenarios.
3. Preliminaries
In this section, several fundamental concepts are first defined to formulate the OD demand prediction problem in URT systems under pandemic.
Definition 1 (URT network): This study focuses on OD demand prediction in URT systems. To represent the spatial correlation among stations, the URT network is defined as $G = (V, E, A)$, where $V = \{v_1, v_2, \dots, v_N\}$ is the set of stations, and $N$ is the number of stations in the URT network. $E = \{e_{ij} \mid i, j \in \mathbb{N}^{+},\ i, j \le N\}$ denotes a set of edges, where $\mathbb{N}^{+}$ represents positive integers, and $e_{ij}$ is a binary variable indicating whether station $i$ and station $j$ are adjacent in the physical network. $A \in \mathbb{R}^{N \times N}$ denotes the weighted adjacency matrix of the URT network, where $\mathbb{R}$ represents real numbers, and $A$ is obtained by applying linear normalization to each row of the binary matrix $[e_{ij}]$. This representation captures the spatial relationships between stations and allows this information to be incorporated into the OD demand prediction model.
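To make the row normalization in Definition 1 concrete, the following minimal sketch builds a binary adjacency matrix for a hypothetical four-station line and normalizes each row. It assumes that "linear normalization" means dividing each row by its sum, which the text does not specify; the function name `row_normalize` and the toy network are illustrative only.

```python
def row_normalize(binary_adjacency):
    """Scale each row so its entries sum to 1; isolated stations keep an all-zero row."""
    weighted = []
    for row in binary_adjacency:
        degree = sum(row)
        weighted.append([value / degree if degree else 0.0 for value in row])
    return weighted


# Hypothetical 4-station line 0 - 1 - 2 - 3 (1 = physically adjacent).
binary_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
weighted_adjacency = row_normalize(binary_adjacency)
for row in weighted_adjacency:
    print(row)
```

Under this assumption, a terminal station places all of its weight on its single neighbor, whereas an intermediate station splits its weight evenly between its two neighbors.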
Definition 2 (inbound flow matrix): From the metro AFC records, the historical inbound flow series can be extracted and aggregated into the inbound flow matrix at different time granularities (e.g., 10 and 60 minutes). Let $x_i^t$ denote the inbound flow of station $i$ at time interval $t$, and let $X^t = (x_1^t, x_2^t, \dots, x_N^t)^{\top} \in \mathbb{R}^{N}$ be defined as an inbound flow vector representing the inbound flow of all stations at time interval $t$. The inbound flow matrix containing the inbound information of the past $P$ time steps at time interval $t$ can be defined as $\mathcal{X}^t \in \mathbb{R}^{N \times P}$, which can be formulated as follows.
$$\mathcal{X}^t = \left[ X^{t-P+1}, X^{t-P+2}, \dots, X^{t} \right]$$
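Definition 3 (OD demand matrix): Let $m_{ij}^t$ denote the complete OD demand from station $i$ to station $j$ during time interval $t$, where $i, j \in \{1, 2, \dots, N\}$ and $N$ is the number of stations. The complete OD demand matrix $M^t \in \mathbb{R}^{N \times N}$ is defined to represent all OD demand among all stations during the time interval $t$, which can be formulated as follows.
$$M^t = \begin{bmatrix} m_{11}^t & \cdots & m_{1N}^t \\ \vdots & \ddots & \vdots \\ m_{N1}^t & \cdots & m_{NN}^t \end{bmatrix}$$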
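Notably, a physics–quantity relationship exists between the metro OD demand and the inbound flow of all stations during a unit period, which can be formulated as follows:
$$\sum_{j=1}^{N} m_{ij}^t = x_i^t, \quad i = 1, 2, \dots, N$$
where $m_{ij}^t$ is the OD demand from station $i$ to station $j$, $x_i^t$ is the inbound flow of station $i$ during time interval $t$, and $N$ is the number of stations.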
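As a small numerical illustration of this relationship, the sketch below sums each row of a toy OD matrix and compares it with the inbound flow of the corresponding origin station. All values are hypothetical (the first row is chosen to total 41, echoing the example of Fig. 3), and the function names are assumptions introduced here.

```python
def origin_totals(od_matrix):
    """Sum each row: the total demand leaving every origin station."""
    return [sum(row) for row in od_matrix]


def physics_residuals(od_matrix, inbound_flow):
    """Difference between summed OD demand and inbound flow, per origin station."""
    return [total - flow for total, flow in zip(origin_totals(od_matrix), inbound_flow)]


# Hypothetical 3-station example; row i holds the demand from station i to every station.
od_matrix = [
    [0, 25, 16],
    [10, 0, 5],
    [3, 7, 0],
]
inbound_flow = [41, 15, 10]

residuals = physics_residuals(od_matrix, inbound_flow)
print(residuals)
```

A residual of zero for every station means that the OD matrix is consistent with the observed inbound flow; a predicted matrix that violates the relationship produces non-zero residuals.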
Definition 4 (multiple time-series OD demand): This study considers the historical multiple time-series OD demand over time, namely, the weekly and daily time-series OD, to study the long short-term historical periodic distribution of OD demand. Specifically, suppose that the current time interval is $t$ and the total number of time intervals in a day is denoted $T_d$. Thus, the weekly time-series OD refers to the OD demand during the same period in the last week. The daily time-series OD refers to the OD demand in the same time interval on the previous day.
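The indexing implied by Definition 4 can be sketched as follows. The sketch assumes that the periodic series are aligned with the current interval, so that the weekly and daily references lie exactly seven days and one day back; whether the original formulation aligns them with the current or the predicted interval is not stated here. The function name and the figure of 36 intervals per day are hypothetical.

```python
def periodic_indices(t, intervals_per_day):
    """Return the interval indices one week and one day before interval t."""
    if t < 7 * intervals_per_day:
        raise ValueError("need at least one week of history before interval t")
    weekly_index = t - 7 * intervals_per_day
    daily_index = t - intervals_per_day
    return weekly_index, daily_index


# Hypothetical setting: 18 operating hours at 30-minute granularity -> 36 intervals per day.
intervals_per_day = 36
weekly_index, daily_index = periodic_indices(300, intervals_per_day)
print(weekly_index, daily_index)
```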
Problem: Considering the observed OD demand and inbound flow data at time interval $t$, this study aims to learn the function $f(\cdot)$ to predict the complete OD demand in URT systems for the next time interval $t+1$.
4. Methodology
To accurately predict the OD demand of URT systems under pandemic, a novel PAG-STAN is proposed to utilize various heterogeneous data (i.e., multiple historical time-series OD demand, real-time inflow, pandemic-related data, and date attribute data) to model the spatial-temporal evolution pattern of OD demand under pandemic. Fig. 4 demonstrates the architecture of PAG-STAN, which consists of five modules: real-time OD estimation, dynamic OD demand matrix compression, encoder, decoder, and MPG-loss function.
Real-time OD estimation: To address the issue of real-time data availability, this module estimates the real-time OD demand by multiplying the real-time inbound data with the historical long short-term OD distribution rate, which fully incorporates historical weekly/daily OD distribution information.
Dynamic OD demand matrix compression: The sparsity and high dimensionality of the raw OD demand matrix pose significant challenges for model prediction. To address these issues, an innovative dynamic OD demand matrix compression block is proposed that selects the OD pairs with high demand to generate a dense OD demand matrix. This module reduces the sparsity and high dimensionality of the OD demand matrix while retaining the OD distribution information.
Encoder: In the encoder, AGC-LSTM and MPC-ATTN are proposed to study the hidden periodic spatial-temporal information of OD demand. To capture the significant periodicity of the OD demand, weekly time-series OD, daily time-series OD, and real-time OD demand data are introduced into PAG-STAN. These multiple time-series OD demand data are fed into three AGC-LSTM branches to explore the hidden spatial-temporal distribution information of each time series. Subsequently, MPC-ATTN is utilized to generate the feature representation by encoding the hidden periodic spatial-temporal distribution information. Finally, the feature representations are fed into a convolution network with residual operations to enhance the periodic spatial-temporal distribution feature representations.
Decoder: Several BiLSTMs are applied in the decoder to decode the enhanced periodic spatiotemporal distribution feature representations from both the forward and backward directions. Moreover, to study the impact of external factors on metro OD demand during the pandemic, the HIFB is introduced to fuse the OD demand data with various external data (pandemic-related data and date attribute data). It extracts meaningful information from these sources to further capture the evolutionary features of the OD demand. Finally, by stacking several fully connected layers, the model predicts a complete dense OD demand matrix for the next time interval.
MPG-loss function: A novel MPG-loss function is proposed to guide model training by considering the physics-quantity relationship between inbound flow and OD demand. This approach maintains the advantage of accurate prediction in data-driven models, while improving the interpretability of OD demand prediction models. In addition, the MPG-loss function performs a masking operation to eliminate the training interference caused by the filled zero values in the dynamic compressed OD demand matrix.
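The exact form of the MPG-loss function is not given in this overview. As a rough sketch under stated assumptions, it could combine a masked squared error on the compressed OD matrix with a penalty on the gap between the summed predicted demand of each origin station and its observed inbound flow. The additive form, the weight `lam`, and all names below are hypothetical and are not taken from the paper.

```python
def mpg_loss(pred, target, mask, inbound, lam=0.1):
    """Masked squared error plus a weighted inbound-flow consistency penalty.

    pred, target, mask: N x K nested lists (mask is 1 for real features, 0 for padding).
    inbound: length-N list of observed inbound flows.
    lam: hypothetical weight of the physics term.
    """
    n_valid = sum(sum(row) for row in mask)
    # Data term: squared error over unmasked entries only.
    data_term = sum(
        m * (p - y) ** 2
        for p_row, y_row, m_row in zip(pred, target, mask)
        for p, y, m in zip(p_row, y_row, m_row)
    ) / n_valid
    # Physics term: summed (unmasked) predicted demand per origin vs. inbound flow.
    physics_term = sum(
        (sum(m * p for p, m in zip(p_row, m_row)) - x) ** 2
        for p_row, m_row, x in zip(pred, mask, inbound)
    ) / len(inbound)
    return data_term + lam * physics_term


# Hypothetical compressed matrices for two origin stations (K = 3).
target = [[30.0, 11.0, 0.0], [6.0, 5.0, 4.0]]
mask = [[1, 1, 0], [1, 1, 1]]          # last entry of row 0 is zero padding
inbound = [41.0, 15.0]
pred = [[28.0, 11.0, 9.0], [6.0, 5.0, 4.0]]

loss = mpg_loss(pred, target, mask, inbound)
print(loss)
```

In this toy case, the value predicted at the padded position does not contribute to either term, which is the purpose of the masking operation described above.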
4.1. Real-time OD estimation
Because some trips have not been completed within the current period, the real-time OD demands extracted from AFC records are often incomplete. To address this issue, the real-time OD estimation module uses real-time inflow data and historical destination distribution rates to approximate a nearly complete OD demand matrix. This is because the real-time inflow data encompass comprehensive real-time inbound information for each origin station, and the historical destination distribution rates encapsulate meaningful periodic distribution information of OD demand.
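Fig. 5 illustrates the process of real-time OD estimation. The historical long short-term destination distribution rates denote the destination distribution at the corresponding time in the previous week and the destination distribution at the same time on the previous day, which can reflect week-to-week/day-to-day demand fluctuations for individual OD pairs because of changes in their trip generation/attractions for different trip activities [54]. Both are used to estimate long-term/short-term potential OD demand matrices at time interval $t$. Considering the historical long-term/short-term destination distribution rates and the real-time inflow of the target origin station, these potential OD demand matrices can be calculated as follows:
$$\hat{m}_{ij}^{t,w} = x_i^{t}\, r_{ij}^{w}, \qquad \hat{m}_{ij}^{t,d} = x_i^{t}\, r_{ij}^{d}$$
where $x_i^t$ denotes the real-time inflow of origin station $i$ at time interval $t$, and $r_{ij}^{w}$ and $r_{ij}^{d}$ denote the shares of passengers departing from station $i$ who travel to station $j$ at the corresponding time in the previous week and on the previous day, respectively.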
After these two potential OD demand matrices are obtained, they are concatenated and fed into a convolution unit to generate a nearly complete real-time OD demand matrix with a periodic OD distribution.
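The estimation step described in this subsection, multiplying the real-time inflow of each origin station by its historical destination distribution rates, can be sketched as follows. The learned convolution unit that fuses the two estimates is not reproduced; the function names and all numbers are hypothetical.

```python
def distribution_rates(od_matrix):
    """Share of each origin station's demand that goes to every destination."""
    rates = []
    for row in od_matrix:
        total = sum(row)
        rates.append([value / total if total else 0.0 for value in row])
    return rates


def estimate_od(inflow, rates):
    """Spread each origin's real-time inflow over destinations using historical rates."""
    return [[flow * rate for rate in row] for flow, row in zip(inflow, rates)]


# Hypothetical complete OD matrices for the same interval last week and yesterday.
od_last_week = [[0, 30, 10], [20, 0, 20], [5, 15, 0]]
od_yesterday = [[0, 10, 10], [30, 0, 10], [10, 10, 0]]
realtime_inflow = [40, 20, 8]  # observed immediately through the AFC system

weekly_estimate = estimate_od(realtime_inflow, distribution_rates(od_last_week))
daily_estimate = estimate_od(realtime_inflow, distribution_rates(od_yesterday))
print(weekly_estimate)
print(daily_estimate)
```

By construction, each row of either estimate sums to the real-time inflow of its origin station, so both potential matrices respect the relationship between OD demand and inbound flow.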
4.2. Dynamic OD demand matrix compression
Owing to the sparsity and high dimensionality of the raw OD demand matrix, predicting OD demand faces substantial challenges. On the one hand, the sparse OD demand matrix interferes with modeling the spatiotemporal distribution of OD demand. On the other hand, the model incurs high computational costs when dealing with high-dimensional OD demand matrices [55]. Therefore, a dynamic OD demand matrix compression module is proposed to select the high-demand OD pairs and generate dense OD demand matrices. As shown in Fig. 6, for each origin station $i$, the destination distribution rates of all destination stations are first calculated.
Destination stations are ranked by their destination distribution rates, and stations with high destination distribution rates are selected based on a preset passenger flow proportion threshold (PFP). Note that decision makers can preset the threshold PFP to control the sparsity of the resulting compressed matrix. In this study, the threshold is set to 70% for metro OD demand prediction. Subsequent experiments investigate the prediction performance of the model under different sparsity levels by varying PFP. The destination station filtering process can be approximated using the following minimization problem.
$$\min k_i \quad \text{s.t.} \quad \sum_{j=1}^{k_i} r_{i(j)} \ge \mathrm{PFP}$$
where $r_{i(j)}$ denotes the $j$th largest destination distribution rate of origin station $i$, $k_i$ denotes the number of selected destination stations, and $k_i + 1$ is the number of OD features for origin station $i$, which assumes different values for different origin stations. After filtering the high-demand destination stations, the OD demands between the target origin station and the $k_i$ selected destination stations are regarded as the first $k_i$ OD features. The sum of the OD demands of the remaining stations is regarded as the $(k_i + 1)$th OD feature. As the number of OD features varies across origin stations, the authors use $0$ to fill the vacancies of the OD feature vector, ensuring that the dimension of each OD feature vector equals $K$, where $K = \max_i (k_i + 1)$. Thus, a dense OD matrix of time interval $t$ is generated as $\tilde{M}^t \in \mathbb{R}^{N \times K}$, where $N$ is the number of stations. Notably, because the filled zeros interfere with the prediction, they are masked to prevent their errors from being backpropagated during model training. Therefore, this module reduces the sparsity and high dimensionality of the OD demand matrix while retaining the OD distribution information.
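The compression procedure described above can be sketched as follows. The sketch assumes that, for each origin station, destinations are added in descending order of demand until their cumulative share reaches the PFP threshold, that the remaining demand is aggregated into one extra feature, and that shorter rows are zero-padded with an accompanying mask. The function names and the toy matrix are hypothetical.

```python
def compress_row(demands, pfp=0.7):
    """Keep the fewest top destinations whose demand share reaches pfp, plus a remainder."""
    total = sum(demands)
    order = sorted(range(len(demands)), key=lambda j: demands[j], reverse=True)
    kept, covered = [], 0
    for j in order:
        if covered >= pfp * total:
            break
        kept.append(j)
        covered += demands[j]
    features = [demands[j] for j in kept] + [total - covered]
    return features, kept


def compress_matrix(od_matrix, pfp=0.7):
    """Compress every origin row, then zero-pad to a common width and build the mask."""
    rows = [compress_row(row, pfp) for row in od_matrix]
    width = max(len(features) for features, _ in rows)
    dense = [features + [0] * (width - len(features)) for features, _ in rows]
    mask = [[1] * len(features) + [0] * (width - len(features)) for features, _ in rows]
    kept = [indices for _, indices in rows]
    return dense, mask, kept


# Hypothetical 4-station OD matrix; the last origin has no demand at all.
od_matrix = [
    [0, 60, 30, 10],
    [80, 0, 15, 5],
    [25, 25, 0, 50],
    [0, 0, 0, 0],
]
dense, mask, kept = compress_matrix(od_matrix, pfp=0.7)
print(dense)
print(mask)
print(kept)
```

Because the remainder feature absorbs all unselected demand, each compressed row keeps the same total as the original row, so the link to the inbound flow of the origin station is preserved. The `kept` indices are retained so that a compressed prediction can be mapped back to station pairs.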
4.3. Encoder
After the dense OD demand matrix is obtained, a general encoder-decoder framework is adopted. The encoder aims to explore the hidden periodic spatial-temporal distribution information from multiple time-series OD demands. To this end, a new AGC-LSTM and a novel MPC-ATTN are proposed to integrate multiple time-series OD demands and encode their inherent periodic spatial-temporal distribution information.
4.3.1. AGC-LSTM
The URT system is essentially a dynamic traffic system with a complex spatial topology; thus, using GCNs to capture the complex spatial dependencies of traffic flow is a reasonable choice [5], [39], [56]. Owing to the simplicity and efficiency of the graph convolution operation proposed by Kipf and Welling [57], the authors adopted this version of GCN to conduct graph convolution:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right),$$

where $H^{(l)}$ denotes the hidden features of the $l$-th layer, $\sigma(\cdot)$ denotes an activation function such as ReLU or sigmoid, $\tilde{A} = A + I$ denotes the adjacency matrix with self-connections added to maintain the information of the node itself in the convolution operation, $I$ denotes the identity matrix, $\tilde{D}$ is the degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, and $W^{(l)}$ is the trainable weight matrix of the $l$-th graph convolution layer.
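As a concrete illustration of the Kipf and Welling propagation rule, the following NumPy sketch applies one graph convolution layer to a three-station line graph. The ReLU activation, the toy adjacency matrix, and the function name `gcn_layer` are assumptions made for the example.

```python
import numpy as np

def gcn_layer(adj, h, weight):
    """One graph convolution layer with symmetric normalisation and ReLU."""
    a_tilde = adj + np.eye(adj.shape[0])       # add self-connections
    deg = a_tilde.sum(axis=1)                  # degrees of the augmented graph
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_hat = a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ h @ weight, 0.0)

# Three stations on a line: 0 - 1 - 2 (toy topology).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.eye(3)                 # one-hot node features
weight = np.eye(3)[:, :2]            # fixed weights so the output is easy to read
out = gcn_layer(adj, features, weight)
```

With these inputs, `out` is simply the first two columns of the normalised adjacency matrix, which makes the effect of the degree normalisation directly visible.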
Although conventional GCNs can effectively model topological spatial features, recent studies [40] found that a predefined static adjacency matrix may limit their performance in traffic flow prediction because the spatial dependencies among traffic nodes may change over time. Consequently, a self-adaptive adjacency matrix is embedded into the GCN to dynamically model spatial dependencies. The adaptive adjacency matrix does not require prior knowledge and can be updated using stochastic gradient descent. Specifically, the source node embedding $E_1$ and the target node embedding $E_2$ are first randomly initialized with learnable parameters to generate a self-adaptive adjacency matrix as follows:

$$\tilde{A}_{adp} = \mathrm{SoftMax}\left(\mathrm{ReLU}\left(E_1 E_2^{\top}\right)\right).$$

In this process, the ReLU activation function is applied to ignore weak dependencies, and the SoftMax activation function is used to normalize the adaptive adjacency matrix. In particular, the self-adaptive adjacency matrix supplements the uncertain relationships between nodes (stations) and can dynamically model hidden spatial dependencies during the training process. The adaptive graph convolution operation is computed as follows.
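A self-adaptive adjacency matrix of this kind, in the form introduced by Graph WaveNet [40], can be sketched in a few lines of NumPy. The embedding size and random initialisation below are arbitrary choices for illustration; in training, both embeddings would be learnable parameters updated by gradient descent.

```python
import numpy as np

def adaptive_adjacency(e_src, e_dst):
    """Row-wise softmax of ReLU(E_src @ E_dst^T)."""
    scores = np.maximum(e_src @ e_dst.T, 0.0)             # ReLU clips negative scores to zero
    scores = scores - scores.max(axis=1, keepdims=True)   # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)           # each row sums to one

rng = np.random.default_rng(0)
num_stations, emb_dim = 5, 3          # hypothetical sizes
e_src = rng.normal(size=(num_stations, emb_dim))
e_dst = rng.normal(size=(num_stations, emb_dim))
a_adp = adaptive_adjacency(e_src, e_dst)
```

Because the matrix is built from two different embeddings, it is generally asymmetric, so the learned dependency of station A on station B need not equal that of B on A.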
Convolutional LSTM (ConvLSTM) [58] has been proven to be stable and powerful for modeling the long-term spatiotemporal relationships of sequence data in previous studies [59], [60], [61]. However, the conventional convolutional operation in ConvLSTM focuses only on a regular Euclidean structure, and the non-Euclidean spatial information of the URT network topology cannot be extracted effectively. To capture the complex spatial topological features and temporal evolution characteristics of the OD demand simultaneously, a novel AGC-LSTM (shown in Fig. 7) is developed by replacing the convolutional operation with the adaptive graph convolution operator in the state-to-state and input-to-state transitions of ConvLSTM, which can be formulated as follows:

$$\begin{aligned}
i_t &= \sigma\left(W_{xi} \star_{\mathcal{G}} X_t + W_{hi} \star_{\mathcal{G}} H_{t-1} + b_i\right),\\
f_t &= \sigma\left(W_{xf} \star_{\mathcal{G}} X_t + W_{hf} \star_{\mathcal{G}} H_{t-1} + b_f\right),\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tanh\left(W_{xc} \star_{\mathcal{G}} X_t + W_{hc} \star_{\mathcal{G}} H_{t-1} + b_c\right),\\
o_t &= \sigma\left(W_{xo} \star_{\mathcal{G}} X_t + W_{ho} \star_{\mathcal{G}} H_{t-1} + b_o\right),\\
H_t &= o_t \odot \tanh\left(C_t\right),
\end{aligned}$$

where $\star_{\mathcal{G}}$ denotes the adaptive graph convolution operation, $\odot$ denotes the Hadamard product, $W_{x\cdot}$ and $W_{h\cdot}$ denote the relevant weight matrices, and $b_{\cdot}$ are the relevant biases. $X_t$ denotes the sequence inputs, $i_t$ the input gates, $f_t$ the forget gates, $C_t$ the memory cell outputs, $o_t$ the output gates, and $H_t$ the hidden states. Because AGC-LSTM adopts an adaptive graph convolution operation to encode the spatial–temporal information, it can determine the future states of the target node by considering the inputs and hidden states of its topological neighbors in a self-adaptive manner.
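A single recurrent step of a graph-convolutional LSTM cell of this kind can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes standard LSTM gating without peephole connections, a one-hop graph convolution of the form `A @ X @ W`, and a fixed uniform adjacency matrix standing in for the learned adaptive one. All names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_conv(adj, x, weight):
    """One-hop graph convolution: aggregate neighbours, then project."""
    return adj @ x @ weight

def agc_lstm_step(adj, x_t, h_prev, c_prev, params):
    """One LSTM step in which every dense product is replaced by a graph convolution."""
    pre = {}
    for g in "ifco":   # input gate, forget gate, cell candidate, output gate
        pre[g] = (graph_conv(adj, x_t, params["wx" + g])
                  + graph_conv(adj, h_prev, params["wh" + g])
                  + params["b" + g])
    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    c = f * c_prev + i * np.tanh(pre["c"])   # Hadamard products update the memory cell
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n, d_in, d_h = 4, 3, 5                       # stations, input features, hidden size
adj = np.full((n, n), 1.0 / n)               # placeholder row-normalised adjacency
params = {}
for g in "ifco":
    params["wx" + g] = rng.normal(scale=0.1, size=(d_in, d_h))
    params["wh" + g] = rng.normal(scale=0.1, size=(d_h, d_h))
    params["b" + g] = np.zeros(d_h)
h, c = agc_lstm_step(adj, rng.normal(size=(n, d_in)),
                     np.zeros((n, d_h)), np.zeros((n, d_h)), params)
```

Iterating this step over a sequence, with the hidden and cell states carried forward, yields the hidden states that an encoder branch would pass on.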
Owing to the AGC-LSTM unit, PAG-STAN effectively learns complex spatial–temporal dependencies in a self-adaptive manner. Specifically, the weekly and daily OD patterns are first fed into two individual AGC-LSTM branches to generate the long-term and short-term periodic hidden states, respectively.
The weekly and daily hidden states are then fused using a convolutional layer to construct the dense periodic hidden state. This dense periodic hidden state is taken as the initial state and fed into a third AGC-LSTM branch along with the estimated real-time OD pattern, aiming to generate a real-time feature output with long short-term periodic information.
4.3.2. MPC-ATTN
Owing to the periodicity of travel behaviors, there may exist a strong correlation between these multiple OD demand patterns. For example, the weekly OD demand pattern can reflect the regularity of OD distribution on weekdays/weekends in the long term, while the daily OD demand pattern can reflect the fluctuation trend of OD demand in the short term. Multi-head attention enables joint attention to information from different representation subspaces, thereby improving the feature representation ability [62]. To further explore the significant periodic spatiotemporal distribution of the OD demand, a novel MPC-ATTN (shown in Fig. 8) is developed to model the inherent correlation among the multiple OD demand patterns from multiple perspectives. It effectively encodes multiple OD feature outputs from the three AGC-LSTM branches to generate a latent representation.
Specifically, the feature outputs from the three individual AGC-LSTM branches are first projected onto the same dimension. The linear projection of queries and keys in the conventional attention mechanism is replaced by a convolution operation in MPC-ATTN to consider the local distribution information of the OD demand. The long- and short-term cross-attention scores are then calculated to explore the long short-term periodic correlations among the multiple feature outputs. These scores serve as the correlation weights of the long-term and short-term periodic distribution information, respectively, and are scaled according to the dimension of the projected features. Subsequently, the long- and short-term correlation weights are integrated with the real-time values by element-wise multiplication to propagate periodic information with high attention weights.
After information propagation, the long-term and short-term cross-attention outputs are concatenated and passed through a convolutional layer to generate a global cross-attention with long short-term periodic distribution information.
Finally, this study uses multi-head attention [62] to obtain information from different representation subspaces: the outputs of the individual attention heads, computed from the multiple OD feature outputs of AGC-LSTM, are combined through a latent attention weight matrix to generate a latent representation with global long short-term periodic OD distribution information.
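The idea of attending from the real-time branch to the weekly and daily branches can be illustrated with a simplified single-head sketch. It deliberately departs from MPC-ATTN in several labelled ways: plain matrices replace the convolutional projections, the weights are applied to the values by the standard matrix product of scaled dot-product attention [62], and a dense matrix `w_fuse` stands in for the fusing convolutional layer. All names and sizes are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(query, key, value):
    """Scaled dot-product attention with queries and keys from different branches."""
    weights = softmax(query @ key.T / np.sqrt(query.shape[-1]))
    return weights @ value, weights

rng = np.random.default_rng(0)
n, d = 6, 8                                   # stations, feature dimension
real, weekly, daily = (rng.normal(size=(n, d)) for _ in range(3))

# Long-term: real-time queries against weekly keys; short-term: against daily keys.
long_out, long_w = cross_attention(real, weekly, real)
short_out, short_w = cross_attention(real, daily, real)

# Concatenate both outputs and project back to d features.
w_fuse = rng.normal(scale=0.1, size=(2 * d, d))
global_attn = np.concatenate([long_out, short_out], axis=-1) @ w_fuse
```

A multi-head version would repeat this computation with separate projections per head and concatenate the per-head results before a final projection.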
4.4. Decoder
After the long short-term periodic spatial-temporal distribution information is fully learned in the encoder, several BiLSTMs are utilized in the decoder to decode this distribution information from both the forward and backward directions. They effectively capture contextual information and learn the global evolution patterns of OD demand. In addition, metro OD demand is easily affected by multiple external factors, such as pandemics and date attributes, making the spatial-temporal OD distribution irregular. Such an irregular OD distribution increases the difficulty of predicting the metro OD demand. To study the impact of external factors on the metro OD demand during pandemics, a novel HIFB is proposed to fuse heterogeneous data sources, including pandemic-related and date attribute data. It explores the mutual information between external factors and OD demand to enhance the modeling of how OD demand evolves over time. Finally, the complete OD demand in the next time interval is forecasted using several fully connected layers.
4.4.1. BiLSTM
Owing to the uncertainty and abruptness of OD demand evolution under pandemic conditions, one-way modeling methods may not fully capture the global spatial-temporal distribution information of OD demand. Therefore, several BiLSTMs are utilized to decode meaningful historical periodic distribution information; they provide complementary information from the forward and backward directions to fully capture the complex global evolution pattern of OD demand. BiLSTM is an improved version of LSTM that combines a forward LSTM and a backward LSTM, as shown in Fig. 9. Specifically, BiLSTM first processes the input data in sequential and reverse orders to obtain forward and backward temporal dependencies, respectively. These two temporal dependencies are concatenated and fed into a fully connected layer to obtain the final hidden states.
In particular, the estimated real-time OD demand is first processed by an individual BiLSTM to produce the decoder sequence. Subsequently, the latent representation from the encoder is taken as the initial state and fed into an individual BiLSTM along with the decoder sequence. Finally, by stacking several BiLSTMs, the historical periodic spatial-temporal distribution information can be effectively decoded. Thus, every position in the decoder sequence can be fused with the periodic distribution feature representations from the encoder, which is beneficial for understanding the global evolution pattern of OD demand in the forward and backward directions.
4.4.2. HIFB
During the pandemic, metro OD demand is susceptible to various external factors. On the one hand, the pandemic poses a security threat to people’s travel, causing abnormal fluctuations in passenger flow. On the other hand, external factors other than the pandemic, such as date attributes, can also influence passenger flow dynamics. Therefore, to investigate the impact of external factors on the metro OD demand during the pandemic, this study proposes a novel HIFB to fully aggregate pandemic-related and date attribute data.
The heterogeneous feature matrix of a given time interval has one dimension for the heterogeneous features and one for the historical time steps; it mainly consists of multiple heterogeneous data (pandemic-related data, date attribute data, etc.) for that interval. As these heterogeneous data are collected on a daily basis, the heterogeneous feature matrices of all time intervals within a day share the same daily records. Fig. 10 demonstrates the whole process of heterogeneous information fusion. Specifically, the HIFB first uses a learned embedding to convert the heterogeneous feature matrix into a latent feature matrix for high-dimensional feature mapping. Subsequently, the latent feature matrix is processed by convolution units, followed by pooling operations (max-pooling and avg-pooling) along the temporal dimension. Thus, global evolution information during a pandemic can be effectively extracted from heterogeneous data. The resulting evolution information matrix, computed with a sigmoid function, is then fused with the latent feature matrix via element-wise multiplication to obtain the enhanced feature matrix. After the enhanced feature matrix is obtained, several fully connected layers convert its dimensions so that they are consistent with those of the dense OD demand matrix. Finally, the enhanced feature matrix and the dense OD demand matrix are directly fused by element-wise addition, which effectively enhances the modeling of the global evolution of OD demand during the pandemic. Therefore, the HIFB module captures the impact of various types of heterogeneous data on the spatiotemporal distribution of OD demand, offering valuable auxiliary information for modeling irregular OD demand distributions during the pandemic.
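The gating idea behind this fusion can be sketched as below. The sketch is only a rough approximation under stated assumptions: the convolution units are omitted, the max-pooled and average-pooled summaries are simply added before the sigmoid, and the embedding is a random matrix. The function name `hifb_gate` and all sizes except the 11 external features are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hifb_gate(latent):
    """Gate a (time_steps, channels) latent matrix with temporally pooled statistics."""
    pooled = latent.max(axis=0) + latent.mean(axis=0)   # max- and avg-pooling over time
    gate = sigmoid(pooled)                              # one weight in (0, 1) per channel
    return latent * gate                                # element-wise fusion, broadcast over time

rng = np.random.default_rng(0)
time_steps, num_feat, latent_dim = 12, 11, 16
hetero = rng.normal(size=(time_steps, num_feat))        # heterogeneous feature matrix
embedding = rng.normal(scale=0.1, size=(num_feat, latent_dim))
latent = hetero @ embedding                             # stand-in for the learned embedding
enhanced = hifb_gate(latent)
```

Channels whose pooled response is strongly negative are suppressed, while channels with a strong response pass through almost unchanged.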
4.5. MPG-loss function
As discussed above, there is an obvious quantity correlation between OD demand and inbound flow in the URT network, which provides a physical law for the OD distribution evolution process and improves the interpretability of the model. Therefore, the ridership quantity laws are embedded into the loss function to effectively guide the model training. Moreover, zeros were used to fill the vacancies when constructing the dynamic compressed OD matrix; however, these filled values may negatively impact the prediction performance. Therefore, a masking operation is introduced into the loss function to eliminate the training errors caused by the filling in the compressed OD matrix. The MPG-loss function combines two weighted terms, whose weights balance the trade-offs between the terms and can dynamically adjust the impact of different types of information on model training. Its formulation involves the number of OD pairs that are not masked during back-propagation, the actual OD demand from each origin station to each destination station, the corresponding predicted value, and the inflow of each origin station.
Specifically, the first term of the MPG-loss function assesses the capacity of the model to learn the OD demand distribution features, whereas the second term assesses its capacity to learn the physical laws described by the quantity relationship equations. The MPG-loss function embeds ridership quantity laws into the loss function to back-propagate meaningful physical information and enhance model training. Therefore, the MPG-loss function improves the interpretability of the prediction model while maintaining the accuracy of the data-driven model. Moreover, during the training process, a masking operation is adopted to mask the filled zeros of the compressed OD matrix during back-propagation, eliminating the training interference caused by OD demand matrix compression.
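One plausible form of such a masked, physics-guided loss is sketched below. It is an assumption-laden illustration rather than the paper's exact formula: the data term is taken to be a masked mean squared error, the physics term penalises the squared gap between each origin's summed predicted OD demand and its inflow, and the weights `lam_data` and `lam_phys` are hypothetical placeholders.

```python
import numpy as np

def mpg_loss(pred, target, mask, inflow, lam_data=1.0, lam_phys=0.1):
    """Masked data term plus a physics term tying OD row sums to station inflow."""
    n_valid = mask.sum()                                   # unmasked OD pairs
    data_term = (((pred - target) ** 2) * mask).sum() / n_valid
    pred_inflow = (pred * mask).sum(axis=1)                # predicted demand leaving each origin
    phys_term = ((pred_inflow - inflow) ** 2).mean()
    return lam_data * data_term + lam_phys * phys_term

target = np.array([[60., 30., 10.],
                   [50., 25., 25.],
                   [90., 10.,  0.]])
mask = np.array([[True, True, True],
                 [True, True, True],
                 [True, True, False]])                     # last entry is zero padding
inflow = (target * mask).sum(axis=1)

perfect = target.copy()
perfect[2, 2] = 99.0                                       # error only in the padded slot
loss_perfect = mpg_loss(perfect, target, mask, inflow)
loss_off = mpg_loss(target + mask * 1.0, target, mask, inflow)
```

Here `loss_perfect` is zero even though the padded slot holds an arbitrary value, which shows that masked entries contribute neither to the data term nor to the physics term.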
5. Experiment
In this section, the authors consider the metro OD demand during COVID-19 as an example to verify the effectiveness of the model in short-term metro OD demand prediction under pandemic conditions. To further verify the applicability of PAG-STAN in different contexts, metro OD demand prediction under daily scenarios is also investigated using another real-world OD demand dataset.
5.1. Experiment settings
5.1.1. Data description
OD demand data. This section introduces two real-world datasets for short-term metro OD demand prediction, which are constructed from millions of AFC records collected from the Nanning metro and the Beijing metro, referred to as NNMOD and BJMOD for brevity. NNMOD encompasses the OD demand for all 62 stations in the URT network, spanning from January 31, 2020 to April 30, 2020. This period covers key stages in the development of COVID-19, including the initial outbreak, containment efforts, and the subsequent stabilization phase. BJMOD contains the workday OD demand information for 276 stations in the Beijing metro from February 29, 2016 to April 1, 2016. Fig. 11 illustrates the evolving trends of two OD flows from NNMOD and BJMOD, respectively. It is evident that the NNMOD OD flow steadily increased as the pandemic stabilized, while the BJMOD OD flow maintained regular commuting features. To explore the consistency of the data distribution, the Cramér-von Mises (CvM) test was employed to examine the two datasets. The authors partitioned NNMOD and BJMOD into training, validation, and testing sets following a 7:2:1 ratio and conducted CvM tests between the subsets. All P-values are greater than 0.05, indicating that the data distributions of the training, validation, and testing sets of each dataset remain consistent at a significance level of 0.05.
COVID-19 relevant data. The daily COVID-19 relevant dataset was obtained from case data and COVID-19-related social media data. Daily confirmed case data were obtained from the official website of China’s National Health Commission and directly reflect the evolutionary trend of COVID-19. The rise of social media has provided a feasible way to understand the relationship between daily activities and metro OD demands [15], [63]. Therefore, the authors further collected COVID-19-related social media data with the keywords “COVID-19,” “Nanning,” and “metro” through the Sina Streaming application programming interface (API) with the geo-location filter during the same period. As pandemic-related data are typically reported on a daily basis, the daily pandemic-related data are shared as the heterogeneous feature for each period of the day. However, owing to real-time availability issues, complete data for the current day cannot be obtained immediately. Prior research has shown that the external information of the previous day impacts the next-day traffic flow [64]. Therefore, this study uses the complete daily pandemic-related data from the previous day as the heterogeneous information for the current period. To determine whether there is a correlation between the COVID-19 relevant data and the OD demand data, the Pearson coefficient between them was calculated. The absolute values of the Pearson coefficients are greater than 0.6, indicating a significant correlation.
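The correlation check described above can be reproduced with NumPy's `corrcoef`. The two short series below are synthetic values invented purely to demonstrate the call; they are not taken from NNMOD or from the case data.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D series."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic illustration: cases fall while demand recovers.
daily_cases = np.array([12., 9., 7., 4., 2., 1.])
od_demand = np.array([300., 380., 420., 510., 560., 600.])
r = pearson(daily_cases, od_demand)
strongly_correlated = abs(r) > 0.6      # threshold used in the text
```

In practice, the case series would be shifted by one day before the comparison, because the previous day's records are the ones available in real time.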
Date attributes. The date attribute data include the day of the week, date type (weekday/weekend), and holiday. One-hot encoding was used to transform the non-numeric external features into binary vectors, and 11 variables in total were chosen in this model to represent all external factors, that is, the number of external features is num_feat = 11. Table 2 describes the details of all date attributes.
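One encoding that yields exactly 11 binary variables is seven for the day of the week, two for the date type, and two for the holiday flag. This split is an assumption made for illustration (the actual layout is given in Table 2), and the function name `encode_date` and the holiday set are hypothetical.

```python
from datetime import date

def encode_date(d, holidays=frozenset()):
    """One-hot encode a date into 7 + 2 + 2 = 11 binary features."""
    day_of_week = [0] * 7
    day_of_week[d.weekday()] = 1                       # Monday = index 0
    date_type = [0, 1] if d.weekday() >= 5 else [1, 0] # [weekday, weekend]
    holiday = [0, 1] if d in holidays else [1, 0]      # [regular day, holiday]
    return day_of_week + date_type + holiday

friday = encode_date(date(2020, 1, 31))
saturday = encode_date(date(2020, 2, 1))
```

Each encoded vector contains exactly three ones, one per attribute group.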
5.1.2. Model configurations
In this study, all models are implemented with PyTorch on a desktop computer with an Intel® Core™ i9-10900X CPU, 64 GB memory, and an NVIDIA GeForce RTX3060 GPU. The proposed model consists of two encoder layers and two decoder layers. For better training, dropout layers are embedded in the encoder and decoder. The batch size is set to 32. The optimizer is Adam with a learning rate of 0.0001. Moreover, significance tests (t-test with a P-value < 0.01) were performed for all the experimental results.
To examine the impact of several crucial hyperparameters on PAG-STAN, four further hyperparameters are investigated in detail, namely the hidden dimension $d$, the number of attention heads $h$, the number of time steps $T$, and the passenger flow proportion PFP. Specifically, the search range is set to (64, 128, 256, 512) for $d$, (3, 4, 6, 8) for $h$, (8, 10, 12, 14) for $T$, and (50%, 60%, 70%, 80%) for PFP. The influences of these four hyperparameters on PAG-STAN are shown in Fig. 12. Overall, both the root-mean-squared error (RMSE) and the mean-absolute-percentage error (MAPE) gradually decrease as $d$ increases. Nevertheless, beyond a value of 256, further increments do not yield substantial improvements. Considering computational costs, $d$ is set to 256 as a balanced choice. The hyperparameters $h$ and $T$ yield similar prediction results across their search ranges; consequently, $h$ is set to 4 and $T$ to 12. In terms of the passenger flow proportion PFP, the prediction performance of the model remains consistent when PFP is less than 70%. However, when PFP exceeds 70%, there is a noticeable deterioration in prediction performance. This decline is attributed to the fact that a large PFP compromises the effectiveness of the matrix compression: sparsity is no longer reduced effectively, which degrades the predictive capability of the model. Although a lower PFP effectively reduces sparsity and enhances prediction performance, it also reduces the number of OD pairs whose demand the model can predict. After careful consideration, PFP is set to 70%, striking a balance between ensuring the prediction performance and predicting an adequate number of OD pair demands.
During model training, the authors used the Model Checkpoint and Early Stopping technique to save the best model and avoid overfitting. Before Early Stopping, both the training and validation losses are as shown in Fig. 13. The rapid and stable convergence of the training loss was evident, whereas the validation loss gradually converged after 400 epochs, reaching a stable state around the 600th epoch and approaching the training loss. This phenomenon can be attributed to the composition of the NNMOD dataset, which comprises complex OD demand data for Nanning metro during the pandemic. These OD demand data exhibit irregular fluctuations, necessitating multiple training iterations to effectively capture complex patterns. We further explored the runtime efficiency of the model in predicting metro OD demand. Specifically, the training duration for PAG-STAN averages 11.83 seconds per epoch, which is deemed acceptable for practical applications. Notably, without the implementation of the dynamic OD demand compression operation, the execution time increases to 18.21 seconds per epoch. This observation underscores the effectiveness of the dynamic OD demand compression operation in mitigating the high-dimensionality problem and reducing computational costs. Therefore, our model demonstrates a real-time performance with runtime efficiency that does not pose a bottleneck to the task. Consequently, more emphasis should be placed on enhancing the accuracy of metro OD demand predictions during a pandemic.
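The interaction between checkpointing and early stopping can be sketched in plain Python. The patience value and the sequence of validation losses below are hypothetical, and the comment marks where a real training loop would save the model weights.

```python
import math

def early_stopping_trace(val_losses, patience=3):
    """Return (best_epoch, best_loss, stop_epoch) for a sequence of validation losses."""
    best_loss, best_epoch, wait = math.inf, -1, 0
    stop_epoch = -1
    for epoch, loss in enumerate(val_losses):
        stop_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0   # a checkpoint would be saved here
        else:
            wait += 1
            if wait >= patience:                           # no improvement for `patience` epochs
                break
    return best_epoch, best_loss, stop_epoch

# Illustrative validation curve: improvement stalls after epoch 2.
trace = early_stopping_trace([1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6], patience=3)
```

With this curve, training halts at epoch 5 and the checkpoint from epoch 2 is retained; the later improvement at epoch 6 is never seen, which is the usual trade-off when choosing the patience.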
5.1.3. Evaluation metrics
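In this study, the root mean square error (RMSE), mean absolute error (MAE), and weighted mean-absolute-percentage error (WMAPE) are chosen as evaluation metrics:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i,j}\left(y_{ij} - \hat{y}_{ij}\right)^2}, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i,j}\left|y_{ij} - \hat{y}_{ij}\right|, \qquad \mathrm{WMAPE} = \frac{\sum_{i,j}\left|y_{ij} - \hat{y}_{ij}\right|}{\sum_{i,j} y_{ij}},$$

where $y_{ij}$ denotes the actual value of OD demand from station $i$ to station $j$, $\hat{y}_{ij}$ denotes the corresponding predicted value, and $N$ is the number of evaluated OD pairs.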
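The three metrics can be computed directly with NumPy, as in the following sketch using their standard definitions. The small matrices are made-up values chosen so that the results can be checked by hand.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def wmape(y, y_hat):
    """Total absolute error divided by total actual demand."""
    return float(np.abs(y - y_hat).sum() / y.sum())

actual = np.array([[10., 0.], [5., 5.]])
predicted = np.array([[8., 1.], [5., 7.]])
scores = (rmse(actual, predicted), mae(actual, predicted), wmape(actual, predicted))
```

Unlike MAPE, WMAPE remains defined when individual OD pairs have zero actual demand, which matters for sparse OD matrices.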
5.2. Comparison with state-of-the-art methods
To comprehensively evaluate the prediction performance of PAG-STAN, it is compared with the following benchmarks, which the authors implemented based on their official codes. Notably, all baselines leverage the estimated real-time OD demand data to forecast the metro OD demand in the following experiments.
• ARIMA: A representative conventional mathematical statistics-based model. The authors used Expert Modeler in the Statistical-Package-for-the-Social-Sciences (SPSS®) software (International Business Machines Corporation, USA) to obtain the best ARIMA results automatically.
• Three-dimensional (3D) convolutional neural network (3DCNN): 3DCNN extracts rich informative features by convolution operation. Two 3DCNN layers with filters and one fully connected layer with 256 neurons are employed to predict the OD demand.
• LSTM: LSTM can effectively capture temporal features. Two fully-connected LSTM layers are employed to predict the future OD demand, and the dimension of hidden states is set to 256.
• ConvLSTM: ConvLSTM [58] replaces the fully connected operation of LSTM with the convolution operation to model the spatial and temporal features simultaneously. The model configurations of ConvLSTM are the same as those of LSTM.
• Graph WaveNet (GWN): GWN [40] introduces a self-adaptive adjacency matrix to model the hidden spatial dependencies and develops a stacked dilated one-dimensional (1D) convolution unit to model the long-term temporal dependencies.
• CSTN: CSTN [11] is the first to study the traffic OD demand prediction based on deep learning, integrating the local spatial context, temporal evolution context, and global correlation context to predict interregional traffic demand.
• ST-ResNet: In this model [4], the residual convolution units are applied to model the spatial-temporal features. The authors apply three branches of residual convolution units to learn the multiple spatiotemporal dependencies.
• ST-ED-RMGC: ST-ED-RMGC [51] is proposed for ride-sourcing OD demand prediction. In this model, multiple graphs are introduced to learn the complex patterns of OD demand.
• Transformer: Transformer [62] is an attention-based model that utilizes a multi-head attention mechanism to learn the attention weights of sequence data. The number of attention heads is eight, and the embedding dimension is set to 512.
• Informer: Informer [65] is an efficient transformer-based model for long sequence time-series forecasting. The hyperparameters of Informer are the same as those of Transformer.
• Heterogeneous Information Aggregation Machine (HIAM): HIAM [9] fully exploits heterogeneous information of historical data to jointly learn the evolutionary patterns of OD and destination-origin (DO) ridership and thereby predict the future cross-station ridership.
(1) Performance of NNMOD. The prediction performance of PAG-STAN was first compared with those of the other methods on the NNMOD testing set. The performances of all methods are summarized in Table 3. The mathematical statistical model ARIMA performed worst among all the compared models at all time intervals because it has a poor capacity to capture the complex spatial-temporal features of OD demand. The basic deep learning models 3DCNN and LSTM achieve certain performance improvements by explicitly modeling the spatial or temporal dependencies of OD demand. By modeling the spatiotemporal distribution of the OD demand simultaneously, the composite models (e.g., ConvLSTM, GWN, and ST-ResNet) outperformed the basic deep learning models, indicating the importance of fully exploring complex spatiotemporal dependencies. The prevalent OD demand prediction methods, namely CSTN and ST-ED-RMGC, did not achieve satisfactory results on the NNMOD dataset. This poor prediction performance may arise because these methods focus on taxi and ride-sourcing OD demand prediction, whose spatiotemporal dependencies differ significantly from those of the metro OD demand. Therefore, these two methods are unsuitable for predicting the metro OD demand. The attention-based models Transformer and Informer perform competitively at all time intervals because they effectively model temporal dependencies regardless of the distance in the sequence. These two models achieved competitive prediction performances despite not considering the spatial characteristics, as the spatial topology of the Nanning metro is not complicated. HIAM achieved the second-best prediction performance at all time intervals because it fully captures various information (i.e., incomplete OD ridership, unfinished orders, and DO ridership) to model the metro OD distribution. However, because HIAM does not consider the quantity relationship between OD demand and inbound flow data, the interpretability of its predictions is slightly compromised.
Data in the columns are mean ± standard deviation; the best and second-best prediction performances are highlighted in bold and underlined, respectively; numbers marked with * indicate that the improvement is statistically significant compared with the best baseline (t-test with P-value < 0.01).
Despite progress in model prediction performance, all the above models focus only on historical OD demand information while ignoring the external factors that impact the OD demand distribution during COVID-19. PAG-STAN deeply integrates multi-periodic OD demand patterns to capture the periodic spatial–temporal distribution of OD demand and introduces heterogeneous data sources (i.e., COVID-19 relevant data and date attribute data) to learn meaningful information that reflects the impact of COVID-19 on OD demand, thereby enhancing the evolution features of OD demand during COVID-19. Therefore, PAG-STAN outperforms the other compared methods at all time intervals, improving the existing state-of-the-art methods by %, %, and % in terms of the average RMSE, MAE, and WMAPE, respectively.
(2) Performance of BJMOD. To comprehensively assess the robustness and sensitivity of PAG-STAN, the authors conducted a metro OD demand prediction task on another real-world dataset, BJMOD. Note that, in this context, only the date attribute data are used as auxiliary information to investigate the impact of external factors on OD demand. The prediction performances of all methods are summarized in Table 3. Considering the increased spatial complexity of the Beijing metro compared with the Nanning metro, models that neglect the spatial dependence of the OD demand yield poor results on the BJMOD dataset. Notably, the widely used OD demand prediction methods CSTN and ST-ED-RMGC exhibited suboptimal performance on the BJMOD dataset, underscoring their poor robustness in metro OD demand prediction. Conversely, the HIAM model remained competitive on BJMOD, demonstrating its notable robustness. PAG-STAN outperformed all baselines. This is attributed to PAG-STAN’s utilization of multiple heterogeneous data to discern the evolutionary trends of future OD ridership. Additionally, it incorporates the quantity relationship between the OD demand and inbound flow into the loss function, enhancing the interpretability of the prediction model. These results illustrate the robustness and sensitivity of PAG-STAN and provide valuable insights into its practical application under conventional scenarios.
(3) Performance of the high-demand stations. The metro OD distribution is nonuniform, with most OD demands spatially distributed between residential and commercial areas. It is critical for operators to predict the metro OD demands between high-demand stations accurately to make timetable plans in advance. Therefore, this section evaluates the OD demand-prediction performance of PAG-STAN for high-origin demand stations. The authors selected stations with high origin demand to conduct the experiments, which covered approximately % of the OD demand in Nanning metro. The prediction performances of different methods for high-demand stations are presented in Table 4. Specifically, the PAG-STAN outperformed other baselines, with a % reduction in RMSE, % reduction in MAE, and % reduction in WMAPE compared to the existing best model. Meanwhile, the superiority of PAG-STAN over the other models became more apparent with larger time intervals. These prediction results indicate that PAG-STAN achieves favorable performance in OD demand prediction between high-demand stations.
(4) Visualization of OD demand prediction results. To provide a more intuitive understanding of the prediction performance of PAG-STAN, the predicted OD matrices for different periods (8:00-8:30, 12:00-12:30, and 19:00-19:30) were visualized using heatmaps, as shown in Fig. 14, where the columns represent the origin stations and the rows represent the destination stations. Owing to the impact of COVID-19, even during the morning or evening peak hours, the OD demand of most OD pairs is small or even zero, and only a few OD pairs have high OD demand. Such significant sparsity of the OD demand matrix makes it challenging to accurately predict the OD demand during COVID-19. To achieve an accurate metro OD demand prediction, PAG-STAN integrates multiple historical time-series OD demand data to explore the periodic OD distribution. As a result, the predicted values were close to the ground truth for both low- and high-demand OD pairs. The prediction performance of PAG-STAN in the morning or evening peak hours was better than that at noon because the distribution features of the OD demand are more pronounced during the peak hours. Moreover, the authors visualize the time-series demand data of one OD pair selected from each of the NNMOD and BJMOD datasets, specifically the OD pair Guangxi University Station to Chaoyang Square Station (GXU-CYS) and the OD pair Xizhimen Station to Chongwenmen Station (XZM-CWM). Fig. 15 shows that PAG-STAN achieved favorable prediction performance on both the NNMOD and BJMOD datasets. While the demand time-series data for GXU-CYS experienced irregular fluctuations owing to the impact of the epidemic, the model effectively learned the spatiotemporal distribution information of OD demand from various historical heterogeneous data (COVID-19-related data and date attribute data). Consequently, the predicted outcomes broadly capture evolutionary trends in the demand data. The demand time-series data for the OD pair XZM-CWM reflect changes in OD demand during working days, exhibiting distinct morning peak characteristics and regular fluctuations. By leveraging historical weekly and daily demand time-series data, PAG-STAN captures the periodic information of demand data, resulting in a good fit between the predicted outcomes and actual demand data. These results demonstrate the superiority of PAG-STAN for metro OD demand prediction in both the pandemic and conventional scenarios.
5.3. Ablation studies
In this subsection, extensive ablation studies are conducted to verify the effectiveness of each component of PAG-STAN.
5.3.1. Effectiveness of multiple historical time-series OD demand
PAG-STAN considers multiple historical time-series OD demands to study the long short-term periodicity of the OD distribution. In this section, the influence of long-term and short-term OD distribution information from weekly and daily time-series OD demands is explored. Several variants are introduced as follows:
•Net-real-time OD: This variant directly uses the estimated real-time OD demand to predict the OD demand.
•Net-real-time OD + weekly OD: This variant utilizes the weekly time-series OD demand to explore the effect of long-term historical OD information on OD demand prediction.
•Net-real-time OD + daily OD: This variant uses the daily time-series OD demand to study the impact of short-term historical OD information on metro OD demand prediction.
•Net-multi-periodic OD: This network incorporates both the weekly and daily time-series OD demand to capture long short-term historical OD distribution information to predict the metro OD demand.
As summarized in Table 5, net-real-time OD obtains poor prediction performance at all time intervals because this variant considers only limited real-time information for OD demand prediction. When weekly or daily time-series OD patterns are added, the prediction performance improves slightly, demonstrating that long- or short-term periodic OD distribution information is beneficial for OD demand prediction. Moreover, the variant net-real-time OD + daily OD outperformed net-real-time OD + weekly OD in most cases, indicating that short-term periodic spatial-temporal information characterizes the periodic OD evolution pattern more accurately. Compared with these variants, net-multi-periodic OD achieved the best prediction performance by exploring the long short-term periodic OD distribution information from multiple time-series OD demands, improving net-real-time OD by %, %, and % in terms of average RMSE, MAE, and WMAPE, respectively. Therefore, long short-term periodic OD distribution information from multiple historical time-series OD demands can significantly improve OD demand prediction.
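The three evaluation metrics referred to above can be sketched as follows. This is a minimal illustration that assumes the commonly used definitions of RMSE, MAE, and WMAPE (total absolute error divided by total true demand) over flattened OD entries; the function names, including the hypothetical helper relative_improvement, are not taken from the original work.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error over flattened OD entries."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error over flattened OD entries."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def wmape(y_true, y_pred):
    """Weighted MAPE (assumed definition): total absolute error / total true demand."""
    total = sum(y_true)
    if total == 0:
        raise ValueError("WMAPE is undefined when the total true demand is zero")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / total


def relative_improvement(baseline, improved):
    """Percentage by which an error metric drops from a baseline variant (hypothetical helper)."""
    return (baseline - improved) / baseline * 100.0
```

Because WMAPE normalizes by the total demand rather than by each entry, it remains defined for the many zero-valued OD pairs, which is one reason it suits sparse OD matrices.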
5.3.2. Effectiveness of real-time OD demand estimation
In this study, the real-time OD demand is estimated according to the real-time inflow and historical long short-term OD distribution rate. Several variants have been proposed to study the influence of real-time OD demand estimation on metro OD demand prediction.
•Net-incomplete OD: This variant directly utilizes the incomplete real-time OD demand to predict the future metro OD demand.
•Net-estimated OD (long-term): This variant utilizes the long-term complete OD distribution rate to estimate the real-time OD demand for metro OD demand prediction.
•Net-estimated OD (short-term): This variant leverages the short-term complete OD distribution rate to estimate the real-time OD demand for metro OD demand prediction.
•Net-estimated OD (long- and short-term): This network employs the estimated real-time OD demand that considers both the long-term and short-term complete OD distribution information to predict the future metro OD demand.
The prediction performance of all variants is summarized in Table 6. When the incomplete real-time OD demand matrix is used directly, the variant net-incomplete OD achieves the worst prediction performance. This is because the incomplete matrix does not record unfinished trips, which interferes with learning the OD distribution. The net-estimated OD (long-term) and net-estimated OD (short-term) variants perform better because both consider complete OD demand information. Similar to the previous experimental results, net-estimated OD (short-term) outperformed net-estimated OD (long-term) in most time intervals, suggesting that short-term periodic OD distribution data provide more useful information for OD demand prediction. Compared with these variants, net-estimated OD (long- and short-term) achieved the best prediction performance at all time intervals, indicating that the real-time OD estimation block effectively estimated the real-time complete OD demand with long short-term periodic OD distribution information.
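The estimation idea described in this subsection (real-time inflow combined with historical OD distribution rates) can be illustrated with a short sketch. It assumes that the distribution rate is the row-normalized historical OD matrix and that the long-term (weekly) and short-term (daily) rates are blended with a fixed weight; the parameter alpha and the function names are hypothetical and are not the estimation module of PAG-STAN itself.

```python
def distribution_rate(od_matrix):
    """Row-normalize a complete historical OD matrix into per-origin distribution rates."""
    rates = []
    for row in od_matrix:
        total = sum(row)
        # An origin with no historical demand gets an all-zero rate row.
        rates.append([v / total if total > 0 else 0.0 for v in row])
    return rates


def estimate_realtime_od(inflow, weekly_od, daily_od, alpha=0.5):
    """Estimate a complete real-time OD matrix as inflow times a blended rate.

    alpha is an assumed blending weight: alpha for the long-term (weekly)
    rate and 1 - alpha for the short-term (daily) rate.
    """
    weekly = distribution_rate(weekly_od)
    daily = distribution_rate(daily_od)
    estimate = []
    for i, flow in enumerate(inflow):
        row = [flow * (alpha * w + (1.0 - alpha) * d)
               for w, d in zip(weekly[i], daily[i])]
        estimate.append(row)
    return estimate
```

When both historical rate rows of an origin sum to one, the estimated row sums back to that origin's real-time inflow, so the estimate is complete by construction even though some trips have not yet finished.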
5.3.3. Effectiveness of external factors
This work fuses multiple heterogeneous data (e.g., pandemic-related data and date attribute data) to study the impact of external factors on the metro OD distribution during a pandemic. In this subsection, we investigate the influence of these external factors on the predictive performance of the model. Several variants are introduced as follows:
•Net-no external factors: This variant does not use any heterogeneous data and therefore ignores the impact of external factors on the metro OD distribution.
•Net-pandemic information: This variant uses only the pandemic-related data to study the impact of the pandemic on the metro OD distribution.
•Net-date attribute information: This variant uses only the date attribute information to study the impact of date attributes on the metro OD distribution.
•Net-heterogeneous information: This method employs both the pandemic-related information and the date attribute information to fully study the influence of external factors on the metro OD distribution.
As shown in Table 7, PAG-STAN attains the most favorable prediction performance across all time intervals when incorporating all external factors. Pandemic-related data enabled the model to comprehensively understand the evolutionary patterns of OD distribution over time from a global perspective, and date attribute data assisted the model in capturing the periodicity (weekdays/weekends) of OD distribution from a local perspective. Integrating these two forms of heterogeneous information into PAG-STAN is effective for examining the influence of external factors on the metro OD distribution, facilitating the learning of evolving features during a pandemic from both global and local perspectives. Notably, the variant that fuses only pandemic-related data with OD demand data outperformed the variant that fuses only date attribute data with OD demand data across all time intervals. This outcome suggests that capturing the global evolution features of the OD distribution during a pandemic is more beneficial than focusing solely on the local periodicity of the OD distribution. Consequently, comprehending the impact of external factors on metro OD distribution is pivotal in forecasting the metro OD demand during a pandemic.
5.3.4. Effectiveness of dynamic OD demand matrix compression
This study transformed the raw OD demand matrix with dimension to the dense OD demand matrix with dimension, addressing the data sparsity and high-dimensionality issues. To study whether the OD demand matrix compression operation can improve OD demand prediction, we utilized the raw complete OD demand matrix with dimensions to predict metro OD demand. After obtaining the predicted OD demand matrix , the shape of the predicted OD demand matrix is transformed into that of the dense OD demand matrix to compare prediction performance.
As shown in Table 8, the WMAPE for the remaining stations exceeded % when the raw OD demand matrix was applied. This high error is attributed to the small or even zero values in the raw OD demand matrix, which make the matrix sparse and hamper feature learning. In contrast, PAG-STAN demonstrated an improved prediction performance when using the dense OD demand matrix, achieving an average WMAPE of % across all time intervals for the remaining stations. This is attributed to the compression operation, which enables the model to capture significant distribution features in the OD demand. The compression operation also improves the prediction performance at the top stations. The average WMAPE across all time intervals for the top stations was % with the compressed OD demand matrix, whereas that with the raw OD matrix was %. This is because the sparsity of the raw OD matrix hindered the model in learning significant OD distribution information. Therefore, the compression operation effectively overcomes the data sparsity issue, leading to enhanced predictability of the metro OD demand.
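To make the compression idea concrete, the following sketch shows one plausible rule and is not necessarily the rule used by the dynamic OD demand matrix compression module: for each origin, only the k destinations with the largest non-zero demand are kept, shorter rows are padded with zeros, and a mask records the padded slots. The names compress_od and gather_like and the parameter k are hypothetical.

```python
def compress_od(od_matrix, k):
    """Keep, for each origin, the k destinations with the largest non-zero demand.

    Returns (dense, index, mask): dense[i][j] is the demand from origin i to
    destination index[i][j]; mask[i][j] is 0 where the slot is zero padding
    (index -1) and 1 otherwise. This selection rule is an assumption.
    """
    dense, index, mask = [], [], []
    for row in od_matrix:
        # Destinations with non-zero demand, largest demand first.
        ranked = sorted((j for j, v in enumerate(row) if v > 0),
                        key=lambda j: row[j], reverse=True)[:k]
        pad = k - len(ranked)
        dense.append([row[j] for j in ranked] + [0] * pad)
        index.append(ranked + [-1] * pad)
        mask.append([1] * len(ranked) + [0] * pad)
    return dense, index, mask


def gather_like(od_matrix, index):
    """Reorder a full-size (raw-shape) matrix into the dense layout given by index."""
    return [[row[j] if j >= 0 else 0 for j in idx]
            for row, idx in zip(od_matrix, index)]
```

A helper such as gather_like corresponds to the comparison step described above, in which a prediction made on the raw matrix is reshaped to the dense layout before the metrics are computed.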
5.3.5. Effectiveness of MPG-loss function
This study proposes a new MPG-loss function to embed physical ridership quantity laws into a loss function to guide model training. Furthermore, it adopts a masking operation to eliminate training interference caused by filling 0 in the dense OD matrix. This subsection explores the influence of the MPG-loss function on metro OD demand prediction.
•Net-general loss function: Training PAG-STAN according to the general mean square error (MSE) loss function.
•Net-information embedding loss function: This variant only embeds the ridership quantity laws into the loss function to enhance model training.
•Net-masked loss function: This variant only masks the zero values filled in by the compression operation, so that errors at these positions are not back-propagated during model training.
•Net-MPG-loss function: Training PAG-STAN through MPG-loss function.
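The two ingredients of the MPG-loss function described above, masking of the zero-padded entries and embedding of the ridership quantity relationship between OD demand and inbound flow, can be sketched as follows. This is a minimal illustration under stated assumptions: the physics term is taken to be the squared gap between each origin's summed predicted demand and its inflow, and the weight lam is hypothetical. The exact form of the loss in PAG-STAN may differ.

```python
def mpg_loss(pred, target, mask, inflow, lam=0.1):
    """Masked MSE plus an assumed penalty tying each origin's row sum to its inflow.

    pred, target, mask: dense OD matrices as lists of rows (mask is 1 for
    real entries and 0 for zero padding). inflow: one value per origin.
    lam: hypothetical weight of the physics term.
    """
    sq_err, n_valid = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:  # padded slots contribute no error
                sq_err += (p - t) ** 2
                n_valid += 1
    data_term = sq_err / n_valid if n_valid else 0.0

    physics_sq = 0.0
    for p_row, m_row, flow in zip(pred, mask, inflow):
        row_sum = sum(p for p, m in zip(p_row, m_row) if m)
        physics_sq += (row_sum - flow) ** 2
    physics_term = physics_sq / len(inflow)

    return data_term + lam * physics_term
```

Setting lam to zero recovers the masked-only variant, and passing an all-ones mask with a positive lam corresponds to the information-embedding-only variant, which mirrors the ablation design listed above.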
5.3.6. Joint effectiveness of multiple modules
The ablation studies above each examined the effectiveness of a single module; however, different modules may jointly affect the prediction performance. Therefore, this subsection discusses the joint influence of multiple modules (multi-periodic OD pattern, real-time OD estimation module, and heterogeneous information fusion module) on our model, helping to further understand the contribution of each module.
•Net-only real-time estimated OD: This variant only utilizes the real-time estimated OD demand to predict future OD demand.
•Net-multi-periodic historical OD and incomplete real-time OD: This variant uses the multi-periodic OD pattern (weekly/daily historical OD pattern) and incomplete real-time OD demand data to predict future OD demand.
•Net-multi-periodic historical OD and real-time estimated OD: This variant uses the multi-periodic OD pattern (weekly/daily historical OD pattern) and real-time estimated OD demand to predict future OD demand.
•Net-real-time estimated OD and heterogeneous information: This variant leverages real-time estimated OD demand and heterogeneous information to predict future OD demand.
•Net-incomplete real-time OD and heterogeneous information: This variant leverages incomplete real-time OD demand and heterogeneous information to predict future OD demand.
Table 10 presents the evaluation metrics for different module combinations. In particular, the performance of the model is significantly affected by the completeness of the real-time OD demand. Even with the incorporation of weekly and daily time-series OD demand information or heterogeneous information, incomplete OD demand inputs result in poor prediction performance. The variants that incorporate weekly and daily time-series OD demand information outperformed those that consider heterogeneous information, emphasizing the superior utility of historical periodic information in enhancing prediction performance. Moreover, net-multi-periodic historical OD and real-time estimated OD attained the second-best prediction performance, which was attributed to the rich information provided by the estimated real-time OD demand data and the historical weekly/daily time-series OD demand data. These data sources offer comprehensive insight into the spatiotemporal distribution of OD demands from multiple perspectives. Therefore, the joint use of the estimated real-time OD demand data and the historical weekly and daily time-series OD demand data significantly enhances metro OD prediction, with heterogeneous data offering auxiliary evolutionary information to some extent.
6. Conclusions
This study focuses on a challenging task, namely, short-term OD demand prediction for URT systems under pandemic conditions. Its contributions are as follows:
•The authors summarized the existing issues in metro OD demand prediction, including the real-time availability, sparsity, and high dimensionality of OD demand data, and the impact of external factors during a pandemic.
•Furthermore, the authors proposed a novel PAG-STAN framework to address the above issues. The framework first approximates a complete real-time dense OD demand matrix. Subsequently, it captures the multi-periodic spatial-temporal dependencies of metro OD demand through an encoder-decoder framework. Finally, it enhances model training and interpretability by embedding the ridership quantity relationship in the loss function.
•Experiments on a large-scale COVID-19 metro OD demand dataset demonstrate that PAG-STAN outperforms other state-of-the-art methods for metro OD demand prediction under pandemic conditions. Meanwhile, the generality of the model is verified on another dataset from the Beijing metro under non-pandemic conditions.
However, our study faces several limitations.
•While the dynamic OD demand matrix compression module effectively handles the sparsity and high-dimensionality issues, it results in the loss of relative position information between OD pairs. Future work will explore better methods to overcome sparsity and high dimensionality while retaining complete relative position information.
•The OD demand prediction of URT systems under other emergencies, such as large-scale sports events, is also of significant research value. Future work will apply PAG-STAN to predict the OD demand under other emergencies, enhancing the universality of PAG-STAN.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (72288101, 72201029, and 72322022).
Compliance with ethics guidelines
Shuxin Zhang, Jinlei Zhang, Lixing Yang, Feng Chen, Shukai Li, and Ziyou Gao declare that they have no conflicts of interest or financial conflicts to disclose.
ZhangJ, ZhengY, QiD. Deep spatio-temporal residual networks for citywide crowd flows prediction. In:Proceedings of the AAAI-17: 31st AAAI conference on artificial intelligence; 2017 Feb 4-9; San Francisco, CA, USA. Washington, DC: Association for the Advancement of Artificial Intelligence (AAAI); 2017.
[5]
L.Zhao, Y.Song, C.Zhang, Y.Liu, P.Wang, T.Lin, et al. T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Trans Intell Transp Syst, 21 (9) (2020), pp. 3848-3858.
[6]
W.Jiang, Z.Ma, H.N.Koutsopoulos. Deep learning for short-term origin-destination passenger flow prediction under partial observability in urban railway systems. Neural Comput Appl, 34 (2022), pp. 4813-4830.
[7]
P.Noursalehi, H.N.Koutsopoulos, J.H.Zhao. Dynamic origin-destination prediction in urban rail systems: a multi-resolution spatio-temporal deep learning approach. IEEE Trans Intell Transp Syst, 23 (6) (2022), pp. 5106-5115.
[8]
J.Zhang, H.Che, F.Chen, W.Ma, Z.He. Short-term origin-destination demand prediction in urban rail transit systems: a channel-wise attentive split-convolutional neural network method. Transp Res Part C Emerg Technol, 124 (2021), Article 102928.
[9]
LiuL, ZhuY, LiG, WuZ, BaiL, LinL. Online metro origin-destination prediction via heterogeneous information aggregation. 2022. arXiv:2107.00946v5.
[10]
G.Zhu, J.Ding, Y.Wei, Y.Yi, S.S.D.Xu, E.Q.Wu. Two-stage OD flow prediction for emergency in urban rail transit. IEEE Trans Intell Transp Syst, 25 (1) (2023), pp. 920-928.
[11]
L.Liu, Z.Qiu, G.Li, Q.Wang, W.Ouyang, L.Lin. Contextualized spatial-temporal network for taxi origin-destination demand prediction. IEEE Trans Intell Transp Syst, 20 (10) (2019), pp. 3875-3887.
[12]
X.X.Zou, S.Y.Zhang, C.H.Zhang, J.J.Q.Yu, E. Chung. Long-term origin-destination demand prediction with graph deep learning. IEEE Trans Big Data, 8 (6) (2021), pp. 1481-1495.
[13]
M. Van derVoort, M.Dougherty, S.Watson. Combining kohonen maps with arima time series models to forecast traffic flow. Transp Res Part C Emerg Technol, 4 (5) (1996), pp. 307-318.
[14]
M.C.Tan, S.C.Wong, J.M.Xu, Z.R.Guan, P.Zhang. An aggregation approach to short-term traffic flow prediction. IEEE Trans Intell Transp Syst, 10 (1) (2009), pp. 60-69.
[15]
M.Ni, Q.He, J.Gao. Forecasting the subway passenger flow under event occurrences with social media. IEEE Trans Intell Transp Syst, 18 (6) (2016), pp. 1623-1632.
[16]
M.Castro-Neto, Y.S.Jeong, M.K.Jeong, L.D.Han. Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions. Expert Syst Appl, 36 (3) (2009), pp. 6164-6173.
[17]
P.Högberg. Estimation of parameters in models for traffic prediction: a non-linear regression approach. Transp Res, 10 (4) (1976), pp. 263-265.
[18]
H.Sun, H.X.Liu, H.Xiao, R.R.He, B.Ran. Use of local linear regression model for short-term traffic forecasting. Transp Res Rec, 1836 (1) (2003), pp. 143-150.
[19]
Z.Zheng, D.Su. Short-term traffic volume forecasting: a k-nearest neighbor approach enhanced by constrained linearly sewing principle component algorithm. Transp Res Part C Emerg Technol, 43 (2014), pp. 143-157.
[20]
M.W.Li, W.C.Hong, H.G.Kang. Urban traffic flow forecasting using Gauss-SVR with cat mapping, cloud model and PSO hybrid algorithm. Neurocomputing, 99 (2013), pp. 230-240.
[21]
W.C.Hong. Traffic flow forecasting by seasonal SVR with chaotic simulated annealing algorithm. Neurocomputing, 74 (12-13) (2011), pp. 2096-2107.
[22]
P.Cai, Y.Wang, G.Lu, P.Chen, C.Ding, J.Sun. A spatiotemporal correlative k-nearest neighbor model for short-term traffic multistep forecasting. Transp Res Part C Emerg Technol, 62 (2016), pp. 21-34.
[23]
G.Lin, A.Lin, D.Gu. Using support vector regression and k-nearest neighbors for short-term traffic flow prediction based on maximal information coefficient. Inf Sci, 608 (2022), pp. 517-531.
[24]
A.M.Avila, I.Mezić. Data-driven analysis and forecasting of highway traffic dynamics. Nat Commun, 11 (2020), p. 2090.
[25]
Y.Lv, Y.Duan, W.Kang, Z.Li, F.Y.Wang. Traffic flow prediction with big data: a deep learning approach. IEEE Trans Intell Transp Syst, 16 (2) (2014), pp. 865-873.
[26]
Y.Liu, Z.Liu, R.Jia. DeepPF: a deep learning based architecture for metro passenger flow prediction. Transp Res Part C Emerg Technol, 101 (2019), pp. 18-34.
[27]
L.Liu, R.C.Chen. A novel passenger flow prediction model using deep learning methods. Transp Res Part C Emerg Technol, 84 (2017), pp. 74-91.
[28]
N.G.Polson, V.O.Sokolov. Deep learning for short-term traffic flow prediction. Transp Res Part C Emerg Technol, 79 (2017), pp. 1-17.
[29]
J.Guo, Z.Xie, Y.Qin, L.Jia, Y.Wang. Short-term abnormal passenger flow prediction based on the fusion of SVR and LSTM. IEEE Access, 7 (2019), pp. 42946-42955.
[30]
Y.Jing, H.Hu, S.Guo, X.Wang, F.Chen. Short-term prediction of urban rail transit passenger flow in external passenger transport hub based on LSTM-LGB-DRS. IEEE Trans Intell Transp Syst, 22 (7) (2021), pp. 4611-4621.
[31]
J.An, L.Fu, M.Hu, W.Chen, J.Zhan. A novel fuzzy-based convolutional neural network method to traffic flow prediction with uncertain traffic accident information. IEEE Access, 7 (2019), pp. 20708-20722.
[32]
Y.Liu, C.Lyu, X.Liu, Z.Liu. Automatic feature engineering for bus passenger flow prediction based on modular convolutional neural network. IEEE Trans Intell Transp Syst, 22 (4) (2021), pp. 2349-2358.
[33]
B.Du, H.Peng, S.Wang, M.Z.A.Bhuiyan, L.Wang, Q.Gong, et al. Deep irregular convolutional residual LSTM for urban traffic passenger flows prediction. IEEE Trans Intell Transp Syst, 21 (3) (2020), pp. 972-985.
[34]
C.Chen, Y.Liu, L.Chen, C.Zhang. Bidirectional spatial-temporal adaptive transformer for urban traffic flow forecasting. IEEE Trans Neural Netw Learn Syst, 34 (10) (2022), pp. 6913-6925.
[35]
YuB, YinH, ZhuZ. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. 2017. arXiv:1709.04875.
M.Lv, Z.Hong, L.Chen, T.Chen, T.Zhu, S.Ji. Temporal multi-graph convolutional network for traffic flow prediction. IEEE Trans Intell Transp Syst, 22 (6) (2021), pp. 3337-3348.
[38]
B.Yu, Y.Lee, K.Sohn. Forecasting road traffic speeds by considering area-wide spatio-temporal dependencies based on a graph convolutional neural network (GCN). Transp Res Part C Emerg Technol, 114 (2020), pp. 189-204.
[39]
H.Peng, B.Du, M.Liu, M.Liu, S.Ji, S.Wang, et al. Dynamic graph convolutional network for long-term traffic flow prediction with reinforcement learning. Inf Sci, 578 (2021), pp. 401-416.
[40]
WuZ, PanS, LongG, JiangJ, ZhangC. Graph WaveNet for deep spatial-temporal graph modeling. 2019. arXiv:1906.00121.
[41]
J.Wang, Y.Zhang, Y.Wei, Y.Hu, X.Piao, B. Yin. Metro passenger flow prediction via dynamic hypergraph convolution networks. IEEE Trans Intell Transp Syst, 22 (12) (2021), pp. 7891-7903.
[42]
S.Reza, M.C.Ferreira, J.J.M.Machado, J.M.R.S.Tavares, J.J.M.Machado, J.M.R.S. Tavares. A multi-head attention-based transformer model for traffic flow forecasting with a comparative analysis to recurrent neural networks. Expert Syst Appl, 202 (2022), Article 117275.
[43]
X.Ye, S.Fang, F.Sun, C.Zhang, S.Xiang. Meta graph transformer: a novel framework for spatial-temporal traffic prediction. Neurocomputing, 491 (2022), pp. 544-563.
[44]
H.Zhang, Y.Zou, X.Yang, H.Yang. A temporal fusion transformer for short-term freeway traffic speed multistep prediction. Neurocomputing, 500 (2022), pp. 329-340.
[45]
H.Yan, X.Ma, Z.Pu. Learning dynamic and hierarchical traffic spatiotemporal features with transformer. IEEE Trans Intell Transp Syst, 23 (11) (2022), pp. 22386-22399.
[46]
Y.Xie, J.Niu, Y.Zhang, F.Ren. Multisize patched spatial-temporal transformer network for short- and long-term crowd flow prediction. IEEE Trans Intell Transp Syst, 23 (11) (2022), pp. 21548-21568.
[47]
XuM, DaiW, LiuC, GaoX, LinW, QiGJ, et al. Spatial-temporal transformer networks for traffic flow forecasting. 2020. arXiv:2001.02908.
[48]
K.F.Chu, A.Y.S.Lam, V.O.K.Li. Deep multi-scale convolutional LSTM network for travel demand and origin-destination predictions. IEEE Trans Intell Transp Syst, 21 (8) (2020), pp. 3219-3232.
[49]
HuJ, YangB, GuoC, JensenCS, XiongH. Stochastic origin-destination matrix forecasting using dual-stage graph convolutional, recurrent neural networks. In: Proceedings of the 2020 IEEE 36th International Conference on Data Engineering (ICDE-2020); 2020 Apr 20-24; Dallas, TX, USA. New York City: IEEE; 2020. p. 1417-28.
[50]
X.Yao, Y.Gao, D.Zhu, E.Manley, J.Wang, Y.Liu. Spatial origin-destination flow imputation using graph convolutional networks. IEEE Trans Intell Transp Syst, 22 (12) (2021), pp. 7474-7484.
[51]
J.Ke, X.Qin, H.Yang, Z.Zheng, Z.Zhu, J. Ye. Predicting origin-destination ride-sourcing demand with a spatio-temporal encoder-decoder residual multi-graph convolutional network. Transp Res Part C Emerg Technol, 122 (2021), Article 102858.
[52]
Z.Huang, D.Wang, Y.Yin, X.Li. A spatiotemporal bidirectional attention-based ride-hailing demand prediction model: a case study in Beijing during COVID-19. IEEE Trans Intell Transp Syst, 23 (12) (2022), pp. 25115-25126.
[53]
Z.Huang, W.Zhang, D.Wang, Y.Yin. A GAN framework-based dynamic multi-graph convolutional network for origin-destination-based ride-hailing demand prediction. Inf Sci, 601 (2022), pp. 129-146.
[54]
M.Qurashi, Q.L.Lu, G.Cantelmo, C.Antoniou. Dynamic demand estimation on large scale networks using principal component analysis: the case of non-existent or irrelevant historical estimates. Transp Res Part C Emerg Technol, 136 (2022), Article 103504.
[55]
M.Qurashi, T.Ma, E.Chaniotakis, C.Antoniou. PC-SPSA: employing dimensionality reduction to limit SPSA search noise in DTA model calibration. IEEE Trans Intell Transp Syst, 21 (4) (2020), pp. 1635-1645.
[56]
J.Zhang, F.Chen, Y.Guo, X.Li. Multi-graph convolutional network for short-term passenger flow forecasting in urban rail transit. IET Intell Transp Syst, 14 (10) (2020), pp. 1210-1217.
[57]
KipfTN, WellingM. Semi-supervised classification with graph convolutional networks. 2016. arXiv:1609.02907.
[58]
ShiX, ChenZ, WangH, YeungDY, WongWK, WooWC, Convolutional LSTM network:a machine learning approach for precipitation nowcasting. In: Proceedings of the 28th International Conference on Neural Information Processing Systems; 2015 Dec 7-12; Montreal, QC, Canada. New York City: Association for Computing Machinery (ACM); 2015. p. 802-10.
[59]
H.Zheng, F.Lin, X.Feng, Y.Chen. A hybrid deep learning model with attention-based Conv-LSTM networks for short-term traffic flow prediction. IEEE Trans Intell Transp Syst, 22 (11) (2021), pp. 6910-6920.
[60]
X.Lu, C.Ma, Y.Qiao. Short-term demand forecasting for online car-hailing using Conv-LSTM networks. Physica A, 570 (2021), Article 125838.
[61]
Y.Li, S.Chai, G.Wang, X.Zhang, J.Qiu. Quantifying the uncertainty in long-term traffic prediction based on PI-ConvLSTM network. IEEE Trans Intell Transp Syst, 23 (11) (2022), pp. 20429-20441.
[62]
VaswaniA, ShazeerN, ParmarN, UszkoreitJ, JonesL, AidanN, et al. Attentionis all you need. Proceedingsof the 31st Conference on Neural Information Processing Systems NIPS 2017; 2017Dec 4-9; Long BeachCA, USA. Online: NeurIPS Proceedings; 2017.
[63]
Y.Chen, Y.Lv, X.Wang, L.Li, F.Y.Wang. Detecting traffic information from social media texts with deep learning approaches. IEEE Trans Intell Transp Syst, 20 (8) (2019), pp. 3049-3058.
[64]
W.Yao, S.Qian. From twitter to traffic predictor: next-day morning traffic prediction using social media data. Transp Res Part C Emerg Technol, 124 (2021), Article 102938.
[65]
H.Zhou, S.Zhang, J.Peng, S.Zhang, J.Li, H.Xiong, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. Proc Conf AAAI Artif Intell, 35 (12) (2021), pp. 11106-11115.