《1. Introduction》

1. Introduction

With the rapid development of the world’s civil aviation industry, severe flight delays have remained a major problem and a cause of inconvenience. Such delays not only discourage passengers from considering air transport or choosing the same airline again [1–4], but also force airlines to bear the additional costs of aircraft maintenance and fleet underutilization [5]. Furthermore, flight delays lead to increases in fuel consumption and carbon dioxide emissions, bringing harm to the environment [6,7]. Besides the direct impacts listed above, flight delays have a negative influence on various aspects of the overall economy [8]. In summary, flight delays constitute a severe and widespread problem with significant negative impacts.

Many factors contribute to the complexity and intractability of this problem. These factors are generally summarized as abnormal weather [9,10] and technical reasons including air traffic control [11], insufficient facility capacity, poor scheduling [12], changes in procedures [13], and limited buffer time [14]. This variety of factors makes it difficult to understand the underlying pattern of flight delays and design appropriate strategies [15]. Recently, data-driven approaches based on historical observations have been shown to be free from previous constraints and to fit the potential dynamic properties [16]. Therefore, one approach to facilitating system cognition and decision-making is to fully utilize and learn from historical data [17]. For example, when faced with bad weather, it is possible to look up past days with similar weather conditions and refer to the actions taken by air traffic controllers on those days. Several previous studies [18–23] have focused on identifying fixed patterns in air traffic management. Liu et al. [19] introduced a semi-supervised learning algorithm that can determine groups of similar days as different patterns. The first step was to measure the distance between hourly weather forecasts, followed by identifying the days with a small total distance. The authors applied this method to conduct two case studies at Newark Liberty International Airport and proved its efficiency. Mukherjee et al. [20] proposed a way to classify patterns based on the impact of severe weather conditions. The authors used a weather index as the input and applied factor analysis to identify dominant weather patterns. Then, they clustered days using Ward’s minimum-variance method. Days belonging to the same cluster shared similar weather patterns. Apart from weather patterns, some studies have attempted to identify similar days from other perspectives. Grabbe et al. [21] used a k-means clustering algorithm to identify similar days in terms of ground delay programs. The authors applied an expectation-maximization (EM) algorithm on the start and end times of ground delay programs and scheduled arrival rate data. Other studies have focused on air traffic flow and flight delays to identify similar patterns [24]. Gorripaty et al. [17] measured the principal components in demand and capacity data, but found that there were no natural patterns in demand or capacity after applying clustering analysis to the data. By identifying the periodic patterns of arrival delay for non-stop domestic flights, Abdel-Aty et al. [25] found that some patterns were not detected by statistical methods.

Despite these previous efforts, there are still gaps in understanding flight delay patterns. As mentioned above, an effective method is to find clusters or patterns in spatiotemporal historical data [26,27]. However, due to the high dimensionality of this data, it is difficult to find distinct patterns in Euclidian space, as mentioned in previous studies [28]. Methods of latent component analysis have thus been put forward to reveal the hidden patterns. Instead of directly mining the patterns, methods such as latent distribution analysis [29], latent trait analysis (including item response theory and Rasch models), and grade-of-membership analysis can utilize the signatures derived from the tensor factorization to form a projection subspace with much lower dimensionality, which strengthens the underlying cluster structure of spatiotemporal traffic dynamic patterns [30]. These methods open up new frontiers in the field of transportation science [31], such as urban mobility analysis [32–34], traffic speed prediction [35], missing traffic data completion [36,37], and vessel track recovery [38].

Inspired by the methods described above, the main goal of this paper is to understand potential air traffic and flight delay patterns using massive air traffic data. In the first part of this work, we regard flight records as a multivariate observation sampled from a universal distribution and adopt a Tucker-like latent class model to identify principal patterns of each mode. We then propose an estimation model based on potential patterns to show the efficiency of this framework. The remainder of the paper is organized as follows. In Section 2, we introduce the modeling framework. Section 3 demonstrates a case study in which we apply the model to a dataset of the Chinese aviation system, and Section 4 gives conclusions.

《2. Modeling framework》

2. Modeling framework

In this section, we introduce the overall framework for modeling multi-way flight data in a probabilistic setting. The aim is to characterize the principal patterns of air traffic and flight delays from different dimensions and their joint interactions. In Section 2.1, we introduce the notation used in our framework. In Section 2.2, the nonnegative Tucker decomposition (NTD) method is presented as a method of understanding the tensor decomposition (TD). Finally, the latent class analysis (LCA) is described in Section 2.3.

《2.1. Notation》

2.1. Notation

We let represent a flight record a, where q represents the number of dimensions—that is, the number of attributes in a single trip record. To characterize the features of flights, each element may denote the departure airport of the flight (xa1), departure time (xa2), departure date (xa3), delay level (xa4), and so forth.

We map the values into discrete values for convenience. We define discrete values starting from 1 for attribute β (β = 1, ... , q). β denotes the indices of attributes. wq is the discrete value of the vector of dimension q. Taking the departure airport as an example, xa1 = 1 represents airport 1 for flight record a. The departure time of a flight is then represented by , with each value corresponding to a 1 h period in a day. We then present our method using flight record data from four years (2014–2017), with the days ordered by departure date as . As the arrival delay is more related to airborne delay, which is difficult to analyze, we consider departure delay rather than arrival delay to investigate the flight delay patterns. Departure delay can better reflect the operation mode of the target airport. In addition, there is a clear definition for the level of departure delays. According to the rules set by the Federal Aviation Administration (FAA) and related research, delayed flights are those that depart more than 15 min later than the scheduled time. Considering practical circumstances, a number of 45 and 90 min later than scheduled are also selected as division thresholds. Thus, we classify the delay of each flight into four levels: ① < 15 min, ② 15–45 min, ③ 45–90 min, and ④ > 90 min. We use to represent ‘‘on time,” ‘‘slight delay,” ‘‘moderate delay,” and ‘‘serious delay,” respectively.

《2.2. Nonnegative Tucker decomposition》

2.2. Nonnegative Tucker decomposition

It has been shown that TD displays a number of advantages in various contexts, especially when data must be decomposed into a sum of additive components [39]. TD was first proposed by Tucker [40] in 1963. Since then, NTD has been proposed [41,42] to deal with naturally nonnegative and observation data. NTD is a powerful tool for the extraction of nonnegative-parts-based and physically meaningful latent components from high-dimensional tensor data while preserving the natural multilinear structure of the data [24]. In mathematical terms, it decomposes a tensor into a set of matrices and a core tensor.

Given an order tensor with K as the tensor dimension. NTD seeks a decomposition of a nonnegative K -way tensor (R+ is the space of arithmetic number, K is the size of dimensional space, I is one of the orthogonal basis) as mode products of a nonnegative core tensor (J is one of the orthogonal basis) and K nonnegative factor matrices 

where  are the factor matrices and can be regarded as the principal components of mode K. The entries of  show the level of interaction and connections between different components. In this method, both the core tensor  and the factor matrices should be elementwisely nonnegative. Elementwise, they are given as follows:

where  is a factor matrix, RK is size of factor. r and are the dimensions of mode K in the factor matrix. The decomposition is then modeled as an optimization problem:

where F is matrix norm.

《2.3. Latent class analysis》

2.3. Latent class analysis

LCA is a statistical method for finding subtypes of related cases (latent classes) from multivariate categorical data [43,44]. The latent class model is written as follows:

where  is the probability distribution function, i is dimensional index, N is principal patterns of each mode, T is the number of classes, t is the index of class, pt is the recruitment that should sum to 1,  is the conditional probability, and are the size of probability factors. LCA defines latent classes by the criterion of conditional independence. This means that each variable is statistically independent of every other variable within each latent class. Thus, each element in the probability tensor can be calculated as the sum across all pattern combinations.

where  is the probability vectors characterizing the principal patterns of each mode N.

In our research, we apply latent class models, which assume that each observation is generated from a mixture of underlying classes, and each class is associated with a unique probability distribution. Thus, the joint distribution is regarded as a mixture of product-multinomial and the probability of observing a flight xa . With the notations in Section 2.1, all the flight records x can be summarized into an m-way tensor with dimension  and each cell   (where represents a dimension) is the counting of flight numbers .   is a two-valued indicator function.  = 1 if it is true and 0 otherwise. To better understand the interior connections of the dataset, we put these collected flight records into a probabilistic tensor, each cell of which represents the probability of a flight belonging to that particular cell. The probabilistic tensor with each cell is the probability of a flight belonging to a particular cell  (where pis the probability of a particular cell). The probability of observing a flight xa (i.e., the probability mass function) can be reformulated by the Tucker decomposition in a similar way.

The core tensor captures the interactions across different modes.  are the probability vectors characterizing the principal patterns of each mode N. Each element in the probability tensor can be calculated as the sum across all pattern combinations.

 is the probability vectors characterizing the main patterns of each mode. This model, which is equivalent to a nonnegative Tucker (NNT) decomposition [45,46], can identify universal patterns of different dimensions and reveal the interactions through the core tensor. The EM algorithm is applied to efficiently derive the model [47].

《3. Case study》

3. Case study

《3.1. Flight operation data》

3.1. Flight operation data

The flight operation data is used as a proxy for the collective features of flights in this analysis. The dataset analyzed in this paper was provided by the Civil Aviation Administration of China (CAAC). Given that the purpose of our study is to provide decision-making support for air traffic and airport management, flight delay conditions are of particular interest. Moreover, air traffic flow is the basis of delay conditions. Therefore, it is natural to choose air traffic flow and flight delays as the main objects of our study. Departure delays are chosen, as they can better reflect the level of congestion of a departure airport and airspace. The categorical values of flight data are shown in Table 1. The database contains 13 492 326 domestic flights. All flights connect 224 airports, among which Beijing Capital International Airport has the most flights, accounting for 6.3% of flights in all airports. The departure date of the flights is from 1 January 2014 to 31 December 2017, and the departure time could be in any hour within a day. As can be seen, all the flights can be divided into four groups according to the departure delay time. The percentage of flights in the four groups are 37%, 38%, 14%, and 11%, respectively. The average departure delay of these flights is 31.08 min. Canceled flights are ignored in this study. 9 February 2016 was the day with the greatest number of flights, which was 12 419. 1 January 2014 was the day with the least number of flights, which was 7009. The busiest time period for airports is from 8:00 to 9:00, with 6% of all flights taking off during this time period.

《Table 1》

Table 1 Categorical values of flight data.

《3.2. Factorization analysis》

3.2. Factorization analysis

In our research, we assume that flight records are multivariate variables sampled from a universal distribution. The 13 492 326 flights are aggregated to the same tensor. Each observation contains four variables, including departure airport, order of the day during the four years, time of day, and level of delay. The total combinations are 224 × 1461×24× 4. Here, we use a small 3 (departure airport, A) × 4 (order of day in four years, D)× 5 (time of day, H)× 4 (level of delay, L) core tensor π to capture the interactions across different modes. Although a bigger core tensor could contain more information and reflect comprehensive relationships among different modes, this small core tensor can facilitate the interpretation of results. Moreover, existing research shows that the results are basically consistent across different sizes of core tensors [48]. In the following paragraph, we present the main results based on the core tensor size [3× 4 × 5× 4] as an example. The hour, day, airport, and delay-level factor matrix allows us to interpret the pattern due to its distribution profile.

Fig. 1(a) depicts hour profiles of the five patterns. The component pattern H1, covering 18.7% of all flights, reveals a gradual rise from 11:00 and reaches its peak at 24:00, followed by a decrease until the early morning. Pattern H2 shows similarities to a Gaussian distribution with a peak at 17:00, and covers about 24.4% of all flights. Compared with patterns H1 and H2, patterns H3 and Hare more concentrated. They increase significantly in the morning and display a sudden drop by noon. Pattern H4 reaches its peak at 11:00 and then decreases continuously over the rest of the day. The proportions of H3, H4, and H5 are 22.9%, 22.2%, and 11.8%, respectively. The day factor matrix has 1461 rows and four columns, which depict the corresponding relationship between original days and day patterns. We analyze the day patterns across different months of the year and days of the week. To do so, we sum up the daily traffic probability along different months and days of the week to identify how the traffic flow interacts with the dates. Patterns across days of the week are shown in Fig. 1(b). W is the weekly pattern and M is the monthly pattern. Most traffic observations of pattern W1 are concentrated on weekdays, while the peaks of pattern W3 are on the two-day weekend. Interestingly, we found that patterns W2 and W4 also show opposite trends. W2 is mainly concentrated on Monday, Saturday, and Sunday, while pattern W4 is concentrated on Wednesday, Thursday, and Friday. Fig. 1(c) shows patterns of different months. We can observe clear seasonal diversity in that no pattern is equally distributed across the seasons. Pattern M1 is mainly concentrated in autumn and winter, while pattern M2 is concentrated in winter and spring. Pattern M3 is concentrated in spring, and pattern M4 is mainly distributed in summer, which is the travel peak of the aviation system.

《Fig. 1》

Fig. 1. Principal patterns in different modes. (a) The Y-axis is the probability of each pattern (column) in the hour factor matrix Probability(H); (b) the Y-axis is the probability of day of the week along with the day factor matrix Probability(W); (c) the Y-axis is the probability along with the day factor matrix Probability(M).

Even though the estimation process does not take any spatial location information into account, we find that these components can be identified by the geographic locations of airports. As the traffic hub of China, Beijing Capital International Airport is obviously prominent in patterns A1 and A2. Furthermore, pattern A1 is mainly distributed in the southeast, while pattern A2 is mainly distributed in the southwest. Pattern A3 is mostly composed of airports in the midwest.

Fig. 2 shows a composition of the level of delay for each pattern (column) in the delay-level factor matrix. L1, L2, L3, and L4 represent flights of different levels, varying from ‘‘on time” to ‘‘serious delay” (as mentioned in Section 2.1). As can be seen, the delay-level patterns remain almost the same compared with the original tensor, with flights of different delay levels being identified. Fig. 2 shows the composition of each delay-level pattern, which covers 40.7%, 10.5%, 29.1%, and 19.7% of total traffic flow, respectively. It is noted that delay is a manifestation of air traffic congestion, and delay level thus closely interacts with traffic flow characteristics in spatiotemporal factors. To conduct further analysis, we adopt conditional probability across the delay-level mode and other modes to investigate the interactions. We calculate the conditional probability distribution Probability(L|H) by Bayes’ theorem. Given the hour patterns, the conditional distribution of delay-level patterns is

Probability(L|H) shows how the delay-level patterns interact with the hour patterns. As can be seen, pattern L1, denoting the on-time component, strongly interacts with all the hour patterns, which indicates that on-time flights occupy the mainstream at any time of day. To characterize the rules of delay, we now discuss the delay-level patterns excluding L1. H1, denoting the traffic pattern of an increasing trend throughout the entire day, is associated with delay pattern L2, which corresponds to slightly delayed flights. H2 is mainly covered by the hour patterns L3 and L4, denoting that the high-level delays are highly correlated to the traffic in the afternoon. H3, H4, and H5 are mainly covered by L3 and L4. This can be explained by the fact that flights during the travel peak in the morning are likely to be seriously delayed due to the heavy air traffic in most airports.

《Fig. 2》

Fig. 2. Delay-level composition probability for each pattern (column) in the delay factor matrix.

Based on the factor matrices of the day pattern and delay-level pattern, we can find that on-time and slight delay patterns occur the most in M1/W1, M2/W2, and M3/W3; however, pattern M4/W4 is very unusual. Along with the week and month dimensions, M4/W4 indicates that the traffic is mostly concentrated on weekdays and in the summer, and tends to interact with serious delays.

As shown in Probability(L|A) airport patterns are also correlated with the delay-level patterns. The obvious point is that pattern A3 is mainly covered by L1, which may denote that air traffic from the airports in the midwest have fewer delays. Compared with A3, A1 and A2 are more likely to be seriously delayed. This can be explained by the midwest airports’ relatively lighter traffic flow and adequate airspace resources. As shown in the analysis above, delay is heavily influenced by time and space. To further explore the interactions among the patterns, we present the disaggregated distribution along the temporal and spatial dimensions. As can be seen in Fig. 3, the value of (L2, H1) is greater than that of the other cells in A1, indicating that when afternoon or evening flights take off from the southeast airports, they are accompanied by slight delays. We also observe a considerable amount of traffic within the A2 pattern in (L3, H3), while such traffic rarely appears in A1 and A3. This can be explained by the fact that flights from southwest airports during the morning peak often have moderate delays. As mentioned before, due to their excess capacity, the midwest airports (A3) have a low probability of flight delays. This phenomenon is consistent with cells (L1, H2H4) in A3.

《Fig. 3》

Fig. 3. Interaction of delay level and hour for different airport patterns.

The numerical examples show that the factorization allows us to explain the complex dependence and higher-order interactions based on latent factors. The core tensor π characterizes the interactions between different modes in a very efficient and informative manner. This framework helps us to comprehend and interpret the latent interactions and complex dependence among the patterns in large datasets, enhancing our understanding of air traffic management.

On this basis, another significant problem is generated. Can the delay level be estimated if we only consider the information about latent patterns extracted from the time and space information? Although previous studies have demonstrated that flight delays can be attributed to many complex factors [49], this issue may be of great significance. Firstly, potential patterns based on historical information are generated by the collective effect of various delay causes. For example, extreme weather occurs frequently in summer, and we find that summer interacts strongly with the serious delay pattern in this paper. Secondly, as spatiotemporal information does not require us to understand operational characteristics in great detail, the departure time and departure airport alone can help us make a preliminary assessment of delays beforehand. Thirdly, the highly dynamic nature and complexity of aviation systems lead people to believe that delays cannot be estimated through basic information. Thus, the effectiveness of the potential patterns produced by our framework is also demonstrated if accurate estimation is achieved. For further exploration, the random forest (RF) algorithms employed here to construct an estimation model, due to its adaptability in regard to this problem [50]. The advantages of RF include the following: ① It generates an internal unbiased estimate of the generalization error as the forest building progresses; ② it runs efficiently on large databases; and ③ it has the ability to model interactions among variables [51]. Specifically, an RF is an ensemble tree  , where B is the number of trees and  is a -dimensional vector of descriptors. The ensemble produces B outputs  , where  is the estimation of the bth RF tree (b = 1,..., B). The outputs of all trees are aggregated to produce one final estimation, .  For classification problems,   is the class estimated by the majority of trees. The training procedure is as follows:

(1) Prepare the training data. We prepare -dimensional the training data samples along with their class label.

(2) Select the parameters. E: maximum depth of each tree; C: minimum number of samples required to split an internal node; V: the variables at each split; ML: the minimum number of samples required to be at a leaf node.

(3) Grow a classification tree. For b = 1 to B, draw a bootstrap sample Z* of size (Sd) from the training data. Use about two-thirds of the original training samples to grow a classification tree. Leave the remaining one-third of the samples as so-called out-of-bag (OOB) samples.

(4) Grow trees. For each bootstrap sample, grow a tree with the following modifications. At each node, pick the best variable/split-point and split the node into two nodes until the sample number of this node is less than C. The tree is grown to the maximum size E and not pruned back. Repeat the steps above until B such trees are grown.

(5) Results output. Estimate new data by aggregating the estimations of the trees with the majority vote of all analogous trees in the forest. Output = majority vote .

To evaluate the performance of the estimation model, four indices are employed, referring to Eqs. (11)–(14). F1macro is the macro-averaged score and weights all the classes (1,..., u) equally, regardless of how many samples Su belong to a given class u. F1micro is the micro-averaged score and weights all the samples equally, thus favoring the performance of common classes. Weighted score finds the average in each label, weighted by the number of true instances for each class. AccuracyOOB is the mean accuracy on each training sample Z* , using only the trees that do not have Z* in their bootstrap sample [52].

where Pmacro is the macro-averaged precision, Rmacro is the macroaveraged recall, Precisionu is the number of true positives divided by the total number of elements labelled as belonging to the positive class for a class u, and Recallu is defined as the number of true positives divided by the total number of elements that actually belong to the positive class for a class u.

where Pmicro is the micro-averaged precision, Rmicro is the microaveraged recall, TPu is the true positive rate of class u, FPu is the false positive rate of class u, and FNu is the false negative rate of class u.

where Su is number of samples of class u, S is number of samples.

where TPOOB is the true positive rate of OOB samples, FPOOB is the false positive rate of OOB samples, FNOOB is the false negative rate of OOB samples, and TNOOB is the true negative rate of OOB samples.

For this study, the data of latent spatiotemporal patterns are used. The classification problem involves the identification of four delay levels (on time, slight delay, moderate delay, and serious delay). To estimate how accurately a model will perform in practice, this paper adopts a five-fold cross-validation strategy. One round of cross-validation involves partitioning all the records into five complementary subsets, performing the training process on four subsets, and then validating the analysis on another testing set. Next, the validation results are averaged over the five rounds to give an estimation of the model’s performance. A total of 13 492 326 records are used. As mentioned above, the probability values of the flights in each mode (latent spatiotemporal patterns) are selected as the features; that is, {A1A4, H2H5, W1W4, M1M4}. Each feature ranges from 0 to 1.

The number of trees is the most important variable. It should be large enough to let the generalization error of the RF converge. In Fig. 4, we find that AccuracyOOB changes from 53.1% to 53.4% when B increases from 100 to 150, and increases slightly when B is greater than 150. This indicates that the RF classifier is almost insensitive to the increase of B when it is greater than 150. To obtain a better parameters set, a grid search method is used in this paper. Several combinations of other parameters are tried to determine the optimal values for this model.

《Fig. 4》

Fig. 4. Model performance with different numbers of trees.

The overall performance and the accuracy of individual classes are given in Fig. 5. As prior studies have noted, delays are mostly determined by dynamic operational factors. It is not possible to give a judgment much earlier than the actual occurrence of the delay [53,54]. Our model only considers the latent patterns revealed through basic flight information, which may not be viewed optimistically. In this case, the RFs achieved more than 50% in all performance indices and achieved accuracies of 60.0%, 46.0%, 44.0%, and 65.0%, respectively, for the individual classes, which may be a very positive sign. It signifies that the probability of different levels of flight delay can be estimated preliminarily based only on the time and airport information. Furthermore, this application supports the effectiveness of the potential patterns. We can also see that the ‘‘on time” pattern and ‘‘serious delay” pattern are classified more accurately due to their distinctive features. Even if the algorithm has more difficulty in classifying the ‘‘slight delay” and ‘‘moderate delay” classes, the deep color near the diagonal of the confusion matrix indicates that the wrong estimations have a small bias.

《Fig. 5》

Fig. 5. Classification results. (a) Overall performance through different indices; (b) the confusion matrix.

《4. Conclusions》

4. Conclusions

In this paper, we develop a probabilistic factorization framework that transforms massive flight record data into a spatiotemporal tensor. Our purpose is to investigate the spatiotemporal dynamic patterns of air traffic and flight delays. We assume that each flight observation is a sample generated from a universal joint distribution. Then, we formulate this joint task using nonnegative tensor factorization, which has been shown to be a useful analytical tool for high-dimensional massive data. The results show that clear patterns are identified in different modes. The interactions of different modes are also characterized in the core tensor, which interprets the relationship between delays and spatiotemporal patterns. The detailed analysis clearly shows that serious delays are strongly correlated to afternoon, weekday, and summer traffic. Flights from midwest airports have a low probability of delay at any time of day.

This framework can shed light on the modeling of flight delays by providing an understanding of flights in both the spatial and temporal dimensions. Moreover, the potential patterns have been shown to display a certain effect on delay estimation. As the integration of space and time information, potential patterns can give positive estimation results regarding the delay level. In the context of highly dynamic environments and the complexity of delays, this output gives us a new comprehension of how air traffic and flight delays interact with time and space. This framework provides a deep understanding of massive aviation data through the latent class model and probabilistic factorization method. The outcomes of this study can help airport operators and air traffic managers to better prepare air traffic and airport arrangements based on the experience gained from historical scenarios. Further analysis can be performed to tackle the challenge of understanding interactions involving more factors, such as weather and air route attributes.

《Acknowledgements》

Acknowledgements

The authors would like to thank Yu Zhang from the University of South Florida and Shenqi Zhang from Carnegie Mellon University for technical support and suggestions, which helped a lot to improve the quality of this work. This paper is supported by the National Key Research and Development Program of China (2019YFF0301400) and the National Natural Science Foundation of China (61671031, 61722102, and 61961146005).

《Compliance with ethics guidelines》

Compliance with ethics guidelines

Mingyuan Zhang, Shenwen Chen, Lijun Sun, Wenbo Du, and Xianbin Cao declare that they have no conflict of interest or financial conflicts to disclose.