《1. Introduction 》

1. Introduction

Due to climate change and anthropogenic effects, extreme hydrological events occur with increasing frequency and intensity, resulting in great economic losses and posing intensified risks to people and the environment [1–5]. The frequency of a specific event (e.g., floods, droughts, or storms) is crucial for inferring magnitudes of the event with different occurrence probabilities, which are used for many eco-hydrological management practices [6–8]. For example, the design of hydraulic structures such as dam spillways requires the frequency of flood in the drainage to be quantified [9]. However, many hydrological events cannot easily be described by a univariate variable, and generally show multidimensional features. Floods are a typical example, as they are generally characterized by peak flow, volume, and duration. Moreover, these three attributes of a flood are correlated with each other. However, a traditional univariate flood risk analysis mainly focuses on the probabilistic feature of flood peak flow, which does not fully describe the flood characteristics (e.g., dependence among flood peak, volume, and duration). Therefore, flood frequency analysis within a multivariate context is required in order to obtain an insightful description of extreme hydrological events and generate reliable risk inference [10,11].

The concept of the copula function has been widely used for many eco-hydrological issues due to its powerful capability to reflect complicated dependence structures among correlated variables. A number of copula-based approaches have been proposed for the multivariate risk analysis of floods [10–13], droughts [14,15], and storms [16,17]. In addition, some studies have reported the development of copula-based forecasting approaches for probabilistic streamflow simulation [18–21] and for energy and environmental systems analysis [22–25]. Compared with conventional multivariate statistical approaches, the copula-based approach allows separate quantification of the marginal distributions and the dependence modeling, thus making it possible to use different types of marginal and joint probability functions [2,13,26,27]. Thus, in flood risk analysis, the distributions of flooding peak, volume, and duration can be quantified through various parametric or non-parametric distribution formulations. For example, Karmakar and Simonovic [2] investigated the impacts of the selection of marginal distributions on the performance of copulas. One major issue in hydrological risk analysis is the extensive uncertainty resulting from data availability, model selection, and parameter estimation. There are two primary sources of uncertainty: ① natural uncertainty stemming from the inherent variability of the underlying hydrological processes, and ② epistemic uncertainty coming from an incomplete understanding of the studied system [28]. In particular, data availability is a major source of uncertainty in flood frequency analysis. Several previous studies have addressed uncertainties in flood risk analysis. For example, Liang et al. [29] analyzed model and parameter uncertainties using a Bayesian-based approach. However, most previous research primarily focused on univariate hydrological risk. Few existing studies examine the uncertainties of hydrological risk within a multivariate context.

Therefore, the objective of this research is to propose a multivariate risk-assessment framework through an interactive analysis of the multiple attributes of a flood. The uncertainty of the bivariate hydrological risk is evaluated based on the Bayesian method, and the posterior probabilities of the parameters in the marginal and joint distribution are obtained based on the Markov chain Monte Carlo (MCMC) method. Uncertainties in the primary bivariate return periods are quantified.

《2. Methodology》

2. Methodology

《2.1. The concept of the copula 》

2.1. The concept of the copula

Based on the copula function, a multivariate probability distribution can be constructed as follows:

where Fx1(x1) , Fx2(x2), ..., Fxn(xn)  denote the marginal distributions of the random vector (X1, X2, ... Xn). If these marginals are continuous, a copula function C exists, such that [13]:

where ui = FXi (x) and i = 1, 2, ... n. Nelsen [30] provides a detailed description of copulas in terms of their theoretical background and properties.

Many copula functions are available for practical multivariate analysis; these are mainly classified as Archimedean, elliptical, and extreme value copulas. This study uses Archimedean copulas due to their ability to capture a wide range of dependence structures [31]. These copulas can take a great variety of forms. Furthermore, they can have distinct upper and lower tail dependence coefficients. Thus, Archimedean copulas can capture dependence in the upper and lower tail dependences. Table 1 lists the potential copula functions used in this study and their inherent features.

《Table 1 》

Table 1 Basic properties of applied copulas.

D1 is the first-order Debye function, and for any positive integer k,

《2.2. Conditional distributions and joint return period》

2.2. Conditional distributions and joint return period

Based on the copula function expressed in Eq. (1), the conditional distribution for each Xi can be derived. Taking a bivariate case as an example, where U1 = FX1 (x)  and U2 = FX2(x) , the conditional distribution for U1 with U2 = u2 can be derived as follows [12]:

Similarly, the conditional distribution function of U1 when U2 ≤ u2 can be expressed as follows:

In addition, the probability density function (PDF) c(u1, u2) of a copula is derived by the following:

Next, the joint PDF of X1 and X2 can easily be derived by c(u1, u2);

Consequently, the conditional PDF of X1 for a given value of X2, is expressed as follows:

The conditional PDF of X2, with a given X1, is formulated as follows:

Based on the copula function, the concept of return period, which is widely used in univariate hydrological frequency analysis, can be extended to the multivariate context, leading to the concept of a joint return period. To be specific, the joint (primary) return periods can be further characterized by "OR” and "AND” cases [13,32,33]:

where μ indicates the mean inter-arrival time of the two consecutive extreme events.

《2.3. Bivariate hydrological risk analysis》

2.3. Bivariate hydrological risk analysis

In practical hydraulic infrastructure design, it is essential to know the random features of the floods that will flow through the infrastructure. Moreover, a flooding event is usually characterized by multiple attributes such as peak, volume, and duration. Therefore, flood risk in a multivariate context would provide more insightful information than univariate flood risk in terms of the flooding peak. For example, a flood associated with a large peak value and a long duration (e.g., more than 7 d) may lead to great economic losses, whereas a short-term high-peak flood may only be a flash flood. Thus, we introduce a bivariate risk indicator based on the joint return period in "AND,” which is used to reveal interactive effects among the multiple attributes of the flood. The bivariate risk indicator can be expressed as follows:

The bivariate hydrological risk Ru1,u2 is introduced in this study in order to investigate the significance of effects from persistent high-risk levels due to impacts from multiple interactive flood variables. It indicates the joint risk of u1 and u2 respectively occurring above their thresholds. For example, for a specific hydraulic facility with a design peak flow and service time standards, if u1 and u2 respectively denote the flood peak and volume, then the value of Ru1,u2 indicates the risk of a flood with a peak that is larger than the design peak flow and a volume that is greater than a certain threshold. Therefore, in this study, we adopt a joint return period in the "AND” case in order to express the bivariate hydrological risk.

《2.4. Parameter estimation through the Bayesian model》

2.4. Parameter estimation through the Bayesian model

Extensive uncertainties may be involved in the parametric estimation of a copula function due to: ① the inherent uncertainty in the flooding process, ② uncertainty in the selection of appropriate marginal functions and copulas, and ③ statistical uncertainty or parameter uncertainty within the parameter estimation process (e.g., the availability of samples). The Bayesian approach has been widely applied for uncertainty quantification, since it can incorporate various sources of information into a single analysis using Bayes’ theorem. Given the prior probability density and observations, the posterior distribution can be derived using Bayes’ theorem, which is expressed as follows:

where π0(θ) signifies the prior parameter distribution, L(θ∣X) denotes the likelihood function, ∫L(θ∣X0(θ)dθ is the normalization constant, π(θ∣X) is the posterior probability density function, and X=(x1, x2,..., xn) is the observation vector (i.e., X = (x1, x2) in this study).

For a bivariate hydrologic risk analysis using the copula, it is necessary to estimate the parameter of the copula, as well as the parameters of the two marginal probability distributions. Let θc , θ1, and θ2 denote the parameters of the copula and of the two marginal probability distributions, respectively. The posterior distribution can then be derived as follows:

The term of L(θc , θ1, θ2X) is the likelihood function of the observations. Based on the dependence structure between the copula and its marginal distributions, as expressed by Eq. (6), the likelihood function can be estimated as follows:

where is the density of the copula function, and are the two marginal probability density functions, respectively.

The procedures to derive the posterior distributions are presented below:

Step 1: Apply the maximum likelihood method to estimate the initial values for the parameters in the marginal and copula models.

Step 2: Calculate the root-mean-square error (RMSE) and Akaike information criterion (AIC) values for the marginal and joint probabilities in order to choose the most appropriate marginal and joint distribution forms.

Step 3: Set the prior distributions based on the parameter values obtained in Step 1.

Step 4: Use the MCMC with the Metropolis–Hastings algorithm to derive the posterior probabilities for the parameters in the marginal and joint distributions.

Step 5: Evaluate the uncertainties in the primary and joint return periods.

Step 6: Reveal the probabilistic features of the bivariate hydrological risk.

《3. Application》

3. Application

《3.1. Catchment characterization and data collection》

3.1. Catchment characterization and data collection

The proposed approach is applied to determine the hydrological risk for the Xiangxi River within the Three Gorges Reservoir (TGR) area in China. The Xiangxi River is located in Hubei Province. As shown in Fig. 1, the Xiangxi River is one of the main tributaries of the Yangtze River, and originates from the Shennongjia Nature Reserve area. It has a main stream length of 94 km and a catchment area of 3099 km2 [34]. This area has a northern subtropical climate with a mean annual precipitation of 1100 mm and a fluctuation range of 670–1700 mm [35]. The flooding season in the Xiangxi River Basin is from July to August, and the main rainfall season is from May to September. In this study, the daily discharge data (1961–2010) observed at the Xingshan Hydrologic Station (110°45'0'' E, 31°13'0'' N) is used for the probabilistic assessment of flood risks for the Xiangxi River. 

《Fig.1》

 Fig. 1. The location of the studied watershed. DEM: digital elevation model.

《3.2. Flooding characteristics in the Xiangxi River》

3.2. Flooding characteristics in the Xiangxi River

In this study, the flood characteristics are identified as the peak, volume, and duration; the flood peak is the maximum peak discharge, while the volume and duration are obtained from the hydrograph of the maximum discharge. According to Yue [36,37], the flood duration (Di) of a single peaked hydrograph in year i (Fig. 2 [31]) can be obtained by characterizing the time moment of the rise (Dsi in Fig. 2) and fall (Dei in Fig. 2) of the hydrograph. As shown in Fig. 2, the starting time of a flood is marked as a sharp rise on the hydrograph, and the end of the flood runoff is identified by an inflection point on the receding limb of the hydrograph [10]. If the rise moment of a flood is denoted by Ds (d) and the end time is denoted by De (d), then the flood volume (denoted as Vi, for this flood event) can be obtained as follows:

《Fig.2》

Fig. 2. Typical flood hydrograph showing flood flow characteristics (adapted from Ref. [31]).

where qij is the observed stream flow on the jth day of the ith year, qsi is the streamflow on the start day of the flood in the ith year, and qei is the streamflow value on the end day of the flood. Dsi and Dei respectively denote the start and end day for a flood occurring in the ith year. Di denotes the flood duration, which is obtained by the following:

The annual peak of the flood is obtained by the following:

Based on Eqs. (15)–(17), the flood characteristics can be identified as the peak, volume, and duration. Table 2 provides descriptive statistics values for the flood peak, volume, and duration at Xingshan Hydrologic Station. The positive values of kurtosis and skewness suggest that the flood variables can be modeled by sharp and right tailed distributions.

《Table 2》

Table 2 Statistical characteristics of flood variables

《4. Initial parameter values for marginal and joint distributions 》

4. Initial parameter values for marginal and joint distributions

For a single flood variable such as flood peak, many parametric distributions are available to quantify its probabilistic feature, such as the general extreme value (GEV), log-Pearson Type III (LP3), and Pearson Type III (P3) distributions [38,39]. In this study, the gamma, GEV, and lognormal distributions are employed to quantify the probabilistic features for flood peak, volume, and duration, and the maximum likelihood estimation (MLE) method is used to estimate the unknown parameters in these distributions. Table 3 provides detailed expressions of the potential distributions and the parameter values obtained by MLE. A comparison of the theoretical distributions and the observed values is plotted in Fig. 3. The RMSE and AIC indices are applied to evaluate the performance of each marginal distribution, as shown in Table 4. The results show that the lognormal distribution performs best for all three flood variables in the Xiangxi River.

《Table 3 》

Table 3  Parameters of marginal distribution functions of flood variables.

《Fig.3》

Fig. 3. Comparison of different probability density estimates with observed frequency.

《Table 4》

Table 4 Comparison of RMSE and AIC values of flood variables for different statistical distributions.

Table 5 provides the values for the linear correlation coeffi- cient and Kendall's tau (τ) among the three flood variables (i.e., peak, volume, and duration). The results indicate that the highest correlation is between flood peak and volume, followed by the correlation of volume–duration and peak–duration. To quantify the dependence structures of flood peak–volume, volume– duration, and peak–duration, four Archimedean copulas—that is, the Ali–Mikhail–Haq (AMH), Cook–Johnson (Clayton), Gumbel–Hougaard, and Frank copulas—are employed, and their unknown parameters are estimated by a method-of-moments (MOM)-like estimator based on the inversion of Kendall's τ. The performances of the copulas in quantifying dependence among flood variables are evaluated by RMSE and AIC. Table 6 provides the results of the RMSE and AIC for different copulas for modeling the joint distribution of flood peak–volume, peak– duration, and volume–duration. The results suggest that joint distributions of flood peak–volume and volume–duration can be quantified best by the Frank copula, whereas the joint distribution of the flood peak–duration can be quantified best by the Ali–Mikhail–Haq copula.

《Table 5 》

Table 5  Values of correlation coefficients for flooding characteristics.

《Table 6 》

Table 6 Comparison of RMSE and AIC values for joint distributions through different copulas.

《5. Uncertainty evaluation for bivariate hydrological risk》

5. Uncertainty evaluation for bivariate hydrological risk

The prior distributions are specified for all parameters using the normal distributions, with the mean values being set as the initial values obtained through the MLE and MOM methods, and with relatively large variance values, as shown in Table 7. There are a total of 5000 iterations in the MCMC process, with the last 500 iterations being set as the posterior parameters of the parameters in the copula models. Fig. 4 presents the evolution of unknown parameters in the marginal distributions and copulas during the MCMC process. It can be seen that the parameters estimated using the Bayesian method are stationary and concentrated to stable values. The posterior distributions of the unknown parameters in the marginal distribution and copula functions are presented in Fig. 5. Based on the sampling sets of the parameters obtained through MCMC, the sampling distributions of the design values can be obtained; these cannot only provide the point estimation, but also give its predictive interval (PI).

《Table 7》

Table 7 Prior distribution for each parameter of Bayesian model.

《Fig.4》

Fig. 4. Evolution of unknown parameters (a) during the MCMC process and (b) after burn-in period.

《Fig.5》

Fig. 5. Posterior distributions of the unknown parameters in marginal distributions and copulas. LN: lognormal.

The parameters in the marginal distributions and copulas, as obtained by the Bayesian method, are presented as probabilistic distributions, which lead to uncertainty in the marginal and joint distributions of the flood variables. Fig. 6 shows a comparison of the mean probabilities and the observed values. In addition, 95% predictive intervals are provided in Fig. 6, which are bracketed by the 2.5% and 97.5% quantile values of the predictions. This figure indicates that the obtained predictive intervals can bracket the observed probabilities well. In particular, Fig. 3 shows that the estimated distributions through MLE produce an obvious discrepancy to the observed values for flood durations. Such deviation may result from data limitation regarding flood duration. Only daily streamflow records were available for this study, so the flood duration could only be calculated in days, leading to extensive uncertainty in the distribution of flood duration. In comparison, the results in Fig. 6 show that most of the observed data are bracketed within the 95% predictive intervals, which implies a good performance of the Bayesian method in parameter estimation for flood duration.

《Fig.6》

Fig. 6. Comparison of fitted and observed probabilities for flood variables: (a) flood peak, (b) flood volume, and (c) flood duration.

In the Bayesian method, the unknown parameters are considered as random with specific prior distributions; the posterior distributions of these parameters are then derived by the MCMC method. The randomness in the parameters of the marginal and joint probabilities lead to uncertainties in the associated conditional cumulative distribution functions (CDFs) and PDFs generated by the obtained copulas. Fig. 7 presents the conditional CDFs of flood volume and duration, with the flood peak flow specified as the 95% quantile value. Based on the posterior distribution of the parameters, the obtained CDFs and their associated 95% predictive intervals can be obtained. In addition, the conditional PDF of flood volume and duration, under a flood peak with different return periods, can be derived through Eqs. (7) and (8). Fig. 8 shows the conditional PDF of flood volume and duration. Two flood peak scenarios (10- and 100-year return periods) are assumed; the conditional PDFs of flood volume and duration indicate that as the flood peak increases, the flood volume is expected to increase as well, while the flood duration may not change significantly. Moreover, based on the posterior distributions of the parameters, the predictive intervals can also be obtained.

《Fig.7》

Fig. 7. Conditional CDFs of (a) flood volume and (b) duration under 95% flood peak flow.

《Fig.8》

Fig. 8. Conditional PDFs of (a) volume and (b) duration under different peak flow return periods.

The joint return periods calculated by Eqs. (9) and (10) can reflect the concurrence probabilities of flood peak–volume, peak– duration, and volume–duration well. Furthermore, the random features of the model parameters leads to uncertainty in the joint return periods. Table 8 presents the 95% predictive intervals for the univariate variable and joint return periods for the flood variables. It can be seen that the uncertainty of the flood peak, volume, and duration values increases with an increase in the return period. For example, the flood peak with a 5-year return period may vary within a 95% predictive interval [594.1, 813.9] m3·s-1 , while such an interval would be [1156.3, 2007.1] m3·s-1 for the flood peak with a 100-year return period. For the joint return period, Table 8 shows a remarkable feature: The uncertainty of the joint return period in  "OR” will not change significantly. The joint return period in "AND” has similar uncertainty characteristics as the univariate flood variables, as presented in Table 8. For example, the 95% predictive interval of  with a 100-year primary return period would be [1229.4, 1593.8] years, while the 95% predictive interval of  would be [51.62, 52.12].

《Table 8》

Table 8 95% predictive intervals for the univariate and primary return periods for flood characteristics

T: return period; PV: peak-volume; PD: peak-duration; DV: duration-volume.

In this study, a bivariate flood risk indicator, expressed as Eq. (12), is introduced to reflect the interactive effect among flood variables and their implications on the detailed hydrological risk. For a flooding event, the failure of hydraulic structures is mainly due to high peak flow, and the flooding peak flow is the essential factor to be considered when analyzing hydrological risks. Therefore, the bivariate flood risks for flood peak–volume and peak–duration are analyzed. Fig. 9 also shows the changing trends of the flooding risk with respect to flooding volume under different design flows and service times. The bivariate risk of flooding peak flow and volume remains constant for some time and then decreases with an increase in flooding duration for all design flows and service times. Furthermore, for a specific combination of flood peak–volume, the detailed risk may be uncertain, as a result of the parameter uncertainties in the marginal and joint distributions. Based on the posterior distributions of these parameters, the mean value and the 95% predictive interval of the bivariate risk can be derived, as presented in Fig. 9. The results in Fig. 9 suggest that a lower design discharge and longer service time may lead to greater uncertainty in hydrological risk. Fig. 10 presents the failure risk inference of the river levee around Xingshan Hydrologic Station for different flooding peak–duration extremes. It shows that for the same service time, the inferred risk decreases with an increaical risk for flood peak–duration appears to be similar to that for flood peak–volume.

《Fig.9》

Fig. 9. Bivariate flooding risk under different flooding peak–volume scenarios. (a) 20-year service time; (b) 50-year service time.

《Fig.10》

Fig. 10. Bivariate flooding risk under different flooding peak–duration scenarios. (a) 20-year service time; (b) 50-year service time.

《6. Conclusions and remarks 》

6. Conclusions and remarks

The copula approach is one of the most-used approaches in hydrology and water resources. Many studies have analyzed the risk of water hazards (e.g., flood and drought) using copula approaches to reflect probabilistic features in a multivariate context. However, the uncertainties that exist in both the marginal distributions and the dependence structures of the copula framework are somewhat overlooked. This study aimed to characterize the inherent uncertainties in multivariate hydrological risk analysis. The major innovation of this study is to introduce Bayesian inference into the copula framework in order to quantify uncertainties in both the marginal distributions and dependence structures; the interactions among uncertainties in marginal and joint distributions are also characterized. The approach developed in this study was applied to eco-hydrological risk analysis for the Xiangxi River in order to demonstrate the applicability of this method.

The results indicate that the lognormal distributions may be appropriate for quantifying the marginal distributions of flood variables in the Xiangxi River. The Frank copula performed better than the other copulas when quantifying the dependence of peak–volume and volume–duration, whereas the Ali–Mikhail–Haq copula performed best when quantifying the joint distribution of peak–duration. The MCMC approach was adopted to quantify the uncertainty in the marginal and joint distributions. The results showed that the obtained predictive intervals could bracket the observations well, especially for flood duration. Moreover, the 95% predictive intervals of the joint return period in "AND” and "OR” cases could be derived based on the posterior probabilities of the model parameters. The results showed that the uncertainty of the joint return period in the  "AND” case increases as the return period of the univariate flood variable increases. However, the uncertainty of the return period in the "OR” case does not change significantly. In addition, the uncertainty of bivariate risks for flood peak–volume and flood peak–duration was characterized to reflect the interactive effects among flood variables on the actual failure risk of hydraulic facilities.

Since the introduction of the copula to hydrological analysis by De Michele and Salvadori [40], these kinds of multivariate analysis approaches have been widely used for hydrological risk inference within a multivariate context. This study introduced Bayesian inference into the copula framework in order to characterize the inherent uncertainties in multivariate flood risk analysis, and to further reveal the impacts of interactive uncertainties among flood attributes (i.e., peak, volume, and duration) on the resulting risk inferences. The approach developed in this study was demonstrated on a case study in the Xiangxi River Basin. However, this approach is transferable and can easily be applied to other risk-assessment problems (e.g., drought, storm, etc.) in different locations.

This study attempts to evaluate the uncertainty of copula-based multivariate hydrological risk. The initial idea of the research is to deal with the uncertainty in hydrological risk analysis that stems from data limitation. However, further studies are still required to characterize the uncertainty of hydrological risks that results from model uncertainty and other sources of uncertainty.

《Acknowledgements》

Acknowledgements

This work was jointly funded by the National Natural Science Foundation of China (51520105013 and 51679087) and the National Key Research and Development Plan of China (2016YFC0502800).

《Compliance with ethics guidelines》

Compliance with ethics guidelines

Yurui Fan, Guohe Huang, Yin Zhang, and Yongping Li declare that they have no conflict of interest or financial conflicts to disclose.