《1. Introduction》

1. Introduction

A reliable evaluation of safety levels for structures such as dams—which have a very strong socioeconomic impact on local areas, and which can represent a potential hazard for the people and environment that may be affected by their presence—is of the utmost importance for the different stakeholders involved.

The capability of numerical models to contribute in engineering practice to the quantitative evaluation of the safety margins of structures is nowadays taken for granted in the dam engineering domain, thanks in part to the great amount of work done by the International Commission on Large Dams (ICOLD) Technical Committee on “Computational Aspects of Analysis and Design of Dams.” However, the application of numerical models to real-world problems has suffered for some time from the gap between mathematical modeling specialists and dam engineers and managers. The first group includes information system specialists who are able to develop computer models to their full potential, while the second group often comprises professionals who prefer to revert to traditional methods of calculation and empirical methods based on their proven experience. The main aim of the Committee was to contribute to the filling of this gap and to promote the diffusion of computer software in the field of dam engineering. The Committee was appointed by ICOLD as an ad hoc committee in 1988; finally, during the 2005 ICOLD Annual Meeting, the Committee was appointed as a permanent Technical Committee.

In its intent to guide and help dam engineers wishing to make correct use of computer programs and numerical models, the Committee has promoted a wide-ranging benchmarking program. So far, 13 Benchmark Workshops have been organized; the first occurred in 1991 (in Bergamo, Italy), and the most recent one took place in 2015 (in Lausanne, Switzerland). Among the different technical aims of the Committee activities, the following aims are worth mentioning: the creation of a stronger link between observed dam behavior and the modeling process; the issuing of guidelines to be used for educational purposes in current practice; the promotion of mathematical modeling improvements to approach safety-related problems; and the assessment of the potentialities of computer codes in order to optimize design, instrumentation, surveillance, and safety/risk evaluation procedures. Regarding the last topic, three themes tightly connected to risk assessment have been proposed from 2011 to 2015 (Fig. 1), for which the main different phases of the process have been extensively investigated. The present paper describes in detail those themes proposed in the Benchmark Workshops in 2011, 2013, and 2015 related to dam risk assessment and the main results obtained.

《Fig. 1》

**Fig.1 Connection among themes and risk components of Benchmark Workshops.**

《2. Valencia 2011: Estimation of the probability of failure of a gravity dam for the sliding failure mode》

2. Valencia 2011: Estimation of the probability of failure of a gravity dam for the sliding failure mode

The 11th ICOLD Benchmark Workshop on Numerical Analysis of Dams took place in Valencia from October 20 to 21, 2011. The objective of Theme C of the 2011 edition was to obtain relationships between water pool levels, factors of safety, and probabilities of failure for an 80 m high gravity dam considering the sliding failure mode (foundation contact). Different models were used for the analysis of the dam and its foundation, along with reliability techniques. Contributions from eight teams were reported; these can be found in Ref. [1]. The eight participating teams were from Ricerca Sistema Energetico (RSE), Technical University of Bucharest (UTBC), Sogreah Consultants (SC-AG), JSC “Vedeneev Vniig” (VNIIG), Polytechnic University of Valencia (UPV), Royal Institute of Technology of Sweden (RIT), Polytechnic University of Madrid (UPM), and Ingeniería de Presas (iPresas). The problem was solved in the following steps.

《2.1. Factor of safety》

2.1. Factor of safety

First, each group of participants chose a 2D model to compute the factor of safety (sliding failure mode) for different water pool levels. All contributors considered at least a 2D rigid-body limit equilibrium model (LEM). Despite the substantial advances made in more sophisticated finite element-based models, the LEM is still recognized by contributors as the most popular and accepted method to evaluate dam safety for this failure mode [2]. In LEMs, the evolution of the horizontal crack was simulated by reducing the effective area at the contact interface between the dam and its foundation that provides resistance to the overturning moment. Two teams also considered deformable-body models to evaluate the crack length, implemented in finite element model (FEM) codes. In these models, different approaches were used to simulate the horizontal crack. The factor of safety was computed for two cases: effective and ineffective drains. Fig. 2 shows the differences among team results for the first case.

As can be observed in Fig. 2, there are significant differences among the results prior to the application of reliability techniques. These differences are mainly due to the hypotheses selected and the model setup. Another important aspect is how the factor of safety was defined. As shown in Fig. 2, using the same LEM of analysis and data set of strength parameters does not necessarily result in the same outcomes.

《Fig. 2》

**Fig.2 Relation between factor of safety and water pool level obtained by participants for the “drains effective” case.**
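The rigid-body limit equilibrium calculation discussed in this section can be illustrated with a minimal sketch. The triangular cross-section, unit weights, strength values, uplift diagram, and drain efficiency below are illustrative assumptions, not the benchmark data, and the horizontal crack is not modeled.

```python
import math

GAMMA_W = 9.81   # unit weight of water, kN/m^3
GAMMA_C = 24.0   # assumed unit weight of concrete, kN/m^3


def sliding_factor_of_safety(water_level, height=80.0, base=60.0,
                             phi_deg=45.0, cohesion=500.0,
                             drain_efficiency=0.0):
    """Factor of safety against sliding per metre of dam length.

    FS = (c * B + (W - U) * tan(phi)) / H for a triangular section,
    with hydrostatic thrust H and a triangular uplift diagram U that is
    reduced by an assumed drain efficiency in [0, 1].
    Cohesion is in kPa; forces are in kN per metre of dam length.
    """
    weight = 0.5 * GAMMA_C * base * height
    thrust = 0.5 * GAMMA_W * water_level ** 2
    uplift = 0.5 * GAMMA_W * water_level * base * (1.0 - drain_efficiency)
    normal = weight - uplift
    resistance = cohesion * base + normal * math.tan(math.radians(phi_deg))
    return resistance / thrust


for level in (60.0, 70.0, 80.0):
    fs_eff = sliding_factor_of_safety(level, drain_efficiency=0.67)
    fs_ineff = sliding_factor_of_safety(level, drain_efficiency=0.0)
    print(f"level={level:5.1f} m  FS(drains effective)={fs_eff:5.2f}  "
          f"FS(drains ineffective)={fs_ineff:5.2f}")
```

Even in this simplified form, the result depends on choices such as the uplift diagram and whether cohesion acts over the full base, which is consistent with the spread among team results.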

《2.2. Friction angle and cohesion》

2.2. Friction angle and cohesion

Next, each group defined the distribution of selected random variables: friction angle (φ) and cohesion (c). The decision regarding how friction was considered (i.e., whether the random variable is the friction angle, φ, or the friction coefficient, tan φ) had some impact on the results obtained. Based on the results, it seems that when tan φ is selected and a normal probability density function (PDF) is assumed, probabilities are higher than when an analysis uses φ as the selected parameter and considers it to be normally distributed.

Another decision is what PDF may be reasonable to use. In this case, an unusually large amount of data was provided to make the process easier; however, this is not always the case in real-world problems, where few data are available (if any). Despite the data provided, several distributions were suggested or considered by contributors (normal, log-normal, Rayleigh, and beta distributions).

Decisions related to PDFs are linked not only to the selected distributions but also to the physical meaning of the given adaptation. When an unbounded PDF such as the normal distribution is used, the required decision on its truncation becomes a key point of the analysis process, as shown in the results. Again, engineering judgement comes into play when assessing the minimum values to be adopted for the truncation of a PDF.
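The sensitivity to truncation can be illustrated with a short sketch. It assumes a normally distributed friction angle truncated only from below; the mean, standard deviation, critical angle, and truncation bounds are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def truncated_normal_cdf(x, mean, std, lower):
    """CDF of a normal variable truncated below at `lower` (no upper bound)."""
    base = NormalDist(mean, std)
    if x <= lower:
        return 0.0
    mass_kept = 1.0 - base.cdf(lower)
    return (base.cdf(x) - base.cdf(lower)) / mass_kept


mean_phi, std_phi = 45.0, 7.0   # hypothetical friction angle statistics, degrees
critical_phi = 30.0             # hypothetical angle below which sliding occurs

print("untruncated:", NormalDist(mean_phi, std_phi).cdf(critical_phi))
for lower in (20.0, 25.0, 29.0):
    p = truncated_normal_cdf(critical_phi, mean_phi, std_phi, lower)
    print(f"truncated at {lower:4.1f} deg:", p)
```

The closer the truncation bound lies to the critical value, the more probability mass is removed from the failure region, so the choice of bound directly controls the computed probability.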

《2.3. Failure probability》

2.3. Failure probability

Next, participants estimated the probability of failure for the sliding failure mode using at least a Level 2 reliability method and a Level 3 Monte Carlo simulation method. These reliability methods are described in detail in Ref. [3]. The type of reliability method used also had a significant impact on results. Analysis with Level 2 methods is relatively easy to perform and, as long as the number of variables is low, is not time-consuming. Level 3 Monte Carlo simulation methods provide more precise results, but the computing effort may be much higher. Level 2 and Level 3 methods were used in combination with the LEM of analysis. In general, Level 3 reliability methods provided lower failure probabilities than Level 2 methods.
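A minimal sketch of the two families of methods is given below for a linear sliding limit state with independent, normally distributed friction coefficient and cohesion. All numerical values are hypothetical. For this particular case the first-order second-moment estimate (a Level 2 method) is exact, so the Monte Carlo estimate (Level 3) should converge to it; with truncated or non-normal variables, or a nonlinear model, the two would generally differ.

```python
import random
from statistics import NormalDist

# Hypothetical per-metre values: normal force N and thrust H in kN/m, base B in m.
N, H, B = 34000.0, 31400.0, 60.0
MU_TAN, SD_TAN = 1.0, 0.15      # friction coefficient tan(phi)
MU_C, SD_C = 300.0, 120.0       # cohesion, kPa


def limit_state(tan_phi, c):
    """g > 0 means safe; g <= 0 means sliding."""
    return c * B + N * tan_phi - H


def pf_level2():
    """First-order second-moment estimate: Pf = Phi(-beta), beta = mu_g / sd_g."""
    mu_g = limit_state(MU_TAN, MU_C)
    sd_g = ((N * SD_TAN) ** 2 + (B * SD_C) ** 2) ** 0.5
    return NormalDist().cdf(-mu_g / sd_g)


def pf_level3(n_samples, seed=0):
    """Crude Monte Carlo estimate: fraction of samples with g <= 0."""
    rng = random.Random(seed)
    failures = sum(
        limit_state(rng.gauss(MU_TAN, SD_TAN), rng.gauss(MU_C, SD_C)) <= 0.0
        for _ in range(n_samples)
    )
    return failures / n_samples


print("Level 2:", pf_level2())
print("Level 3:", pf_level3(200_000))
```

The Monte Carlo loop needs many more model evaluations than the single Level 2 calculation, which reflects the difference in computing effort noted above.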

《2.4. Event tree modelling》

2.4. Event tree modelling

Finally, many teams combined the results of the two proposed drainage system conditions, and the total failure probability was obtained by combining individual probabilities in an event tree. Results for Level 3 methods are shown in Fig. 3, including analyses made by teams from RSE, RIT, VNIIG, UPV, UPM, and iPresas.
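The event tree combination amounts to weighting each conditional failure probability by the probability of its branch. The sketch below assumes two branches (drains effective and drains ineffective); the branch and conditional probabilities are hypothetical.

```python
def event_tree_failure_probability(branches):
    """Total failure probability from (branch_probability, conditional_pf) pairs."""
    total_branch = sum(p for p, _ in branches)
    if abs(total_branch - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * pf for p, pf in branches)


# Hypothetical values: drains effective with probability 0.9.
branches = [(0.9, 1e-5), (0.1, 1e-3)]
print(event_tree_failure_probability(branches))
```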

《Fig.3》

**Fig.3 Relation between probability of failure and water pool level obtained by participants using Level 3 reliability methods.**

Fig. 3 shows significant differences in the results that were obtained by the participants. Most of these differences are due to the way in which the 2D model was set up and how random variables were defined. When comparing the results obtained with LEM and FEM using the Level 3 Monte Carlo method, it is observed that when drains are effective, both models give similar values for water pool levels below the dam crest level. For water levels above the dam crest, the probability of failure estimated with the FEM approaches a value of 1, while the LEM predicts values below 10^{-2}, since the linear stress distribution assumption embedded in the LEM seems to be on the “unsafe” side.

In conclusion, the results presented by contributors opened a field of discussion on the main sources of uncertainty, which include type of model, factor of safety definition, and statistical analysis of random variables. In general, more research is needed to handle uncertainties, as parameter uncertainty is only part of the problem; other sources of uncertainty became explicitly present throughout the process.

《3. Graz 2013: Computational challenges in consequence estimation for risk assessment》

3. Graz 2013: Computational challenges in consequence estimation for risk assessment

The 12th Benchmark Workshop on Numerical Analysis of Dams took place in Graz from October 2 to 4, 2013. The objective of Theme C, titled “Computational Challenges in Consequence Estimation for Risk Assessment” (formulated by Yazmin Seda-Sanabria, Enrique E. Matheu, and Timothy N. McPherson [4]), was to obtain the potential consequences in the case of failure for an embankment dam that is 3.5 km upstream from an urban area.

Participants were asked to select the type and sophistication of the simulation engines used to solve the problem, including 1D, 2D, and 3D flood simulation tools, population at risk (PAR) and loss of life (LOL) estimation techniques, and asset and consequence assessment models. Within Theme C, addressing human consequences (e.g., PAR and LOL) and direct economic impact was required.

Contributions from eight participants were presented, and a full description can be found in Ref. [4]. The comparison of results was based on the following categories.

《3.1. Flood characteristics》

3.1. Flood characteristics

Participants used a range of models, including physics-based breach models using dam material information and regression equations based on previous dam failures. The choice of model, method, and parameters can significantly affect the time and magnitude of the peak discharge. Results show that models using regression equations had a much earlier peak than physics-based models.

In addition, a wide variety of techniques were used by the participant teams to produce the necessary flood output data for consequence analyses. Dam failure hydrodynamic simulation results depend on input datasets (e.g., breach discharge, topography, roughness) and method approach. Participants used different approaches to estimate roughness coefficients, thus providing consequence outcomes with significant differences.

The maximum flooded area for each solution was in the range of 30 km² to 47 km². However, the majority of solutions show substantial similarity when results are compared among participants (by pairs). Despite the observed similarity in the extent of the flooding area, the results for flood-wave arrival time differ among solutions, depending on the considered threshold to define the arrival time (i.e., the time at which the flood depth reaches a given value).
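The dependence of arrival time on the chosen depth threshold can be shown with a small sketch. The depth time series at a single location is hypothetical, and linear interpolation between output times is an assumption of this example.

```python
def arrival_time(times, depths, threshold):
    """First time at which depth reaches `threshold`, linearly interpolated.

    Returns None if the threshold is never reached.
    """
    for i, (t, d) in enumerate(zip(times, depths)):
        if d >= threshold:
            if i == 0:
                return t
            t0, d0 = times[i - 1], depths[i - 1]
            return t0 + (threshold - d0) * (t - t0) / (d - d0)
    return None


times = [0.0, 10.0, 20.0, 30.0, 40.0]    # minutes after breach (hypothetical)
depths = [0.0, 0.05, 0.4, 1.2, 2.0]      # flood depth in metres (hypothetical)

for threshold in (0.1, 0.5, 1.0):
    print(threshold, arrival_time(times, depths, threshold))
```

For the same hydrograph, the reported arrival time shifts by several minutes depending on the threshold, so results are only comparable when the threshold is stated.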

《3.2. Population at risk (PAR)》

3.2. Population at risk (PAR)

Census and land-use data were provided by the formulators. The spatial distribution of the population within the flooded area varied among the participants. Three participants uniformly distributed the population available in the census blocks to properly sum the affected population within partially flooded census blocks. Another three participants redistributed the population to the parcel data provided. One of the contributors accounted for residential population and workforce population within the parcels. Finally, another participant used imperviousness defined by developed areas to distribute population.

These differences in population distribution resulted in a range of PAR from 15 000 to 30 000 inhabitants (approx.), although the solutions presented consistent results in terms of PAR in flood depths below 2 m. However, significant differences were found in how participants defined flood severity (e.g., flood severity based on exposed population or impact on buildings).
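The uniform-distribution approach used by some participants can be sketched as an area-weighted sum over census blocks. The block populations and areas below are hypothetical; parcel-based or imperviousness-based redistribution would replace the area fraction with a different weight.

```python
def population_at_risk(blocks):
    """Area-weighted PAR.

    `blocks` is an iterable of (population, block_area_km2, flooded_area_km2).
    Population is assumed to be spread uniformly over each census block.
    """
    total = 0.0
    for population, area, flooded in blocks:
        fraction = min(max(flooded / area, 0.0), 1.0)
        total += population * fraction
    return total


# Hypothetical blocks: fully flooded, partially flooded, and dry.
blocks = [(1200, 2.0, 2.0), (800, 1.6, 0.4), (500, 1.0, 0.0)]
print(population_at_risk(blocks))
```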

《3.3. Loss of life (LOL)》

3.3. Loss of life (LOL)

The majority of solutions provided similar results for LOL, with estimates around 2000 potential fatalities. The identified disagreement in one of the provided solutions was due to the definition of flood severity values, which resulted in nearly 4000 fatalities.

《3.4. Direct economic damage》

3.4. Direct economic damage

Because of the differences in the methods used for economic consequence estimation (e.g., gross domestic product (GDP) vs. insurable losses) and in the assumptions regarding asset values, the results provided ranged from USD 0.4 billion to USD 2.6 billion.

Results from this Benchmark Workshop showed that, in general, outcomes were similar in terms of hydrodynamics. However, flood-wave arrival times differed between teams. As stated by the formulators, this is attributable to differences in the calculation of the breach hydrograph and in regression and physics-based formulations. The largest differences in peak flood depths were likely due to the requirement for teams using irregular meshes to report output in a structured grid format.

Although PAR estimates were also similar across teams, differences in flood-wave arrival times and flood severity were found. Finally, some significant differences in the economic consequence analyses were found, mainly due to the interpretation of direct impact and the estimation of asset values.

In conclusion, despite the existence in the literature of guidelines and references for LOL and economic consequence estimation (e.g., Refs. [5–7]), due to the wide range of potential methods and definitions, the assumptions made by participants resulted in some significant differences in consequence results. However, despite those differences, it is noted that, in general, outcomes for LOL and economic damages were in the same order of magnitude.

《4. Lausanne 2015: Probability of failure of an embankment dam due to slope instability and overtopping》

4. Lausanne 2015: Probability of failure of an embankment dam due to slope instability and overtopping

The failure of an embankment dam was discussed in Theme C during the 13th Benchmark Workshop, which took place in Lausanne from September 9 to 11, 2015. In this problem, which was inspired by a real dam located in Spain but used fictitious resistance and hydrological data, the main focus was on calculating fragility curves for the slope instability and overtopping failure modes and using them to calculate annualized failure probability, accounting for both natural and epistemic uncertainty, as described in Ref. [8]. Three different solutions were presented by Benchmark Workshop participants; these are provided in Refs. [9–11]. In this section, these solutions are compared with the reference solution presented by the problem formulators.

The embankment analyzed is a homogeneous 16 m high earth-fill dam. In recent years, this embankment had small instability problems in the downstream slope, so a quantitative risk analysis was proposed to estimate annual failure probability. Two failure modes were analyzed: overtopping and dam instability. In both failure modes, water pool level was assumed to be the driving force of failure. The problem was divided into five different phases.

《4.1. Analysis of information for the instability failure mode》

4.1. Analysis of information for the instability failure mode

The first step involved the elaboration of a slope instability limit model for the downstream slope of the embankment, and the definition of the main random variables in this model. In this step, very different instability approaches were considered by participants, from simple LEMs to complete FEMs. In addition, two different assumptions were compared for hydraulic conditions: steady-state and transient.

In these models, two random variables were recommended following a Mohr-Coulomb type of failure criterion: friction angle and cohesion. The main statistical parameters of these variables were provided to the participants for natural and epistemic uncertainty.

《4.2. Calculation of reference fragility curves》

4.2. Calculation of reference fragility curves

The reference fragility curve (relation between water pool level and failure probability) for the slope instability was computed using the natural uncertainty distributions of the random variables defined in the previous step. When the fragility curves obtained with these models are compared (Fig. 4), it can be observed that the participants obtained very different results, although all of them used the same random variables with the same distributions and the same geometry. These results show the strong influence of the hypotheses made and the model considered on the instability results. The results also show a significant influence of the number of Monte Carlo simulations made.

During the Benchmark Workshop discussion, it was highlighted that the hydraulic hypotheses are especially significant. In this case, considering a steady state could lead to underestimating the slope resistance capacity. In any case, it should be remarked that when numerical models are set up, many small hypotheses are made that can influence the results. For example, two participants with the same software tool, the same geometry, and the same random variables obtained very different fragility curves.

《Fig. 4》

**Fig.4 Comparison of sliding fragility curves computed by participants.**

The reference fragility curve for the overtopping failure mode was directly defined with a log-normal distribution. This fragility curve is a relation between overtopping height and failure probability. All participants concluded that the slope instability failure mode was clearly more significant than overtopping, according to the results.
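A fragility curve defined by a log-normal distribution can be evaluated as the log-normal cumulative distribution function of the overtopping height. The sketch below assumes the curve is parameterized by a median height and a logarithmic standard deviation; both values are hypothetical and are not the ones given in the benchmark formulation.

```python
import math
from statistics import NormalDist


def lognormal_fragility(overtopping_height, median, log_std):
    """P(failure | overtopping height) as a log-normal CDF."""
    if overtopping_height <= 0.0:
        return 0.0
    z = (math.log(overtopping_height) - math.log(median)) / log_std
    return NormalDist().cdf(z)


# Hypothetical parameters: 50% failure probability at 0.5 m of overtopping.
for h in (0.1, 0.25, 0.5, 1.0):
    print(h, lognormal_fragility(h, median=0.5, log_std=0.6))
```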

The reference fragility curves of both failure modes should be combined in order to compute a single reference fragility curve that represents the structural behavior of this embankment for different pool levels.

Common cause adjustment techniques [12] were used to combine both failure modes in all solutions. Using the upper or lower limit for this combination did not have a strong influence, since the slope instability failure mode is clearly predominant and overtopping is only activated for water pool levels with very low probability.
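The limited influence of the chosen limit can be illustrated with a sketch that assumes the adjustment relies on the uni-modal bounds for the combined conditional failure probability at a given pool level: the lower bound is the largest individual probability and the upper bound corresponds to independent failure modes. The proportional rescaling and the probability values are assumptions of this example; the exact procedure is the one described in Ref. [12].

```python
import math


def unimodal_bounds(conditional_probs):
    """Lower and upper bounds on the combined conditional failure probability."""
    lower = max(conditional_probs)
    upper = 1.0 - math.prod(1.0 - p for p in conditional_probs)
    return lower, upper


def adjust_to_bound(conditional_probs, bound):
    """Rescale individual probabilities proportionally so they sum to `bound`."""
    total = sum(conditional_probs)
    if total == 0.0:
        return list(conditional_probs)
    return [p * bound / total for p in conditional_probs]


# Hypothetical values at one pool level: instability dominates overtopping.
probs = [0.2, 0.01]
lower, upper = unimodal_bounds(probs)
print(lower, upper)
print(adjust_to_bound(probs, upper))
```

When one failure mode dominates, the two bounds are close to each other, which is why the choice between them has little effect on the combined curve.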

《4.3. Computation of water pool level probabilities》

4.3. Computation of water pool level probabilities

The objective of this phase was to obtain a relation between water pool levels and annual exceedance probability (AEP). This curve was estimated by evaluating flood routing in the reservoir for different flood events and bottom outlet availability situations, based on the data provided regarding floods and the reservoir. All participants obtained similar exceedance probability curves for water pool levels. All of them used the same floods and reservoir data, and flood routing rules were very simple, so flood routing results are very similar in the four solutions.

《4.4. Computing failure probability and sensitivity analysis》

4.4. Computing failure probability and sensitivity analysis

The curve computed in the previous phase, combined with the reference fragility curve, was used to calculate the reference failure probability. The comparison of the results obtained for the annual failure probability shows large differences, with values ranging from 1.8 × 10^{-1} to 4.1 × 10^{-3}. These differences are mainly due to the different fragility curves used, showing that the hypotheses made to analyze this failure mode strongly condition the results. High values were obtained for annual dam failure probability, mainly because the formulators modified the dam resistance and hydrological data in order to increase conditional failure probabilities, thereby reducing the number of samples and computations needed to characterize them.
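Combining the exceedance curve with a fragility curve can be sketched as a discrete sum over pool-level intervals, weighting the mean conditional failure probability in each interval by the annual probability of the pool level falling in that interval. The tabulated levels, exceedance probabilities, and linear fragility function are hypothetical, and the trapezoidal averaging is an assumption of this example.

```python
def annual_failure_probability(levels, aep, fragility):
    """Annual failure probability from an AEP curve and a fragility curve.

    `levels` are ascending pool levels, `aep[i]` is the annual probability
    that the maximum pool level exceeds `levels[i]` (non-increasing), and
    `fragility(level)` returns the conditional failure probability.
    """
    total = 0.0
    for i in range(len(levels) - 1):
        interval_prob = aep[i] - aep[i + 1]
        mean_fragility = 0.5 * (fragility(levels[i]) + fragility(levels[i + 1]))
        total += interval_prob * mean_fragility
    # Levels above the last tabulated one use the last fragility value.
    total += aep[-1] * fragility(levels[-1])
    return total


def linear_fragility(level):
    """Hypothetical fragility curve rising linearly from level 10 m to 20 m."""
    return min(1.0, max(0.0, (level - 10.0) / 10.0))


levels = [10.0, 12.0, 14.0, 16.0]
aep = [1.0, 0.1, 0.01, 0.001]
print(annual_failure_probability(levels, aep, linear_fragility))
```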

《4.5. Assessing epistemic uncertainty》

4.5. Assessing epistemic uncertainty

In this phase, epistemic uncertainty was defined with a probability distribution for the mean of each random variable. These distributions were used to obtain a family of fragility curves for the instability failure mode. The family of fragility curves for the overtopping failure mode was directly defined in the formulation. Finally, the two families of fragility curves were combined to obtain a profile of failure probability. Only two participants conducted this phase, as shown in Fig. 5. There are large differences in these two profiles, in line with the differences in the fragility curves obtained with the instability models. In both solutions, failure probability profiles showed that small variations in the resistance parameter distributions produce important variations in the obtained fragility curves. This result highlights the importance of assessing epistemic uncertainty separately.
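The separation of the two uncertainty types can be sketched as a two-level calculation: an outer loop samples the mean of a resistance variable from its epistemic distribution, and for each sampled mean the conditional failure probability is computed from the natural variability. The sketch below reduces the instability model to a single normally distributed resistance compared against a fixed demand, which is a strong simplification; all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def failure_probability_profile(demand, mean_of_mean, epistemic_std,
                                natural_std, n_curves=1000, seed=0):
    """Sorted conditional failure probabilities, one per epistemic sample."""
    rng = random.Random(seed)
    profile = []
    for _ in range(n_curves):
        # Epistemic uncertainty: the mean resistance itself is uncertain.
        mean = rng.gauss(mean_of_mean, epistemic_std)
        # Natural uncertainty: P(resistance < demand) for this sampled mean.
        profile.append(NormalDist(mean, natural_std).cdf(demand))
    return sorted(profile)


profile = failure_probability_profile(demand=30.0, mean_of_mean=35.0,
                                      epistemic_std=2.0, natural_std=3.0)
print("5th percentile :", profile[int(0.05 * len(profile))])
print("median         :", profile[len(profile) // 2])
print("95th percentile:", profile[int(0.95 * len(profile))])
```

The spread between the lower and upper percentiles of the profile shows how modest uncertainty in the mean resistance translates into a wide range of failure probabilities.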

The main differences among the contributions are due to differences in the fragility curves introduced in the instability failure mode. The comparison showed that very different results can be obtained depending on the hypotheses made, even using the same geometry and resistance parameters. The hydraulic conditions hypothesis had an especially high influence. Therefore, uncertainty is not only derived from the resistance parameters; other uncertainty sources are the selected slope instability model and the hydraulic behavior of the embankment.

Risk analysis has been demonstrated to be a useful tool to analyze the impact of the hypotheses made. In addition, the obtained results may indicate where uncertainty reduction efforts should be allocated. Therefore, distinction between natural and epistemic uncertainties is fundamental in geotechnical analyses for dam safety management.

《Fig. 5》

**Fig.5 Comparison of failure probability profiles computed by participants.**

《5. Conclusions》

5. Conclusions

New techniques for performing risk assessments are now available and provide information to support decisions for dam maintenance or rehabilitation. Since 2011, the ICOLD Committee has been addressing this issue from a computational perspective, and has also been providing context for researchers and dam managers to understand and pay attention to decisions that are typically made both in risk analyses and in standard design techniques (e.g., frequency of events, factors of safety, and breaching parameters).

Incorporating risk analysis in the last three Benchmark Workshops organized by this Committee has allowed a wide approach to these techniques, covering the three risk components: loads, system response, and consequences. The high participation in all Benchmark Workshops and the number of solutions presented show the interest of the dam community in reliability methods and the application of risk analysis to dam safety. In this sense, Benchmark Workshops have promoted knowledge exchange and discussions.

Recent advances in computational methods have enabled further development of risk analysis techniques, which have evolved from simple structural models to complex numerical procedures and methods. However, there is still room for improvement, for example, in properly addressing epistemic uncertainty when data are collected and risk calculations are made.

Benchmark Workshops have shown that when an engineering problem (even a relatively simple, straightforward, and well-known one) is combined with risk analysis techniques, results should be analyzed in the light of sound engineering judgement in order to obtain meaningful and useful information to inform dam safety management.

Benchmark Workshop outcomes show that when numerical models are applied to a dam safety analysis, hypotheses may modify the results significantly. Therefore, hypotheses should be clearly reported and explained when results are presented. In this sense, risk analysis has been demonstrated to be a useful tool to analyze the impact of the hypotheses that engineers assume in normal design practice, and which are often overlooked.

Suggestions for further research include:

(1) Analyzing the impact of epistemic uncertainty. To better understand the impact on risk outcomes of epistemic uncertainty on load and resistance parameters.

(2) Analyzing risk from a multi-hazard approach. To estimate risk through a comprehensive approach, including all potential hazards (e.g., floods, earthquakes) and existing correlations.

(3) Analyzing correlations among different failure modes. To incorporate existing correlations among failure mechanisms to better characterize risk, and to analyze the impact of different assumptions (e.g., hypothesis used for common cause adjustment).

(4) Including evacuation and human behavior in consequence estimation. To better characterize potential consequences in case of dam failure and flooding due to uncontrolled releases, including uncertainty in warning and evacuation effectiveness.

《Acknowledgements》

Acknowledgements

This paper was published with the support of the research project INICIA (Methodology for Assessing Investments on Water Cycle Infrastructures informed on Risk and Energy Efficiency Indicators, BIA2013-48157-C2-1-R, 2014-2016) funded by the Spanish Ministerio de Economía y Competitividad (Programa Estatal de Investigación, Desarrollo e Innovación Orientada a los Retos de la Sociedad).

Ignacio Escuder-Bueno, Guido Mazzà, Adrián Morales-Torres, and Jesica T. Castillo-Rodríguez declare that they have no conflict of interest or financial conflicts to disclose.