1. Introduction
Data has rapidly become the backbone of both market outlooks and societal analysis in our time. As a result, modern companies and governments are capitalizing on this valuable asset to create entrepreneurial opportunities and to inform regulatory development. The optimized utilization of this new resource is attributable to the development of data-driven analysis tools, most notably data-driven artificial intelligence (AI), which has proved its value through implementation within a myriad of different applications. At the heart of the digital revolution, the value of data-driven AI can be seen in its continuously expanding market. As forecasted by the International Data Corporation (IDC), worldwide AI-related revenue was anticipated to grow by 18.8% in 2022 from 341.8 billion USD in 2021, and it is on track to break the 500 billion USD mark by 2024 [
1,
2]. Although AI software development currently accounts for the largest share of the AI market, the market for AI-assisted services is rapidly catching up and is expected to overtake software in the leading AI market role from 2023 onwards [
2]. Furthermore, in a report by PricewaterhouseCoopers (PwC) titled "AI Predictions
2021," 86% of over 1000 executives at US companies claimed that AI would be a "mainstream technology" within their company by 2021 [
3,
4]. These directly stated interests from executives and CEOs, coupled with the substantial economic benefits, give a strong indication of the inevitable AI-driven transformations taking place in almost every possible aspect of how the world operates.
Despite AI transformations occurring everywhere, the opacity of AI applications partially suppresses their extension to critical domains, particularly those involving safety issues and large profits. As a result, the revolution brought about by AI utilization is currently noticeably louder and faster in some fields than in sensitive domains. For example, people are gradually coming to accept options that have been prescreened and customized by AI, such as tailored advertisements on websites or recommended videos on YouTube. Such applications involve no life-or-death decisions; therefore, recommendations from an AI decision-maker can be readily endorsed. However, in sensitive areas such as clinical applications and chemical engineering (CE) processes, the choices being made relate to health, safety, and tremendous economic value; therefore, the decisions of AI cannot be easily accepted, owing to a lack of insight into why AI is generating these decisions. As a result, the field of AI as a whole is facing a transparency issue wherein the predictions from AI algorithms are not justifiable, and the rationale behind the decision-making processes is hidden. This is most notable within the currently dominant AI algorithms (e.g., artificial neural networks (ANNs) and deep learning), whose black-box nature further increases the opacity within this field. These models have been gradually taking the lead in AI applications due to their ability to extract hierarchical data representations for accurate predictions of sophisticated systems. However, the tradeoff of using these AI methodologies to acquire improved predictions in highly complex systems is the loss of transparency in decision-making processes [
5]. The rising problem of opacity has become a substantial obstacle in the transformative utilization of AI decision-makers in critical domains [
6-
11]. The lack of AI transparency raises particular concern in the fields of healthcare and safety due to the potential biases and liability issues that stem from the ethical and regulatory aspects of blindly accepting black-box decisions [
6]. Nevertheless, momentous interest remains in developing AI decision-makers that can be safely and conscientiously implemented into these kinds of sensitive and vital domains. This interest has led to the exploration and development of explainable AI (XAI) techniques: AI algorithms that can explain and interpret how and why they make their decisions [
12]. These techniques have become an immediate necessity for the proliferation of AI within pivotal and crucial areas involving safety and high profitability. These critical areas greatly impact our society through a profusion of different functions, ranging from clinical operations to energy generation and from medical design to CE processes, the last of which serves as the particular focus of this study.
The field of CE spans an extensive scope of vital industries, including petrochemicals, manufacturing, biotechnology, food processing, specialty chemicals, pharmaceuticals, and more. Born from the fusion of mechanical engineering with applied chemistry,
CE has always played a heavy-duty role within industrial practice, satisfying the ever-increasing requirements of human livelihood [
13]. Accordingly, it is expected that AI transformative utilizations within CE will make significant improvements in our overall living conditions in a vast number of different aspects of CE processes, including design, estimation, control, and monitoring [
14]. As CE is an essential domain, it is imperative for the applied AI algorithms in CE to reach high transparency in order to provide reliable predictions that can raise the trust of chemical engineers with a strong sense of responsibility. However, with respect to the explanation and interpretation capabilities of AI decision-makers, the question remains whether the data-driven predictions from AI algorithms are adequate to fulfill the need for transparency in the field of CE. All CE processes are concretely governed by physical and chemical laws that should be considered in both the development of an AI transformation method and the post-hoc analysis to reveal the mechanisms controlling the system behaviour. It is thus necessary to merge prior and posterior knowledge of physical and chemical laws into AI transformations in order to contribute sufficiently to the scientific understanding of CE applications. However, very little special attention has been paid to the transparency of AI transformations within the CE literature. Historically, no special considerations were even given to AI development within CE until 2005 [
15], due to the difficulties in the implementation of sophisticated CE systems. This issue was somewhat remedied following the development of the deep learning method [
16], which effectively stimulated AI utilizations in CE [
15,
17]. This currently dominant machine learning (ML) algorithm enables computers to hierarchically capture and extract data representations and their patterns for complex predictions or classifications. However, the "black box" nature of these AI transformation "boosters" requires AI applications in CE to sacrifice transparency in order to adapt to the complexity of the considered systems in this area. In other words, AI utilizations in CE inherently suffer from opacity challenges that limit their applicability within critical CE industrial processes.
Although AI has had a long history of development since it was first proposed by Alan Turing over 70 years ago [
18], it is still a relatively new concept to chemical engineers. Consequently, there are various questions that must be addressed in order to further enhance AI implementation within CE applications. To keep up with or even lead the way toward the inevitable AI transformation revolution that is penetrating every aspect of our lives, it is our responsibility as chemical engineers to maintain a clear awareness of what is required of AI applications, particularly within our field of study, in order to avoid the potential stagnation and bottleneck constraints caused by the opacity of AI algorithms. Motivated by the above, this study aims to provide an overview of the current state of transparency in AI applications within the critical domain of CE. Furthermore, this review introduces the essential aspect of transparency to AI transformations within CE in order to fulfill the imperative need for trust and scientific knowledge within this sensitive domain. It should be noted that no comprehensive analyses of AI applications have been previously conducted to elaborate upon the synergetic effects of AI transparency features. This work will thoroughly elaborate on the transparency of
AI applications in CE, with a particular focus on the theory-guided aspects of this field, in order to enhance the reliability of these transformative utilizations. Subsequently, this work details the state-of-the-art progress of AI transparency in CE, with emphasis on the causality, explainability, and informativeness of these applications. A thin-film growth case study is then used to provide a comprehensive exploration of AI transparency and to further accentuate the implementation of reliable AI utilizations that can fulfill the demands of CE. Finally, potential research prospects are discussed to inspire future investigations.
The organization of this paper proceeds as follows. Section 2 presents the identification and detailed interpretation of transparency in AI utilization within CE. Section 3 comprehensively reviews the state-of-the-art progress in three vital aspects of transparency within AI applications in CE, namely, causality, explainability, and informativeness. In Section 4, a systematic case study of an AI transparency analysis is provided in an AI-enhanced multiscale model of thin-film growth. Future perspectives are discussed in Section 5, followed by concluding remarks highlighting the contribution of this review paper in Section 6.
2. From XAI to AI utilization transparency in CE
The nature of work in engineering highlights the value of the sense of responsibility held by practitioners in this area. Accordingly, chemical engineers, who usually bear the burden of ensuring economic benefits and safety in their work, are inherently prudent in accepting non-transparent decision-makers (i.e., modelling algorithms) and non-justifiable predictions. In other words, the transparency of AI transformations in CE determines how far and how fast these utilizations can bloom in this sensitive domain. Thus, XAI is a critical field of emerging AI transformative applications within CE. The earliest attempts at interpreting the decisions made by AI algorithms were arguably conducted by Vaidyanathan and Venkatasubramanian [
19] and Swartout and Moore [
20] in 1992 and 1993, respectively, and were later deferred due to the focus of AI improvements dramatically shifting to their predictive power [
21]. The significant progress of AI development in recent years has led to its extensive application in almost every possible aspect, raising issues regarding the reliability of AI transformations.
Fig. 1 displays an analysis of the number of publications related to XAI over the last 31 years on the Web of Science. The keywords "explainable AI" and "explainable artificial intelligence" were used to obtain the presented results, which are displayed according to their year of publication. The dramatic increase in the number of studies on XAI, particularly within the last seven years (which account for the majority of the total), indicates the recent revival of this topic and the necessity of exploring the explainability of AI algorithms.
Given the above, what exactly is XAI, and how can it be implemented in CE applications? The general core of this concept refers to transparent AI applications that make their functioning explainable and provide human-understandable justifications for their outputs [
5,
12,
22-
24]. However, the concept of XAI shifts between different areas, among diverse audiences, and across various initiatives, since XAI, as a branch of AI, shares similarly broad audiences, including computer scientists, medical investigators, social scientists, engineers, and so forth. In particular, the transparency of XAI models is a two-sided estimation that considers both "the ability to explain or to present in understandable terms to a human" [
25] and "the degree to which a human can understand the cause of a decision" [
26]. Accordingly, explainability and interpretability are two related yet subtly different cornerstone features of XAI. Explainability denotes the active aspect of an AI model in clarifying its internal functions, while interpretability refers to the passive aspect of an AI, that is, how well humans are able to comprehend the connections between the input and the predictions [
5,
21,
27]. The transparency of AI transformative utilization within CE is generally rooted in XAI, but it must also align with the characteristics and particular requirements of the field itself. It is not the intent of this work to confound potential practitioners with the debate regarding disunified terminologies. Therefore, this work will specifically elaborate on the transparency of AI utilizations with self-defined terminologies within the area of CE.
The three vital factors that significantly impact the transparency of AI transformations in CE are displayed in
Fig. 2, which shows that the evaluation of the transparency of AI utilization can be elaborated upon from the aspects of causality, informativeness, and explainability.
Fig. 3 also displays the vital aspects of transparency and their corresponding improvement strategies. Here, causality (also referred to as justification) closely links AI predictions to the input features. Accordingly, the predictions of the AI algorithms can be justified by the impacts of their input features, thereby producing explanations of the system behaviour that are understandable to humans. Informativeness implies the ability of AI applications to elucidate CE processes governed by well-known chemical and physical laws in order to advance human understanding of the studied systems. The informative aspect of AI also indicates a model's capability of merging well-known governing laws into its algorithms (i.e., physics-informed or theory-guided AI) in order to obtain physically consistent predictions and improve mechanistic comprehension of the phenomena. Finally, explainability represents the transparency of the decision-making processes, depending on how understandable the rationale of the computational flow is; for example, how multilayer structures operate to improve the ability of a neural network (NN) model to capture the highly nonlinear behaviour of a given system. Thus, XAI transformations within CE should be characterized by their causality, informativeness, and explainability. More specifically, a transparent AI application can disclose how its predictions are impacted by the required input features, establish a comprehensive understanding of the phenomenon on the basis of physical and chemical laws, and reveal the rationale of its decision-making process. Enhanced transparency of both the modelling process and the ensuing predictions increases the reliability of AI transformation and ultimately reduces the doubts of chemical engineers. The currently emerging transparency investigations regarding causality, informativeness, and explainability are anticipated to profoundly shape the extensive AI transformation within CE.
AI applications can be improved with respect to both algorithm transparency and extractive transparency, that is, enhanced transparency from both a priori and a posteriori analyses, as presented in
Fig. 3. Algorithm transparency refers to how explainable the modelling process itself is. From this perspective, some AI tools are inherently interpretable, such as linear classification methods, logistic regression, and decision trees, the last of which processes a cognitive task in a tree-like structure that naturally mimics the human reasoning process. In regard to the intuitive interpretability of AI paradigms, symbolic AI (i.e., the classical AI of the 1960s-1980s that dominated the expert era through symbolic knowledge and model inference) made significant contributions to AI development with inherent transparency [
8,
28]. Unlike trendy data-driven AI, symbolic AI can easily explain its logic and reasons by showing which parts of the decision-making processes are evaluated as true or false. Although symbolic AI has not contributed to the current excitement regarding AI development nearly as much as data-driven algorithms (for multiple reasons, including implementation challenges in practical applications), the concept of a hybrid scenario combining symbolic AI and data-driven AI is emerging with the increasing requirements for transparent AI utilization and will be further elaborated on in Section 5 [
8,
29]. Informativeness also contributes to the overall algorithm transparency. In order to improve the informativeness of AI models, the integration of prior information, such as first-principles knowledge, with ML can stimulate a convergence toward physically consistent predictions. This modelling strategy, which is usually referred to as "physics-informed" or "physics-constrained," has been demonstrated to enhance the transparency of AI utilization in a manner that is especially suitable for situations in which data are expensive to obtain [
30,
31]. In particular, hybrid modelling, that is, combining first-principles-based models with data-driven algorithms, can contribute to the revelation of the process mechanisms governed by classic chemical and physical laws, thereby improving the informativeness. Since most CE processes have intrinsic multiscale characteristics, a specific CE phenomenon can be illustrated through an analysis of the electron distribution on the microscopic surface of atoms [
32] but can also be explained in terms of macroscopic hydrodynamic behaviour [
33]. In hybrid modelling, a multiscale theoretical hybridization strategy can provide a more comprehensive understanding of a system compared with a single-scale theoretical hybridization strategy and can therefore further enhance the informativeness of AI utilizations. However, most current AI applications, such as popular deep learning utilizations, are highly opaque, sacrificing their transparency in order to predict sophisticated system behaviour. Accordingly, extractive transparency can be explored before and after the modelling process, namely through pre-modelling data processing and post-hoc explainability [
5,
21,
27]. Pre-modelling data processing initially analyses the modelling data to identify the required features for the training and organizes these features into target structures for utilization in algorithms. Post-hoc explainability, on the other hand, relates to efforts to explore the understandability of the performed algorithms via a taxonomy of analysis methods such as visualization (e.g., layer-wise relevance propagation, LRP [
34]), simplification (e.g., G-REX [
35]), feature relevance (e.g., Shapley additive explanations, SHAP [
36]), and ensemble learning. In order to thoroughly interpret
AI transformations in CE, both the algorithm transparency and the extractive transparency should be considered and evaluated using related technologies and their combinations.
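To make the post-hoc route more concrete, the sketch below illustrates a generic feature-relevance workflow using the open-source shap package on a synthetic regression model; the process variables, model choice, and data are illustrative assumptions rather than an implementation from any study reviewed here.

```python
# Minimal post-hoc feature-relevance sketch with the open-source `shap` package.
# The "process" variables and model below are hypothetical placeholders.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic process data: three input features and one quality target.
X = rng.uniform(size=(500, 3))          # e.g., scaled temperature, pressure, and flow (hypothetical)
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values: per-sample additive feature attributions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])

# Mean absolute SHAP value per feature gives a global importance ranking.
print(np.abs(shap_values).mean(axis=0))
```

In a CE setting, such per-feature attributions can then be inspected to check whether the model relies on physically sensible inputs.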
3. State-of-the-art transparency-considering AI applications in CE
Based on the discussions above, it is apparent that transparent AI utilizations would rapidly promote the progress of the AI revolution by establishing trust in the decision-making algorithms among chemical engineers. Thus, transparency exploration in AI transformations is the vital missing link in developing a reliable decision-making algorithm for critical CE applications. However, the implementation of AI studies within CE is still in its infancy compared with other fields; consequently, there has been very little in the way of AI transparency studies within CE. To effectively spur the corresponding development of this subject matter, this section elaborates upon related state-of-the-art transparency-considering AI applications in CE that analyze transparency in terms of causality, explainability, and informativeness.
3.1. Causality of AI transformative utilization in CE
Among the limited number of transparency-considering AI investigations in CE, the causality of an AI system (i.e., the emerging relationships between a system’s input features and the AI predictions) has increasingly become one of the most prominent areas of study within this field. Although causality-directed AI explorations have only emerged very recently within CE, research in this field has still received some of the greatest attention in CE compared with other aspects of AI transparency [
9,
37]. Most notably, causality within AI-related CE research has been particularly investigated for systems related to process optimization, control, and monitoring, as the decision-making algorithms in these areas are closely associated with high profitability and safety.
Against this background, Agarwal et al. [
38] published an analysis on the causality of an AI algorithm in a CE industrial process that evaluates the classification observability from the input-output data in a straightforward manner while considering the measurement noise in the input data. The observability of the classes indicated the relevance of the inputs for further pruning of the network. Building upon this work, Agarwal et al. [
39] further proposed the sequential layer-wise relevance propagation for pruning (SLRPFP) algorithm, which identifies the most highly impactful system inputs and eliminates any non-contributing features within an NN. The researchers subsequently implemented the developed SLRPFP algorithm in two industrial case studies involving the Tennessee Eastman process (TEP) [
40,
41] and a vaccine manufacturing process. Their work demonstrated that the proposed NN pruning algorithm can effectively avoid data over-fitting issues and can furthermore enhance the computation efficiency of classification models. In addition to contributing toward the causality of these deep learning models, this algorithm contributes to the informativeness of the system through its evaluation of which inputs significantly contribute to the system behaviour. This informative aspect can subsequently be used to estimate both the positive and negative effects of the studied inputs on the profit function.
In another study, Agarwal et al. [
23] adopted an LRP method for feature relevance analysis in order to provide causality information for fault detection and diagnosis (FDD) in statistical process control (SPC). Their work implemented the LRP method in their newly proposed FDD algorithms, which are based on a deep supervised autoencoder and a dynamic supervised autoencoder, respectively. The addition of this method suppressed the over-fitting issue present in networks trained on noisy data by iteratively trimming the redundant inputs based on the obtained relevancies from the LRP analysis. These researchers' work revealed that the possible causes of the detected faults could be determined during this process and that the inclusion of LRP improved the FDD test accuracy when compared with previously reported FDD methods.
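As an illustration of how LRP-style relevance scores can be obtained in practice, the following sketch applies the generic LRP implementation from the Captum library to a toy fault classifier; it is not the SLRPFP algorithm or the autoencoder-based FDD models of the cited works, and the network, data, and fault classes are hypothetical.

```python
# Illustrative layer-wise relevance propagation (LRP) on a toy fault classifier,
# using the Captum library for PyTorch. This is a generic sketch, not the cited
# authors' pruning or autoencoder-based FDD algorithms.
import torch
import torch.nn as nn
from captum.attr import LRP

torch.manual_seed(0)

# Toy fully connected classifier: 10 process measurements -> 3 fault classes.
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
    nn.Linear(16, 3),
)
model.eval()

x = torch.randn(1, 10)             # one hypothetical sample of process measurements
lrp = LRP(model)

# Relevance of each input measurement for the predicted fault class.
pred_class = model(x).argmax(dim=1).item()
relevance = lrp.attribute(x, target=pred_class)
print(relevance)                    # inputs with near-zero relevance are pruning candidates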
Aside from feature relevance explainability exploration in FDD, Hale et al. [
37] developed inferential sensors based on symbolic regression in order to reduce the influence of noise and uncertainty in the measured data and to improve the robustness of the fault diagnostic process in a heat exchanger fault diagnosis application. Compared with traditional classification methods (e.g., support vector machines (SVMs) and NN methods), utilizing the developed inferential sensors led to greater algorithm transparency due to the algorithm's easily explainable mathematical forms. Thus, these sensors show promise for easier implementation in other systems, compared with traditional AI classification methods. It should be noted that the quality of the classifications reported using the inferential sensors was hardly impacted when compared with those of traditional classification methods, despite the heavily reduced complexity of the sensor-based algorithm.
Beyond offline FDD analyses, online process monitoring using AI methods also requires greater algorithm transparency. To address this issue, Bhakte et al. [
9] proposed a method to improve the transparency of deep neural networks (DNNs) through the use of SHAP values for variable attribution analysis. These values, which provide a metric to quantify the contributions of system inputs to an AI model’s predictions, were implemented into both a simple linear example and the TEP, as well as an online monitoring scenario. The results from this Shapley-based method highlighted which of the numerous measured variables were the most related to the desired outputs, thereby showcasing the method’s ability to provide accurate and efficient sample-specific explanations for input-output correlations. Most importantly, this method can provide onsite explanations for understanding real-time operational strategies based on DNN methods.
In addition to FDD analysis, reaction path investigation is another field that has attracted special attention from researchers adopting AI transformative utilization, due to the intensive computing requirements of traditional reaction path methods based on analyzing electronic distributions, such as density functional theory (DFT). Data-driven AI applications have been shown to dramatically improve computing efficiencies while adequately maintaining prediction accuracy [
42,
43]. However, the nontransparent nature of these data-driven AI analyses limits their applications in critical manufacturing processes-particularly those involving sensitive reactions. To address this issue, Kim et al. [
44] used LRP analysis to interpret the formation of hazardous reaction products predicted by an NN model in a binary decision task. Furthermore, Kikutsuji et al. [
45] conducted local interpretable model-agnostic explanation (LIME) and SHAP analyses to distinguish the contributions of the studied features in DNNs in order to explain the reaction coordinates of alanine dipeptide isomerization. In both studies, the implementation of causality-driven AI techniques provided further insights into the decision-making processes of the AI models, thereby improving the models’ overall justifiability for reaction path investigation applications.
Another AI algorithm implementation field that has been critically limited by the lack of transparency pertains to material development in pharmaceuticals for further clinical adoption [
46,
47]. Particularly within medical applications, it is necessary to have an understanding of the causality of AI implementations from two different perspectives. First of all, it is imperative to have a general understanding of the causal relations between the considered input features and the output predictions. Furthermore, it is necessary to comprehend the human causal understanding, which pertains to the ability of healthcare professionals to understand the relationships of the various factors that influence human health, diseases, and medical outcomes [
3,
48]. For example, in a case study of histopathology via deep learning analysis and an experienced pathologist, Holzinger et al. [
3] emphasized the causality required to achieve "a level of explainable medicine," which featured peer-to-peer explanations to humans to enable their causal understanding. Ward et al. [
49] quantified the impacts of the considered drugs on the adverse outcome of acute coronary syndrome (ACS). Their work adopted SHAP and LIME analyses in tree-based models in order to highlight the potential of causality evaluations to accelerate the development of a real-time pharmacovigilance monitoring system. Furthermore, Yang et al. [
50] established an efficient 27-layer convolutional NN, a multiscale graph NN (MGraphDTA) based on chemical intuition, for drug-target affinity (DTA) prediction. Their developed model was subsequently used to facilitate drug development, coupled with visual interpretation established by a newly proposed gradient-weighted affinity activation mapping (Grad-AAM) method that labeled the molecules with the greatest impact on DTA. Each of these works emphasized the importance of incorporating AI causality in order to improve the trustworthiness of AI techniques for medical applications. As shown above, causality exploration starts from systems involving critical concerns such as safety and medical utilization and facilitates the application of AI in these fields by justifying AI models' predictions via feature-oriented explanations.
3.2. Explainability of AI transformative utilization in CE
Regarding the explainable rationales behind the AI algorithms themselves (i.e., their explainability), we know of very few studies in CE that have contributed to this area. The reason behind the scarcity of the rationale exploration of AI algorithms in CE is intuitively obvious: The sophistication of CE systems necessitates the use of intricately structured AI algorithms, such as DNNs with highly hierarchical structures (e.g., multiple layers and modified units). As a result, it is intractably difficult to determine the rationales governing the prediction skills of these AI decision-makers. However, the unclear internal mechanisms of these AI algorithms in CE cause reliability issues when the algorithms are applied to critical applications, as discussed in previous sections. Furthermore, the lack of generalizability of these models renders it difficult to avoid possible bias and adversarial examples.
The very first study investigating the internal mechanisms of NNs in CE, to the best of our knowledge, was published in 1992 by Dr. Venkatasubramanian and a coworker [
19]. This work, which specifically focused on process fault diagnosis in a continuous stirred tank reactor, evaluated how the number of hidden neurons and input units in an NN affects the fault space structures and the fault classification performance. Increasing the number of hidden neurons can facilitate the separation of the fault space into a greater number of decision regions, improving the ability to capture complicated system information. Single-fault generalization can also benefit from an increased number of hidden units in the NN. Moreover, two-fault generalization can only be improved by an increasing number of input units, which deliver more comprehensive information about the diagnosed system to the algorithm. In order to explore this subject further, Dr. Venkatasubramanian's team also conducted the most representative analyses in the literature of the rationale investigation of DNNs implemented in CE processes, covering both classification problems and regression problems [
51,
52]. These studies generally demonstrate that complex distributions require more sophisticated NN topologies. In particular, in both classification and regression tasks, deeper networks are capable of transforming the input space into more intricate patterns to simulate relationships of greater complexity and map more intricate distributions. On the other hand, wider networks, that is, networks that have more neurons within a single layer, can combine a greater number of simple traits to capture complicated emerging system patterns. Regarding classification problems, an increase in the width of NNs can generally identify a greater number of simple characteristics, thus increasing the generalizability of the NNs to be applied to numerous other tasks. According to the degeneracy analysis of the final-layer parameters, countless sets of weights can result in the same accuracy of classification predictions. These scholars' studies also showed that, regardless of the structures of the NNs, the first-layer nodes lead to a linear activation of the input space that is successively curved in the later layers. Furthermore, they concluded that similar geometric objects can be required by two separate NNs to perform related but different tasks. This means that a pretrained NN structure can be considered as a reference to initialize the training of another network for a related requirement, which is referred to as transfer learning in the literature [
53,
54]. Moreover, Dr. Venkatasubramanian’s team demonstrated that an increase in the depth of a classification NN is not necessarily beneficial to capturing the actual distributions of data with a possible arbitrary active pattern, and can instead cause adversarial examples that lead to an incorrect diagnosis. The same is true for regression problems: Without a certain width, a deep NN cannot adequately map the input to the output while incurring a sufficiently low loss, especially when the system exhibits a highly nonlinear relation [
49]. In other words, the analysis of an NN's workflow indicates that the width of the NN is critical to establishing a high-performance NN. It should be noted that an increase in both the depth and width of an NN results in a sophisticated loss function landscape, causing difficulties in solving regression problems. Moreover, regularization has been shown to be able to reduce over-fitting problems by simplifying the loss landscape. In general, the optimal structure of an NN is mainly determined by the generated data and the tasks themselves. Accordingly, for specific CE problems, an optimized performance still requires customized NN structures, whose training can be initialized based on the discussed impacts of the NN structures or based on pretrained NNs for related tasks.
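For readers who wish to experiment with these structural effects, the following generic PyTorch sketch exposes depth and width as explicit hyperparameters; the sizes shown are arbitrary placeholders and do not correspond to the networks used in the cited studies.

```python
# Generic PyTorch sketch of how depth (number of hidden layers) and width
# (neurons per layer) enter an NN as hyperparameters, mirroring the structural
# factors discussed above. All sizes are arbitrary placeholders.
import torch.nn as nn

def make_mlp(n_inputs: int, n_outputs: int, width: int, depth: int) -> nn.Sequential:
    """Fully connected network with `depth` hidden layers of `width` neurons each."""
    layers = [nn.Linear(n_inputs, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, n_outputs))
    return nn.Sequential(*layers)

# A wider, shallower network and a deeper, narrower one with the same input/output
# sizes; which performs better depends on the data and the task, as discussed above.
wide_net = make_mlp(n_inputs=8, n_outputs=1, width=128, depth=2)
deep_net = make_mlp(n_inputs=8, n_outputs=1, width=16, depth=8)
```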
3.3. Informativeness exploration in CE: Physics-informed and hybrid modelling
Informativeness represents the scientific perspective of engineers, which pursues the understanding of a system from the first principles of the investigated phenomena as a priori or a posteriori knowledge. However, informativeness also presents a tradeoff with the high performance of the currently dominant AI methods in practical AI utilizations (e.g., advertisements and computer vision). To capture extremely nonlinear system behaviour, these algorithms primarily rely on massive, noisy training datasets, as well as deep, hierarchical, and heavily modified operational structures. The mathematical form of these algorithms, and to a greater extent the first-principles knowledge, is seldom available or is exceedingly intensive to simulate. Nevertheless, the habit of learning or interpreting systems through mathematical logic is almost an engineer's instinct, developed over the course of engineering training and professional practice. This scientific "habit" ensures meticulous planning and a general comprehension of the manufacturing processes involved; thus, benefit maximization is achieved while unexpected danger is avoided. Accordingly, the informativeness of AI transformations in CE, achieved by integrating a priori first-principles knowledge and extracting mechanistic insights into the systems, is a vital factor that requires particular attention. To this end, two concepts, ① physics-informed learning and ② hybrid modelling, have raised a certain amount of interest, especially among related practitioners.
Physics-informed learning integrates data and first-principles knowledge ranging from physical laws to empirical, observational mathematical expressions of the phenomena in order to improve ML algorithms [
55]. By embedding a priori knowledge in AI models, physics-informed learning enforces theoretical constraints, inductive biases, and multi-fidelity observational data patterns, allowing these algorithms to leverage both our prior understanding of the system and the observational data patterns to optimize model performance. As a result, these algorithms maintain the key features of data-driven AI algorithms, namely analyzing multi-fidelity (and possibly imperfect) data and capturing high-dimensional correlations, while simultaneously gaining the ability to provide robust, understandable, and physically consistent predictions even in exploration/generalization tasks by exploiting the embedded physical constraints [
55-
60]. The conceptually simplest integration method to enforce physics in ML is to utilize observational biases that are introduced from data reflecting the underlying physical principles [
61,
62]. It should be noted that observational biases can only be reinforced with an adequate number of training sets, which may be prohibitive to obtain in engineering applications. A more practical approach is to introduce inductive biases by tailoring NN structures that incorporate prior knowledge for an explicit cognitive task, such as the well-known convolutional NNs, graph NNs, or Gaussian processes [
63-
65]. However, inductive biases cannot be achieved unless explicit physics and symmetry groups are available with elaborate implementations [
55]. Unlike inductive biases, learning biases are generated by softly enforcing constraints based on an a priori understanding of the system, properly penalizing the typical loss function so that training converges toward physically consistent solutions. By means of soft penalty constraints, a partial comprehension of the underlying invariances is incorporated into the model; thus, the implementation of learning bias enforcement tolerates a broad spectrum of first-principles understandings in various domains [
55,
66,
67]. Particularly in CE, the complexity of the studied systems makes it exceptionally challenging to obtain a thorough understanding of the phenomena. The high flexibility and expressive ability of learning-bias-integrated ML algorithms accordingly facilitate numerous studies in CE with the implementation of partial understandings as first-principles constraints, as is the case within the prominent physics-informed neural networks (PINNs). PINNs are especially prevalent in CE as the dominant physics-informed learning technique. Therefore, a further discussion on PINNs is included below.
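Before turning to PINNs, the following minimal sketch illustrates the learning-bias idea in a generic setting: a soft physics penalty is added to the data loss so that training is nudged toward physically consistent predictions. The constraint used here (a simple steady-state mass balance requiring the predicted outlet flows to sum to a known inlet flow) and the penalty weight are illustrative assumptions, not a model from the cited literature.

```python
# Minimal "learning bias" sketch: a soft physics penalty added to the data loss.
# The mass-balance constraint and the penalty weight are hypothetical illustrations.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))  # predicts two outlet flows
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_phys = 0.1                                   # penalty weight (a tuning assumption)

def training_step(x, y_measured, inlet_flow):
    """One gradient step on the data loss plus a soft mass-balance penalty."""
    y_pred = model(x)
    data_loss = nn.functional.mse_loss(y_pred, y_measured)
    # Soft constraint: predicted outlet flows should sum to the known inlet flow.
    physics_residual = y_pred.sum(dim=1) - inlet_flow
    physics_loss = (physics_residual ** 2).mean()
    loss = data_loss + lambda_phys * physics_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```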
PINNs are supervised NNs that incorporate the laws of physics in the form of nonlinear partial differential equations (PDEs), including integer-order PDEs, fractional PDEs [
68], stochastic PDEs [
69], and so forth [
70-
73]. These physics-driven AI systems have rapidly gained attention as NNs that respect any symmetries, invariances, and conservation principles of physics through the integration of time-dependent and nonlinear PDEs. In these approaches, PDEs are directly incorporated into the loss functions of the NNs as a residual term. The network is then trained to minimize the PDE-based residual over a range of values for each input, so that the PINN can sufficiently predict the PDE output for a given set of inputs. PINNs were first proposed by Karniadakis’s group [
74] in 2017; the researchers exploited automatic differentiation to represent all of the differential operators, which served as the first trial leading to the recent boost in physics-driven investigations. These early works originally treated PINNs as a complementary method for solving ill-posed fluid problems in which the boundary conditions and/or the initial conditions of the resulting PDEs are unknown [
73-
75]. Instead, these works achieved PDE solutions through a mesh-free PINN technique that optimizes the loss function of the NN based on the PDE instead of solving the equations directly. Accordingly, PINNs are an effective strategy for establishing an efficient surrogate algorithm for computationally expensive calculations of general nonlinear PDEs. Governed by classic physical constraints such as the Navier-Stokes equations (NSEs), studied systems in CE that suffer from the complexity caused by high-dimensional functions, particularly hydrodynamics, are widely investigated using PINNs. It is well known that investigating hydrodynamics in CE processes via computational fluid dynamics (CFD) is both time and computational-resource consuming. Thus, high-dimensional problems or inverse flow problems are intuitively sophisticated and prohibitively expensive to solve via CFD. Instead, well-trained PINNs can outperform CFD solvers in terms of both accuracy and efficiency, so long as spatiotemporal data is available for both the forward- and inverse-flow ill-posed problems [
76]. The flexibility of PINNs in the form of PDE constraints has stimulated PINN applications in a wide range of flow issues, including two-/three-dimensional (2D/3D) flows [
73,
77], compressible and incompressible flows [
56,
76], and flow transitions [
78]. Although PINN investigations generally focus on flow dynamics, they can be similarly adapted for other applications in CE, such as stiff chemical kinetic issues [
30,
79], fixed-bed reactors [
80], predictive control in consideration of corrupted data [
81], flow velocity measurement [
82], and material development [
83-
85]. With regard to ill-posed or inverse problems, PINNs provide more accurate predictions compared with traditional numerical methods, while traditional methods deliver better performance with forward and well-posed issues [
55].
The extensive applications of PINNs have facilitated the development of software libraries for physics-informed ML. Since Python is the dominant coding language of ML, most of these libraries are built using Python (e.g., DeepXDE, PyDEns, and SimNet), with two exceptions (Neural PDE and ADCME), which are written in Julia. Among them, DeepXDE is a solver that only requires the initial definition of the problem to proceed to the solution and can handle all computational details automatically. Complex domain geometries can be considered using DeepXDE based on constructive solid geometry. In particular, DeepXDE is highly flexible due to its loosely coupled structure; therefore, it is both an educational and a research tool that can rapidly contribute to the scientific development of ML [
70].
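As a usage illustration, the sketch below sets up a PINN for a simple one-dimensional diffusion equation following DeepXDE's typical workflow (geometry, initial and boundary conditions, PDE residual, network, and training); it addresses a generic toy problem, and the exact function names may differ between DeepXDE versions.

```python
# Minimal PINN sketch with DeepXDE for the 1D diffusion equation u_t = 0.3 * u_xx,
# following the library's typical workflow. Exact API names may differ between
# DeepXDE versions; the problem itself is a generic toy example.
import numpy as np
import deepxde as dde

def pde(x, u):
    # x = (x, t); the residual u_t - 0.3 * u_xx enters the loss as a penalty term.
    u_t = dde.grad.jacobian(u, x, i=0, j=1)
    u_xx = dde.grad.hessian(u, x, i=0, j=0)
    return u_t - 0.3 * u_xx

geom = dde.geometry.Interval(-1.0, 1.0)
timedomain = dde.geometry.TimeDomain(0.0, 1.0)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

bc = dde.icbc.DirichletBC(geomtime, lambda x: 0.0, lambda x, on_boundary: on_boundary)
ic = dde.icbc.IC(geomtime, lambda x: np.sin(np.pi * x[:, 0:1]),
                 lambda x, on_initial: on_initial)

data = dde.data.TimePDE(geomtime, pde, [bc, ic],
                        num_domain=2000, num_boundary=100, num_initial=100)
net = dde.nn.FNN([2] + [20] * 3 + [1], "tanh", "Glorot normal")

model = dde.Model(data, net)
model.compile("adam", lr=1e-3)
model.train(iterations=10000)
```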
In PINNs, the model performance is improved via the hybridization of first-principles-based PDEs and the data-driven NN. From this perspective, the motivation of combining first-principles-based models with data-driven paradigms leads to another appealing informative AI utilization that is particularly prevalent among engineers: hybrid modelling. By integrating mechanistic models with AI methods, the physical understanding of a system can be applied to benefit the algorithms used to model the system in terms of accuracy and robustness [
86,
87]. The correlations revealed using AI systems with the assistance of mechanistic models can further elucidate the fundamental mechanisms of the investigated phenomena. It is notable that, considering the broad definition of physics-informed learning that enforces first-principles knowledge into data-driven algorithms, hybrid modelling can be considered a physics-informed scenario. However, in the current literature, physics-informed learning generally refers to enforcing physical/chemical understandings within data-driven methods (e.g., enforcing physicochemical equations within the loss function, as is the case for PINNs). On the other hand, in hybrid modelling, the theoretical models and data-driven algorithms can proceed with their calculations both dependently and independently, while maintaining their connections in a parallel or serial manner. Furthermore, the hybridization of data-driven AI methods and theoretical models brings about further advantages in addition to informativeness, such as efficiency improvement, which is another vital factor that attracts much attention to this field. These advantageous properties, along with the currently high interest in this field, can potentially spur the prosperity of AI utilizations in CE, where such utilizations have thus far been deferred. These expedient features will be further discussed below.
Regarding the development of hybrid modelling, the idea of integrating mechanistic models with data-driven algorithms can be traced back to the early 1990s. These early studies aimed at exploiting all available information on studied systems (i.e., mechanistic knowledge, heuristic knowledge, and data with underlying patterns collected from the processes) in a manner that synergically contributed to model establishment for high-quality predictions [
32,
88-
92]. Over the course of 30 years of extensive efforts in hybrid modelling and through its incorporation in a myriad of different applications, the definition of what constitutes a hybrid model has become debatable and ambiguous [
14,
93]. However, in the context of developing informative AI transformations, hybrid modelling is defined within this work to refer to hybrid algorithms that combine theoretical mathematical paradigms with data-driven AI methods. Some researchers designate the former theoretical expressions as "white boxes," which build mathematical relations from prior first-principles knowledge [
86,
87]. The latter AI algorithms are accordingly referred to as "black-box models," since they establish the correlations between the model outputs and their inputs in a manner that is hidden to the observer [
94,
95]. In this context, hybrid models can also be referred to as "gray-box models," due to their integration of both white-box and black-box models. Physics-informed AI models theoretically belong to the category of gray-box models as well. However, due to the broad interest in them and their conceptual utilization, physics-informed models were reviewed separately above, prior to the present discussion.
As mentioned in the sections above, hybrid modelling can be employed to strategically implement informative AI transformations. Integrating prior physical knowledge of the systems naturally enhances the algorithm’s ability to be understood by a human. The operational processes and predictions of the theoretical models within the hybrid algorithms can partially or completely reveal the physical mechanisms of the studied phenomena. Accordingly, the resulting decisions are easier to interpret; thus, hybrid models tend to be more reliable compared with standard AI methods [
96]. In addition, a strong hybridization of theoretical and data-driven models can offer direct benefits far beyond informativeness. Although the digital revolution has adequately increased the quantity of available datasets in engineering, CE is still not a "big data" domain, due to its relatively higher expense of data collection compared with other fields of study, such as linguistics [
15]. In particular, datasets collected from industrial operations or lab work are generally obtained for systems operating at steady state under normal operating conditions. In other words, these large volumes of datasets collected over a long time span are limited in their variability and thus contain low information content [
93]. The inclusion of theoretical modelling can minimize this inherent weakness of AI utilizations in CE by providing substantial low-cost data under a broad range of operating conditions. Consequently, hybrid modelling algorithms are more competent in performing extrapolation and can thus more readily avoid over-fitting [
97]. On the other hand, standard theoretical models normally mandate a strong comprehension of a system, including its reaction kinetics, surface evolution, transport phenomena, and so forth. In contrast, the addition of AI methods within hybrid algorithms allows these models to be implemented without a complete understanding of the system. Therefore, hybrid algorithms are easier to implement than standard theoretical models for non-ideal systems for which knowledge of the system behaviour is incomplete. In addition, hybrid models allow unknown system-oriented parameters to be incorporated into the algorithms to correct the errors of purely theoretical predictions. Furthermore, by replacing computationally intensive calculations with AI algorithms, the efficiency of hybrid models can be dramatically improved [
98,
99]. AI can also bring high-dimensional features into consideration to detect their synergic impacts in a manner that is much more straightforward than theoretical methods [
100,
101]. Hybrid modelling is thus prevalent in investigations of CE processes within a non-ideal environment with high complexity.
Hybrid models can be concisely constructed in a serial or parallel order, as presented in
Fig. 4. In a serial hybrid model, the data-driven AI models and mechanistic models are both operated sequentially, as illustrated in
Fig. 4(a). Accordingly, the outputs of the first model can operate as the inputs to the next model, in a serial fashion. These serial structured scenarios are most helpful where partial comprehension of the studied system is accessible, which is the case in almost all investigated CE processes. Furthermore, a serial-structured hybrid modelling scenario makes it possible to simulate the inherently sophisticated phenomena of a system without the need to sacrifice the prediction accuracy. As shown in
Fig. 4(a), a serial hybrid model can be initiated starting with either the mechanistic module or the data-driven module, depending on the deficiencies of the models, to achieve the objective outputs. These methods are typically referred to as the mechanistic module/data-driven module (MM/DM) and the data-driven module/mechanistic module (DM/MM) serial methods, respectively. It is notable that, in both sequential methods, the data-driven module can provide unknown mathematical representations and process parameters in convoluted environments, thereby removing the need for computationally intensive theoretical models and bringing substantial efficiency improvements. Meanwhile, the phenomenological models undertake the task of improving the informativeness of the scenarios [
99,
102-
107]. The MM/DM scenario, in particular, compensates for the expensive data collection in CE. The prior mechanistic module can provide training data under a wide scope of operating conditions, including harsh environments. Endowed with a dataset under harsh conditions in the training of data-driven modules, the hybrid model can effectively diagnose dangerous operating conditions to ensure processing safety. Furthermore, the data-driven and mechanistic models can be operated in parallel and subsequently combined, as illustrated in
Fig. 4(b). These parallel scenarios emerge as attractive when the accessible phenomenological mathematical expressions are confined by their prediction power [
93]. This confinement can be attributed to the theoretical models’ limitations in replicating certain phenomena, such as nonlinearities and dynamic behaviour [
86]. The mismatch can be rectified by coupling the results of the data-driven module and the mechanistic module in a parallel manner via Kalman filtering, weighted or unweighted addition, multiplication, and so forth [
108]. It should be noted that the structures of the hybrid models are established based on the research objectives; thus, serial and parallel scenarios can be connected, and each module can contain multiple models or backpropagate information to achieve the desired goals [
109,
110].
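To illustrate the parallel (gray-box) structure in code, the following sketch combines a simplified mechanistic model with a data-driven model that learns the residual between plant measurements and the mechanistic prediction; the first-order rate expression, parameter values, and data are hypothetical illustrations rather than a model of any specific process.

```python
# Minimal parallel (gray-box) hybrid sketch: a simplified mechanistic model gives
# a baseline prediction and a data-driven model corrects its residual. The rate
# expression, parameters, and "plant" data are hypothetical illustrations.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

def mechanistic_conversion(T, tau, k0=1e3, Ea=40e3, R=8.314):
    """Idealized first-order conversion X = 1 - exp(-k * tau) with Arrhenius k(T)."""
    k = k0 * np.exp(-Ea / (R * T))
    return 1.0 - np.exp(-k * tau)

# Hypothetical plant data containing effects the ideal model misses (bias, noise).
T = rng.uniform(450.0, 550.0, size=300)             # temperature [K]
tau = rng.uniform(1.0, 10.0, size=300)              # residence time [s]
X_measured = mechanistic_conversion(T, tau) * 0.9 + 0.02 * rng.normal(size=300)

# Parallel hybrid: the ML model learns the mismatch between data and mechanism.
X_mech = mechanistic_conversion(T, tau)
residual_model = GradientBoostingRegressor().fit(np.column_stack([T, tau]),
                                                 X_measured - X_mech)

def hybrid_predict(T_new, tau_new):
    base = mechanistic_conversion(T_new, tau_new)
    correction = residual_model.predict(np.column_stack([T_new, tau_new]))
    return base + correction
```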
Among the discussed hierarchically structured algorithms, AI-assisted multiscale scenarios combining data-driven modules with multiscale theoretical models are known to feature high informativeness. By integrating multiscale theoretical simulations, the intrinsically multiscale phenomena in CE can be considered and illustrated for a comprehensive understanding. Although multi-scale theoretical studies are prevalent in investigating CE processes due to their highly informative nature [
111-
113], the development of AI-assisted multiscale hybrid models has just begun. In studies of hydrodynamics, the ANN-energy minimization multiscale (EMMS) drag scheme was established to provide efficient predictions of fluid dynamics that account for local environments by incorporating the ANN-predicted system heterogeneity into the calculation of drag coefficients by the EMMS model [
106,
114]. In this work, local gas-particle mixture properties were considered and efficiently connected to the macroscopic fluid dynamics through the ANN. It is notable that the above models connected the microscopic environments through fluid clusters that still presented continuum characteristics. It is particularly difficult to compare microscopic insights that present discrete properties at the electronic or atomic scales with measurable continuum properties for validation. Accordingly, multiscale models that can connect discrete microscopic properties with continuum system features can fill in the gaps in system interpretation between microscopic studies and macroscopic investigations. However, there is hardly any investigation in the literature under the umbrella of this scenario, particularly regarding intercommunicating features within the algorithms.
To the best of our knowledge, one of the most significant investigations of hybrid modelling involving data-driven AI assistance and multiscale theoretical modelling, which achieved the aforementioned goal of practical intercommunications, was conducted by Chaffart and Ricardez-Sandoval [
115]. In their work, the researchers established a hybrid multiscale model to efficiently model a thin-film deposition process for optimization and control applications. Their model consisted of macroscopic continuum models used to capture the mass, energy, and momentum balances of the gaseous film precursor species within the system, coupled with a microscopic stochastic PDE model to simulate the evolution of the thin-film surface growth on a molecular level. The terms within the stochastic PDE were subject to coefficients whose values were known to vary depending on surface parameters calculated from the macroscale models-that is, the surface precursor concentration and temperature [
116]. However, the exact relationship between the PDE coefficients and the surface parameters was unknown. In order to mitigate this issue, the researchers trained shallow feedforward ANNs to predict the stochastic PDE coefficients for given surface parameters. Consequently, the full thin-film growth model developed into a hybrid multiscale model, in which AI learning models established essential communication channels between the macroscopic and microscopic system models. The researchers’ proposed hybrid modelling algorithm, therefore, provided one of the earliest examples of using AI to provide intercommunication and translation between microscopic and macroscopic models. However, as the multiscale algorithms that adopt theoretical model assemblies do not in themselves tend to contain multiscale properties, there can be no intercommunication within theoretical simulations, inhibiting their ability to contribute to a thorough understanding of the system.
Motivated by the lack of hybrid multiscale models to connect microscale models with macroscale observations of intercommunicating features, an ideal scenario that could maximize the benefits of AI in CE transformative utilization, known as intelligent intercommunicating multiscale engineering (IIMSE), was proposed, featuring informativeness based on AI-assisted multiscale models connecting microscopic insights with macroscopic behaviour [
117]. IIMSE encompasses a range of possible hierarchical structures that combine intercommunicating multiscale theoretical models with data-driven AI algorithms in order to provide a connection between microscopic insights and measurable macroscopic observations. The intercommunicating feature of IIMSE highlights direct information exchange between the theoretical models in a serial order for a comparably complete comprehension of the studied processes. IIMSE can efficiently provide high-quality predictions of the investigated systems while providing a comprehensive understanding to fulfill the scientific goals, allowing it to possibly play a leading role in AI transformations in CE. Moreover, IIMSE provides a solution to connect microscopic insights with measurable macroscopic observations for the validation of microscopic understanding and the mechanistic interpretation of the observed phenomena. However, this highly promising scenario is still in its infancy, and more attention is needed to facilitate its development. A possible serially constructed scenario is displayed in
Fig. 5 as an example of IIMSE for readers' reference. In this scheme, a data-driven ANN connects a microscopic multiscale model to a macroscopic CFD analysis. Through this connection, the mesoscale models, which are difficult to represent mathematically, are substituted by the data-driven model to facilitate the implementation of this whole-scale simulation (i.e., a simulation from the microscopic scale to the macroscopic scale) while maintaining reasonable efficiencies in order to predict the macroscopic measurable system performance over a large experimental domain.
The preceding mechanistic module comprises a three-scale intercommunicating theoretical model. This microscopic multiscale model is established by combining quantum chemistry analysis (i.e., DFT), molecular simulations (i.e., kinetic Monte Carlo, kMC), and continuum mass-transport equations inside a particle channel, as shown in
Fig. 6. The elementary reaction mechanisms, as well as the relevant energetic features, are calculated from DFT analysis and then transferred to the kMC model as surface events and reaction rates. The kMC analysis subsequently predicts the dynamic surface features, such as the diffusion rates and concentrations, to establish the boundary conditions of the continuum equations in the particle channel. The predictions of the continuum models, such as the coverages of the surface species, can be fed back to the kMC surface model to influence the surface evolution. This multiscale intercommunicating microscopic model has already been partially demonstrated to contribute efficiently to the comprehensive understanding of a CE process [
91,
118]. The CFD analysis downstream of the ANN can then be initiated from empirical parameters, such as overall reaction rates, that are connected to the microscopic states simulated by the intercommunicating microscopic multiscale models. Accordingly, the experimentally measurable system performance, such as the heterogeneous phase distribution and the substance transformation, is predicted by CFD analysis. The example IIMSE scheme has the potential to enable a whole-scale simulation of a CE process that can relate the microscopic states and mechanisms to the measurable system performance. In particular, the deep integration of theoretical models with data-driven algorithms dramatically improves the informativeness of the hybrid model, contributing to the scientific understanding of the investigated phenomena. Furthermore, the connection between the microscopic states and the measurable system features opens up the possibility of justifying the observable system behaviour using microscopic insights.
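The way DFT-level energetics enter the kMC layer of such a scheme can be illustrated with a toy kinetic Monte Carlo loop, in which event rates follow an Arrhenius form parameterized by hypothetical activation energies; migration events are omitted for brevity, and none of the numerical values are taken from the cited studies.

```python
import numpy as np

KB = 8.617e-5                          # Boltzmann constant, eV/K
rng = np.random.default_rng(0)

def arrhenius(prefactor, Ea, T):
    """Event rate from a (hypothetical) DFT-derived activation energy Ea [eV]."""
    return prefactor * np.exp(-Ea / (KB * T))

def kmc_run(T=600.0, n_sites=200, n_steps=5000):
    """Toy lattice kMC: each site is empty (0) or occupied (1); adsorption and
    desorption barriers are illustrative placeholders only."""
    occupied = np.zeros(n_sites, dtype=int)
    r_ads = arrhenius(1.0e6, 0.3, T)   # hypothetical adsorption barrier
    r_des = arrhenius(1.0e8, 0.8, T)   # hypothetical desorption barrier
    t = 0.0
    for _ in range(n_steps):
        n_occ = occupied.sum()
        R_ads = r_ads * (n_sites - n_occ)   # total adsorption propensity
        R_des = r_des * n_occ               # total desorption propensity
        R_tot = R_ads + R_des
        # choose an event class with probability proportional to its rate
        if rng.random() < R_ads / R_tot:
            occupied[rng.choice(np.flatnonzero(occupied == 0))] = 1
        else:
            occupied[rng.choice(np.flatnonzero(occupied == 1))] = 0
        t += rng.exponential(1.0 / R_tot)   # advance the kMC clock
    return occupied.mean(), t

coverage, elapsed = kmc_run()
# in the intercommunicating scheme, this coverage would parameterize the
# continuum boundary conditions, and the continuum species profiles would be
# passed back to bias subsequent surface events
print(f"steady coverage (illustrative): {coverage:.3f} after {elapsed:.2e} s")
```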
Informativeness is generally the most studied aspect of the transparency of AI utilization within CE applications. Physics-informed learning and hybrid modelling, both of which integrate first-principles knowledge into algorithm development, have made major contributions as effective strategies to enhance the informativeness of AI transformative utilization in CE. Physics-informed learning, which guides convergence toward physically consistent predictions by introducing biases rooted in first-principles comprehension, is a broader concept that encompasses hybrid modelling, which couples theoretical models with data-driven algorithms. Based on current publications, the two approaches overlap but differ in emphasis: hybrid modelling usually generates independent predictions from the theoretical models, while physics-informed learning imposes theoretical constraints within the data-driven algorithms. Both of these methods contribute to the informativeness of a model and thus help fulfill the scientific requirements of AI utilization in CE.
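The practical difference between the two strategies can be seen in how the prediction or the loss is assembled. The sketch below fits a small surrogate to noisy concentration data for an illustrative first-order decay, once with a data loss augmented by a physics residual (physics-informed) and once as an imperfect first-principles prediction corrected by a data-driven term (hybrid); the example and all numerical values are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
k_true = 0.8                                   # "true" first-order rate constant
t_data = np.linspace(0.0, 4.0, 15)
c_data = np.exp(-k_true * t_data) + 0.02 * rng.standard_normal(t_data.size)

def poly(w, t):                                # cubic surrogate C(t; w)
    return w[0] + w[1] * t + w[2] * t**2 + w[3] * t**3

def dpoly(w, t):                               # its analytical time derivative
    return w[1] + 2 * w[2] * t + 3 * w[3] * t**2

# --- physics-informed fit: data loss + residual of dC/dt = -k C -------------
t_col = np.linspace(0.0, 4.0, 40)              # collocation points for the physics term
def pi_loss(w, lam=1.0):
    data_term = np.mean((poly(w, t_data) - c_data) ** 2)
    physics_term = np.mean((dpoly(w, t_col) + k_true * poly(w, t_col)) ** 2)
    return data_term + lam * physics_term
w_pi = minimize(pi_loss, x0=np.zeros(4)).x

# --- hybrid model: imperfect theory plus a data-driven correction -----------
k_guess = 0.6                                  # deliberately imperfect first-principles k
c_theory = np.exp(-k_guess * t_data)
corr = np.polyfit(t_data, c_data - c_theory, deg=2)   # fitted residual model
def hybrid(t):
    return np.exp(-k_guess * t) + np.polyval(corr, t)

t_test = np.linspace(0.0, 4.0, 9)
print("physics-informed:", np.round(poly(w_pi, t_test), 3))
print("hybrid          :", np.round(hybrid(t_test), 3))
print("true            :", np.round(np.exp(-k_true * t_test), 3))
```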
4. A case study of AI transparency exploration
In the previous sections, transparency investigations in AI applications were elaborated on with regard to informativeness, causality, and explainability. These advanced analyses contribute to raising trust in critical domains, particularly in the field on which this manuscript focuses, CE. To further highlight the significance of AI transparency analysis and to present an example that may stimulate further related explorations in CE, this section presents a case study that provides a comprehensive analysis of transparency involving all three discussed features, as shown in
Fig. 7.
The case study considers a simplified version of the hybrid multiscale thin-film deposition model discussed in Section 3 and outlined in
Fig. 7. It should be noted that the development of the hybrid multiscale model within the proposed case study was previously established within the literature [
115]. This system depicts a vapor deposition chamber with a substrate placed inside. The chamber is subsequently filled with gaseous precursor species that migrate to the substrate surface and deposit on it to form the thin film. The behaviour of the precursor gas species within the deposition chamber is captured using macroscale continuum equations that describe the precursor species mass and momentum balances according to the macroscopic gas-phase model equations [
115]. These equations are used to predict the molar fraction of precursor species at the thin-film surface; this molar fraction and the externally controlled substrate temperature serve as the key variables affecting the thin-film growth. The growth of the thin film is captured on a molecular level by the stochastic PDE shown under the title "Microscopic thin-film model" in Fig. 8. In this expression, the dependent variable denotes the local height of the film at a given location along the substrate and at a given time; the PDE coefficients take values that depend on the surface precursor fraction and the substrate temperature; and the final term is a random Gaussian variable with mean 0 and a shot-noise covariance. It should be noted that each term in the stochastic PDE coincides with a specific molecular-level kinetic event that is expected to occur on the surface during thin-film growth: one term corresponds to the adsorption of the precursor onto the film surface, whereas two further terms are respectively associated with the desorption and migration of adsorbed surface molecules. The results from the stochastic PDE expression are used to calculate the average film thickness at a given time, which can subsequently be used to calculate the net growth term (the difference between the rates of adsorption and desorption) from the macroscopic mass-transport surface boundary condition. Further details of the multiscale model can be found in Ref. [115].
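For orientation, the sketch below integrates a deliberately simplified stochastic growth equation, containing only a constant deposition term, a second-order relaxation term, and Gaussian noise, on a periodic one-dimensional grid; the actual stochastic PDE of Ref. [115] contains additional terms and kMC-fitted coefficients that are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
L, n_x = 1.0, 128                        # substrate length and number of grid points
dx = L / n_x
dt, n_t = 1.0e-4, 20000                  # explicit Euler step size and number of steps
F, nu, sigma2 = 1.0, 5.0e-3, 1.0e-2      # illustrative growth, relaxation, noise levels

h = np.zeros(n_x)                        # film height profile, periodic in x
for _ in range(n_t):
    lap = (np.roll(h, -1) - 2.0 * h + np.roll(h, 1)) / dx**2   # d2h/dx2
    noise = rng.normal(0.0, np.sqrt(sigma2 * dt / dx), size=n_x)
    h += dt * (F + nu * lap) + noise     # illustrative stochastic update

mean_thickness = h.mean()                                      # average film thickness
roughness = np.sqrt(np.mean((h - mean_thickness) ** 2))        # RMS surface roughness
print(f"mean thickness: {mean_thickness:.3f}, RMS roughness: {roughness:.4f}")
```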
In the above multiscale model, the stochastic PDE coefficients and the noise covariance are all dependent on the surface parameters, namely the surface precursor fraction and the substrate temperature. However, the exact relationship between the stochastic PDE coefficients and the surface parameters is unknown [115]. In order to bridge the gap between the macro and micro scales, a series of shallow feedforward ANNs are trained to predict the values of these coefficients as a function of the two surface inputs, as illustrated in the "ANN" segment of Fig. 7. To accomplish this objective, the PDE coefficients are determined at discrete values of the surface precursor fraction and temperature by model-fitting the stochastic PDE to a kMC-based surface growth model, as highlighted in Ref. [115]. The generated pairs of input surface parameters and output coefficients (the three stochastic PDE coefficients and the noise covariance) are subsequently used to train individual ANNs, one for each output, via backpropagation using the Levenberg-Marquardt algorithm. Each ANN contains a single hidden layer, with 10 neurons for three of the outputs and 15 neurons for the remaining one.
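A minimal sketch of this training step is given below, using synthetic surface-parameter/coefficient pairs in place of the kMC-fitted data of Ref. [115]; the coefficient names, the functional forms of the targets, and the use of scikit-learn's L-BFGS solver (in place of the Levenberg-Marquardt backpropagation of the original work) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 400
X = np.column_stack([rng.uniform(0.05, 0.30, n),       # surface precursor fraction
                     rng.uniform(500.0, 900.0, n)])     # substrate temperature [K]

# hypothetical coefficient maps standing in for the kMC-fitted values
targets = {
    "nu":    0.5 * X[:, 0] + 1.0e-4 * X[:, 1],
    "kappa": 2.0 * np.exp(-X[:, 1] / 400.0),
    "F":     3.0 * X[:, 0] * (1.0 - 1.0e-4 * X[:, 1]),
    "cov":   1.0 * X[:, 0] / (1.0 + 1.0e-3 * X[:, 1]),
}

models = {}
for name, y in targets.items():
    width = 15 if name == "cov" else 10     # mirrors the 10/15-neuron layers above
    net = make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=(width,), activation="tanh",
                                     solver="lbfgs", max_iter=5000, random_state=0))
    models[name] = net.fit(X, y)
    print(f"{name}: R^2 = {net.score(X, y):.3f}")

# querying the surrogates for a new surface condition (fraction, temperature)
x_new = np.array([[0.12, 650.0]])
print({name: float(m.predict(x_new)[0]) for name, m in models.items()})
```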
The informativeness of a model is a key criterion that can be used to provide a greater understanding of the underlying physical and chemical processes affecting the system behaviour. As highlighted in Section 3.3, model informativeness can be assessed from both a priori and a posteriori knowledge. In the proposed hybrid multiscale model, a great deal of the system behaviour is known in advance, due to the incorporation of known physical transport equations into the macroscale gas-phase model. These transport equations integrate key knowledge into the model regarding how the gaseous precursor species behave within the reactor chamber and regarding the state of the gas phase adjacent to the growing thin-film surface. In addition, due to previous studies, the stochastic PDE used to capture the molecular-level surface film growth is known to provide insights into the physicochemical kinetic processes taking place on the surface [
116], despite the lack of a microscale mechanistic model, as highlighted in the paragraphs above. These previous studies have specifically demonstrated that increasing the values of the corresponding PDE coefficients results in greater amounts of adsorption, desorption, and migration, respectively, on the film surface.
In addition to the system knowledge determined a priori, the fully assembled hybrid multiscale model can be used to gain further mechanistic insights into the inner workings of the processes taking place. In Ref. [
115], it was observed that lowering the surface temperature within the hybrid multiscale model promoted higher rates of surface adsorption and resulted in higher surface growth rates and higher surface roughnesses. On the other hand, higher surface temperatures were observed to promote higher rates of desorption and surface migration, and therefore resulted in lower surface roughnesses and smaller growth rates. These results indicate the effects of temperature on each of the kinetic processes taking place and specifically emphasize that the rates of desorption and migration are more strongly affected by temperature than the rate of adsorption is. The hybrid model also demonstrates the effects of changing the concentration of the precursor species within the gas phase on the surface growth behaviour. More specifically, the model predicts that increasing the precursor species fraction results in larger growth rates and thus higher surface roughness, while the opposite trend is predicted when the precursor species fraction is decreased. These results indicate that surface adsorption is the kinetic process most strongly affected by the precursor species concentration; therefore, more adsorption is experienced when the concentration increases. Furthermore, the full hybrid model can be used to provide insights into how to achieve explicit surface specifications by controlling the thin-film growth process. For example, if the aim is to maximize the surface growth rate while producing a surface with minimal roughness, then the model predicts that it will be necessary to keep the temperature low during the early stages of growth, in order to prioritize adsorption, and then ramp up the temperature toward the end to maximize the surface migration events taking place and produce a smooth final film [
115].
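To indicate how such model-derived insights can translate into recipe screening, the following sketch compares a few candidate temperature profiles through a stand-in objective; run_hybrid_model is a hypothetical placeholder whose internals merely caricature the trends reported in Ref. [115], not the actual hybrid multiscale model.

```python
import numpy as np

def run_hybrid_model(temperature_profile, precursor_fraction=0.2):
    """Hypothetical stand-in for the full hybrid multiscale model of Ref. [115].
    It only caricatures the reported trends: early low temperature boosts
    adsorption-driven growth, and a hot finish smooths the final surface."""
    T = np.asarray(temperature_profile, dtype=float)
    early = T[: T.size // 2]
    growth = precursor_fraction * np.mean(1.2 - 5.0e-4 * early)  # set early in growth
    roughness = 1.0e-3 * (900.0 - T[-1])                         # set by final-stage migration
    return growth, roughness

candidates = {
    "constant low T":    np.full(50, 550.0),
    "constant high T":   np.full(50, 850.0),
    "low, then ramp up": np.linspace(550.0, 850.0, 50),
}
for name, profile in candidates.items():
    growth, roughness = run_hybrid_model(profile)
    print(f"{name:18s} growth = {growth:.3f}, roughness = {roughness:.3f}")
```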
In comparison with informativeness and explainability, causality concerns the correlations between the AI model inputs and outputs, which is particularly useful for guiding practical operations. In
Fig. 9, the SHAP analysis (a prevalent tool for correlation analysis from inputs to outputs) of the ANN used in the explained hybrid model is presented to quantify the impacts of the two inputs, the precursor fraction and the temperature, on one of the output coefficients. The result displayed in Fig. 9 indicates that a high precursor concentration and a low temperature at the surface of the thin film lead to an increase in this coefficient, which generally suggests a drop in the migration rate, since the coefficient is inversely related to the migration rate [
116]. This comes from the fact that, within a low range of temperatures, a temperature increase leads to a higher adsorption rate. When the adsorption dominates, it creates a large number of highly unstable surface configurations, resulting in an increase in migration rates as the unstable surface sites diffuse across the surface to form a more stable surface configuration. However, within a higher range of temperatures, the rate of surface migration increases due to the Arrhenius relation between migration rates and temperature. Therefore, there is a competition within the calculation of the migration rate between the Arrhenius behaviour and the effects of the unstable surface sites available on the surface. With regard to the precursor fraction, when the precursor gas concentration is fairly high, the surface is supposed to have a high number of occupied sites. Since the adsorbed molecules are always attempting to move to form a more stable configuration, a more occupied surface caused by an increase in the precursor gas concentration would lead to higher migration rates. However, when the precursor fraction is lower, the adsorption rate for the surface is low; subsequently, there are very few molecules adsorbed onto the surface of the film. Due to the low concentration of the adsorbates, each migration event across the surface has a lower chance of forming a more stable surface configuration. Consequently, the loose surface adsorbates will need to undergo a significantly greater number of surface migration events in order to encounter other loose adsorbates and form a more stable surface configuration, since the stability of a surface molecule increases when it has a larger number of nearest neighbors [
116]. As a result, lower surface precursor species fractions result in a greater number of migration events due to the number of migrations that need to occur to form a stable surface configuration.
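A SHAP analysis of this kind can be reproduced with the shap package, as sketched below for a surrogate trained on synthetic data; the network, the data, and the hypothetical coefficient map are placeholders rather than the model behind Fig. 9, and the model-agnostic KernelExplainer is used for generality.

```python
import numpy as np
import shap
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = np.column_stack([rng.uniform(0.05, 0.30, 300),     # precursor fraction
                     rng.uniform(500.0, 900.0, 300)])   # temperature [K]
y = 0.4 * X[:, 0] - 2.0e-4 * X[:, 1] + 0.3              # hypothetical coefficient map

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(15,), solver="lbfgs",
                                   max_iter=5000, random_state=0)).fit(X, y)

# model-agnostic SHAP values: contribution of each input to each prediction
background = shap.sample(X, 50)                          # background set for the explainer
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X[:100], nsamples=100)

feature_names = ["precursor fraction", "temperature"]
mean_abs = np.abs(shap_values).mean(axis=0)
for name, val in zip(feature_names, mean_abs):
    print(f"mean |SHAP| for {name}: {val:.4f}")
# shap.summary_plot(shap_values, X[:100], feature_names=feature_names)  # beeswarm view
```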
With regard to the explainability of the established ANN,
Fig. 10 compares two ANN structures for the same task. Both have one hidden layer, but one has 15 neurons (as used in the previous thin-film model) and the other has just one neuron. The comparison makes clear that the original 15-neuron ANN used in the hybrid model produces a highly nonlinear activated output space, whereas the ANN with only one hidden neuron can only produce an essentially linearly activated output space. This result indicates that, even with only one hidden layer, the original ANN can still capture the nonlinear behaviour of the system provided that the layer is wide enough. Within the single hidden layer, each neuron in both ANNs presents a linearly activated space; however, the combination of 15 such linearly activated spaces yields fairly high-quality predictions of the nonlinear system, whereas the network with a single neuron cannot achieve a similar result. This comparison suggests that the width of the hidden layer of the original ANN ensures that many locally linear details of the system are captured, which leads to the high prediction performance of the trained ANN for the studied system while maintaining high efficiency.
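The width comparison can be reproduced in a few lines by fitting two single-hidden-layer networks, one with 15 neurons and one with 1, to the same nonlinear toy response; the synthetic data stand in for the coefficient surface discussed above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.uniform(-1.0, 1.0, size=(600, 2))               # two scaled surface inputs
y = np.sin(3.0 * X[:, 0]) * np.cos(2.0 * X[:, 1])       # strongly nonlinear toy target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for width in (15, 1):                                    # wide vs. single-neuron hidden layer
    net = make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=(width,), activation="tanh",
                                     solver="lbfgs", max_iter=5000, random_state=0))
    net.fit(X_tr, y_tr)
    print(f"{width:2d} hidden neurons: test R^2 = {net.score(X_te, y_te):.3f}")
# the wide network combines many simple hidden-unit responses into an accurate
# nonlinear fit, whereas a single hidden unit cannot represent this surface
```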
Overall, the transparency analysis in this case study enhances the mechanistic understanding of the system, reveals the environmental factors impacting the studied dynamics, and illustrates the operational rationale of the ANN. In this way, effective guidance is delivered that contributes to the scientific comprehension of the system, to thin-film manufacturing, and to the further development of data-driven algorithms.
5. Perspectives on transparent AI
The transparency of AI applications within CE is proposed for the first time in this work; hence, no comprehensive application covering all aspects of transparency was available in the literature a priori. Furthermore, AI applications concerning any of the aspects of causality, informativeness, and explainability are currently limited within the context of CE, as discussed in the previous sections. As a result, there remains a significant need to advance the study and implementation of AI transparency techniques, particularly in the field of CE. This section will accordingly discuss the limitations and future research directions of AI transparency from the viewpoints of causality, informativeness, and explainability.
Causality, as a distinct defining aspect of AI transparency, has piqued the interest of chemical engineers for a relatively long period of time, since revealing how model outputs depend on the input features helps justify AI-generated decisions. This justifiability, in turn, makes it possible to optimize the observability of the operating processes in order to balance process controllability with economic benefits. Therefore, there is a great need to expand our current knowledge of AI process causality in order to improve the reliability of AI applications within CE. Model-agnostic post-hoc analysis is naturally more accessible than inner-rationale exploration and mechanism revelation. Accordingly, the most investigated applications are post-operational SPC studies regarding feature relevance explainability [
9,
23]. However, real-time explanations that dynamically adjust onsite strategies to meet safety and efficiency requirements still need to be initiated in order to facilitate the revolution of intelligent manufacturing toward Industry 4.0 [
93]. Accordingly, the future of causality research within AI applications lies in pushing beyond the boundaries of our current techniques to provide dynamic, on-the-fly explanations of AI decision processes in any general CE application, based on dependencies of the model outputs on the system inputs that are extracted in real time.
In addition to the causality of AI transformations, the need for scientific interpretations in CE has led to a flourishing of informativeness development within AI applications. However, despite their intuitive promise of greater AI transparency, many of the proposed hybrid and physics-informed modelling methods have only been partially explored at present and require extensive further study. With the assistance of powerful simulation algorithms and accessible computing resources, highly hierarchically constructed hybrid models are encouraged in order to achieve high prediction performance and thorough comprehension. In particular, IIMSE, through its combination of intercommunicating multiscale theoretical models and data-driven AI paradigms, holds great promise for playing a leading role, with its high adaptability to CE investigations and strong potential to maximize the benefits of AI
to solve bottleneck issues existing in CE. The features of IIMSE that connect microscopic insights to measurable macroscopic behaviour open up possibilities to efficiently predict CE operations from microscopic material design to optimized system operations for specific objectives, such as pharmaceutical development and oxygen carrier exploration. Thus, further efforts should be focused on developing whole-scale IIMSE models to capture different CE applications from the electronic to the industrial scale. Moreover, hybrid models that exploit theoretical models to provide datasets with greater information content are required to compensate for the scarcity of data on systems operating under harsh conditions, as such datasets are otherwise difficult to obtain for safety reasons. Alongside IIMSE, the hybridization of symbolic AI techniques and data-driven AI algorithms, such as the explainable artificial intelligence framework for mechanistic explanation generation (XAI-MEG) [
8], presents another solution to the deficiency of AI utilization in terms of both mechanistic insights and causal explanations, and thus deserves further attention. Likewise, the integration of ANNs and fuzzy systems has produced multiple modelling approaches that possess interpretation capacity, such as fuzzy neural networks (FNNs) and neuro-fuzzy networks. In fact, FNNs and neuro-fuzzy networks have gained attention from researchers within diverse fields since the 1990s. Their strong adaptability, inherited from fuzzy systems, has endowed them with extensive application capacity. Accordingly, more significant contributions from these hybrid models can be expected [
119].
Regarding the inner rationales underlying developed AI systems, the scarcity of related studies in the field of AI explainability indicates the particular difficulty of such investigations. Even considering the existing analyses of the explainability of DNNs implemented in CE, it cannot be ignored that exploration of the internal mechanisms of AI within CE remains extremely limited compared with the large number of AI-related studies. Furthermore, much of the current focus on AI explainability has been applied only to DNNs thus far. As a result, a tremendous endeavour is needed to elucidate the operational mechanisms of these critical techniques along with numerous other widely used methods, such as recurrent connections, convolutional filters, reinforcement learning, and more [
120-
122]. Such research pathways are crucial for future research on AI implementations in CE, in order to advance the understanding of how decision regions are delineated within the operational processes of multilayered AI algorithms and to contribute to the ultimate explainability of the predominant deep learning applications. In keeping with this perspective, AI transformative utilization should be vigorously promoted by illustrating the effects of AI networks’ hidden units, in order to improve the performance of deep learning paradigms and to guide their extensive application to other CE processes. Step-by-step comparisons from simple structures to multilayered systems have already been demonstrated to be efficient in these investigations [
51,
52]. However, similar work is required to study deep learning assemblies that exploit inherently transparent proxy models, such as SVMs or decision trees, in parallel, as an additional route for inner-rationale exploration.
Transparency research within AI applications in CE is a growing field whose importance is greater than ever. The pressing demand for related research necessitates tremendous efforts to promote the practical use of AI in CE, and much work remains to be done in order to fully incorporate transparency implementations within AI applications in CE.
6. Concluding remarks
The tremendous global revenues created by AI transformations indicate the inevitable change brought about by this revolutionary technology. The predominant AI paradigms enable improved predictions of sophisticated CE processes by hierarchically extracting data representations. However, as a tradeoff for this high prediction performance, these AI transformations suffer from greater model opacity. Such non-transparent algorithms can subsequently cause ethical and regulatory issues by providing non-justifiable decisions. Particularly in critical domains such as CE, which involve health, safety, and huge economic value, greater transparency of AI applications will directly lead to an increase in the reliability of these decision-makers for humans. Accordingly, transparency is the vital missing link needed to boost AI transformative applications in CE.
Motivated by the scenario described above, this review defined for the first time the vital concept of transparency within AI utilizations in CE, rooted in XAI, with a special focus on the characteristics and requirements of the CE field. This concept of transparency was elaborated upon in terms of the aspects of causality, explainability, and informativeness, that is, the correlation between predictions and the considered input features; the operational rationales hidden within decision-making processes; and the mechanistic insights into the investigated systems. These methods, along with their state-of-the-art applications regarding all three essential transparency factors, were discussed and compared in order to highlight the significance of reliable and responsible AI utilization, thereby stimulating related investigations.
Informativeness in particular has recently become a central focus in the transparency exploration of AI applications in CE. This focus stems from the inherent features of the CE field, the comparatively high cost of obtaining information-rich datasets, and the necessity of revealing the underlying mechanisms for system comprehension. The hybridization of prior knowledge, including first-principles understanding and theoretical models, with data-driven AI methods provides a solution to enhance the informativeness of these applications. More specifically, extensive investigations into physics-informed and AI-assisted hybrid models highlight their effectiveness in providing mechanistic insights into systems with sparse datasets.
Causality, on the other hand, has attracted notable attention from chemical engineers, who aim to optimize the observability of system behaviour while balancing the system controllability and economic benefits. The impact factors of various systems have accordingly been revealed in order to facilitate process intensification, operational control, and even onsite monitoring. The major contributions of causality explorations are primarily in SPC and should be extended to other areas, such as material development, through the implementation of powerful post-hoc model-agnostic analyses.
One of the most challenging aspects to investigate is the ability to explain the inner rationales of AI systems. Nonetheless, AI explainability remains an effective way to enhance the reliability of AI applications in CE. The examination of the inner mechanisms of AI applications is crucial for accelerating AI transformations in CE and gaining valuable guidance for the development of data-driven algorithms.
This review aims to improve chemical engineers’ awareness of the significance of developing reliable AI transformative utilizations within CE. Consequently, it is anticipated that chemical engineers will reach a consensus on reliable AI utilization as an innovative technique with the potential to address bottleneck challenges in our field.
Compliance with ethics guidelines
Yue Yuan, Donovan Chaffart, Tao Wu, and Jesse Zhu declare that they have no conflict of interest or financial conflicts to disclose.