《1.Introduction》

1.Introduction

Process operating performance assessment is an important subject in the process industry and has attracted attention in both academia and industry. Since the performance of processes may deteriorate over time and depart from the initial design due to process variations or process condition changes, it is necessary to continuously monitor process performance. This type of analysis is a step forward from traditional control performance assessment and has been named “optimality assessment.”

Some studies on optimality assessment [1–4] have recently been conducted. However, these studies do not describe a method that is applicable to general, complicated process operations. In this paper, a systematic framework for optimality assessment is proposed that addresses the main issues associated with the performance assessment of modern industrial processes. First, we consider multiple operating modes due to changes in process conditions and product demand. Second, we introduce multiple operating regions in each steady-state mode due to uncertainties and disturbances. Third, we consider transitions between different operating modes.

To solve these problems, a novel method for optimality assessment based on probabilistic principal component regression (PPCR) is proposed. It is first described for the unimodal processes that are common in practice, and is then extended to multiple operating mode processes. For unimodal processes, the developed method consists of two stages: offline training and online assessment. In offline training, the steady-state data, including process variables and the optimality index (OI), are collected. Note that the OI definition depends on the process; for example, OI can refer to operation costs, profit, product quality, an environmental index, and so on. To obtain an online estimation of OI, it is necessary to build a predictive model of OI based on the process variables. Since each operating mode usually has multiple operating regions, the mixture probabilistic principal component regression (MPPCR) model is employed for modeling. The MPPCR model describes the Gaussian distribution of OI in each operating region, from which the local value of OI in each operating region can be obtained. The optimality condition of the operating regions is then analyzed by comparing their local OI values. In online assessment, the operating region of a new data point is estimated based on its posterior probability. Based on the constructed model, OI is predicted using Bayesian inference to evaluate the process performance. When the process performance is non-optimum, diagnosing the cause of the problem helps steer the process to a better performance. A probabilistic contribution analysis technique based on the missing variable approach [5] is adopted to address this issue. The sequential floating forward search (SFFS) method is utilized, instead of a branch and bound method, in order to decrease computational time and simplify the solution.

For multiple operating mode processes, it is assumed that the data points are unlabeled with respect to the operating modes. In other words, the number of operating modes and the operating mode of each data point are unknown. To estimate the labels of the dataset, critical process variables that govern the change of operating modes are selected and called scheduling variables. Based on the selected scheduling variables, a local kernel-density-based approach [6] is adopted and improved to detect the labels of the data points. In order to estimate the operating modes in online assessment, a mixture discriminant analysis (MDA) classifier is built based on the labeled dataset. In addition, to improve the accuracy of online mode detection, process knowledge is incorporated into the MDA results. Optimality assessment of steady-state modes is the same as for unimodal processes. For transitions between modes, a dynamic principal component regression (DPCR) model is built, and the performance grades are compared based on the DPCR loading matrices [7].

The rest of this paper is arranged as follows: In Section 2, the problem and the proposed solution are discussed. In Section 3, the proposed optimality assessment strategy for steady-state modes is described. In Section 4, the assessment method for transitions is studied. In Section 5, the mode detection method for multiple operating mode processes is described. In Section 6, the proposed approach is tested on a Tennessee Eastman (TE) process. Finally, conclusions are presented in Section 7.

《2.Problem statement and proposed solution》

2.Problem statement and proposed solution

General process operations have multi-modal characteristics with non-Gaussian behavior in each steady-state mode. An overview of these systems is given in Fig. 1. It is considered that the change of operating modes is caused by known governing factors such as product demand. In addition, each steady-state operating mode consists of different operating regions that are caused by uncertain process variations. The optimality level varies with the operating position in the system.

《Fig. 1》

Fig. 1. An overview of general process operations.

In this paper, the goal is to assess online operating process performance based on routine operating process data by characterizing the data—that is, by estimating operating mode and operating region (or transition grade), predicting the OI value, and diagnosing the cause of poor performance. The proposed framework includes offline training and online assessment. An overview of the proposed framework and methods is given in Fig. 2 and Fig. 3, and the details are described in the following sections.

《Fig. 2》

Fig. 2. An overview of the proposed framework and methods for offline training.

《Fig. 3》

Fig. 3. An overview of the proposed framework and methods for online assessment.

《3.Steady-state modes: Definition》

3.Steady-state modes: Definition

Steady-state modes are the main operating conditions of processes during which no essential change occurs in the critical process variables, flowsheet configuration, product demand, and so on. The MPPCR model is used to model the training dataset. In the next step, based on the fitted model, the local OI value of each operating region is obtained. Furthermore, based on process knowledge, a set of classes for the optimality values is defined, and each operating region is assigned to its corresponding class.

《3.1. Data modeling》

3.1. Data modeling

Suppose $X = [x(1), x(2), \ldots, x(n)]^{\mathrm{T}} \in \mathbb{R}^{n \times p}$ and $Y = [y(1), y(2), \ldots, y(n)]^{\mathrm{T}} \in \mathbb{R}^{n \times 1}$ are the available datasets of the process variables and OI, respectively, where $n$ is the number of data points and $p$ is the number of process variables. Since operating modes consist of several operating regions, the MPPCR model is employed to build a predictive model for which the input is $X$ and the output is $Y$.
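For orientation, the mixture PPCR model is commonly written in the following form. This is a sketch of the standard formulation, not necessarily the exact parameterization used in this work; the symbols $P_k$, $C_k$, $t$, $\mu_{x,k}$, $\mu_{y,k}$, $\sigma_{x,k}^2$, $\sigma_{y,k}^2$, $\pi_k$, and $K$ are assumed notation that does not appear in the text above.

```latex
% Assumed notation: region k = 1, ..., K with mixing weight \pi_k,
% latent variable t, loading matrices P_k (inputs) and C_k (output).
\begin{aligned}
x &= P_k\, t + \mu_{x,k} + e_k, & e_k &\sim \mathcal{N}(0, \sigma_{x,k}^2 I),\\
y &= C_k\, t + \mu_{y,k} + f_k, & f_k &\sim \mathcal{N}(0, \sigma_{y,k}^2),\\
t &\sim \mathcal{N}(0, I), & p(x, y) &= \sum_{k=1}^{K} \pi_k\, p(x, y \mid k).
\end{aligned}
% Prediction for a new point x: weight the regional predictions
% by the posterior probability of each region.
\hat{y} = \sum_{k=1}^{K} p(k \mid x)\,
  \Bigl[\mu_{y,k} + C_k \bigl(P_k^{\mathrm{T}} P_k + \sigma_{x,k}^2 I\bigr)^{-1}
  P_k^{\mathrm{T}} (x - \mu_{x,k})\Bigr]
```

Under this form, the marginal distribution of $y$ in region $k$ is Gaussian with mean $\mu_{y,k}$, and the posterior $p(k \mid x)$ is the quantity used to assign a new data point to an operating region.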

《3.2. Analysis of optimality index》

3.2. Analysis of optimality index

The Gaussian distribution of the OI in each operating region k is estimated in the modeling step. As a result, the local OI [1] of each operating region is taken to be the mean value of the Gaussian distribution obtained for y in that region.

《3.3. Non-optimum cause detection》

3.3. Non-optimum cause detection

In order to find causal variables in the presence of non-optimum or poor performance, one can utilize a probabilistic contribution analysis technique based on the missing variable approach. This method has been applied for fault detection [5,8], outlier detection [9], and so forth. In this paper, we adopt this method with a modification for causality detection in optimality assessment. In the modified method, the best region is called the reference or benchmark region for probabilistic causality analysis. A new data point with non-optimum performance is detected when its Mahalanobis distance from the reference region, $M^2$, is larger than the confidence bound, which is the $\beta$-fractile of the chi-square distribution with $r$ degrees of freedom, $\chi_r^2(\beta)$. In this method, the contribution of each single variable equals the difference between $M^2$ and the expected value of $M^2$, $E(M^2)$, when the considered variable is treated as a missing variable. We propose the following algorithm to find a group of causal variables based on the SFFS method [10]: The complete set, $Y = \{y_j \mid j = 1, 2, \ldots, p\}$, includes all measured variables of $x_{\mathrm{new}}$. The aim is to find the minimum number of missing variables for which the recalculated value of $E(M^2)$ is less than the confidence bound. Supposing that we have selected a subset $X_k$ of $k$ missing variables, the criterion function in this problem, written consistently with the single-variable contribution defined above, is the reduction in distance obtained by treating $X_k$ as missing:

$$J(X_k) = M^2 - E(M^2 \mid X_k)$$

where $E(M^2 \mid X_k)$ is the expected value of $M^2$ conditioning on missing the selected subset, that is, $X_k$. The algorithm starts with $k = 1$ and is as follows:

Step 1: Using the SFFS algorithm, find the subset $X_k$ of $k$ variables from $Y$ that maximizes $J(X_k)$.

Step 2: If $E(M^2 \mid X_k)$ is less than the confidence bound, $X_k$ is the final set of causes; otherwise, set $k = k + 1$ and go to Step 1.
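The search above can be illustrated with a simplified sketch. It makes several assumptions that are not taken from the paper: the reference region is a plain Gaussian with full covariance (rather than the latent-space PPCR model), the search is a greedy forward selection without the floating backward step of SFFS, and the confidence bound is passed in as a number. The function names `expected_m2` and `find_causal_variables` are hypothetical. For a Gaussian reference, treating a set of variables as missing leaves the Mahalanobis distance of the observed part plus one unit per missing variable in expectation.

```python
import numpy as np


def expected_m2(x, mu, cov, missing):
    """E[M^2] when the variables in `missing` are treated as unobserved.

    Assumes a Gaussian reference region: the observed part contributes its
    own Mahalanobis distance and each missing variable contributes 1.
    """
    observed = [j for j in range(len(x)) if j not in missing]
    if not observed:
        return float(len(missing))
    d = x[observed] - mu[observed]
    s_oo = cov[np.ix_(observed, observed)]
    return float(d @ np.linalg.solve(s_oo, d)) + len(missing)


def find_causal_variables(x, mu, cov, bound):
    """Greedy forward search (no floating step) for the causal subset."""
    m2 = expected_m2(x, mu, cov, [])
    missing = []
    while expected_m2(x, mu, cov, missing) >= bound and len(missing) < len(x):
        candidates = [j for j in range(len(x)) if j not in missing]
        # J(X_k) = M^2 - E(M^2 | X_k): pick the variable that maximizes J.
        best = max(candidates,
                   key=lambda j: m2 - expected_m2(x, mu, cov, missing + [j]))
        missing.append(best)
    return missing


# Made-up example: four standardized variables, the second one far off.
mu = np.zeros(4)
cov = np.eye(4)
x_new = np.array([0.5, 6.0, -0.3, 0.2])
# 9.488 is the 0.95-fractile of the chi-square distribution with 4 d.o.f.
causes = find_causal_variables(x_new, mu, cov, bound=9.488)
print(causes)
```

In this toy case only the second variable (index 1) needs to be treated as missing to bring the expected distance below the bound. A full SFFS implementation would additionally try removing previously selected variables after each addition.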

《4.Transitions》

4.Transitions

Transitions occur mainly in processes with multiple operating modes, between two steady-state modes. In this paper, it is assumed that the change of operating modes is supervised. As a result, the critical process variables governing the change of operating modes are measured, albeit with noise, in each process; these are often called scheduling variables.

《4.1. Transition grades analysis》

4.1. Transition grades analysis

Dynamic principal component analysis (DPCA) is employed in order to consider the autocorrelation in the variables as well as their time-varying features by incorporating time-lagged information in the data matrix. The loading matrix of each transition based on its scheduling variables is found, and the similarity indices are computed. Srinivasan et al. [7] have provided the details of this method. Suppose that S and T are two transitions with the same initial and final steady-state modes. If their similarity index is larger than a user-defined threshold θT, they belong to the same transition grade.
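A minimal sketch of this comparison follows, assuming the similarity index is a PCA similarity factor of the Krzanowski type computed on the loadings of time-lagged data. The exact index in [7] may differ; the function names, the number of lags, the number of components, and the threshold value are illustrative assumptions, and the two transitions are synthetic.

```python
import numpy as np


def lagged_matrix(data, lags):
    """Stack each sample with its `lags` previous samples (DPCA data matrix)."""
    n = data.shape[0]
    blocks = [data[lags - l: n - l] for l in range(lags + 1)]
    return np.hstack(blocks)


def loadings(data, lags, n_components):
    """Loading matrix of the mean-centered, time-lagged data."""
    z = lagged_matrix(data, lags)
    z = z - z.mean(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[:n_components].T


def pca_similarity(load_s, load_t):
    """Similarity factor in [0, 1] between two loading subspaces."""
    k = load_s.shape[1]
    return float(np.trace(load_s.T @ load_t @ load_t.T @ load_s) / k)


# Two synthetic transitions over the same two scheduling variables.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
trans_s = np.column_stack([t, t ** 2]) + 0.01 * rng.standard_normal((200, 2))
trans_t = np.column_stack([t, t ** 2]) + 0.01 * rng.standard_normal((200, 2))

load_s = loadings(trans_s, lags=2, n_components=2)
load_t = loadings(trans_t, lags=2, n_components=2)
theta_t = 0.9  # hypothetical user-defined threshold
similarity = pca_similarity(load_s, load_t)
same_grade = similarity > theta_t
```

The factor equals 1 when the two loading subspaces coincide and 0 when they are orthogonal, so a threshold close to 1 groups only transitions with nearly identical dynamics into one grade.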

《4.2. Transition predictive modeling》

4.2. Transition predictive modeling

DPCR is built based on the complete training dataset. For each transition, when DPCA is built, the regression step is applied on the estimated latent variables. The estimation of OI in the online assessment, based on the estimated grade, equals the average of the estimated values of each transition model in the grade.
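The averaging step can be sketched as follows. The two linear models are made-up stand-ins for fitted DPCR models, and `predict_transition_oi` is a hypothetical helper name.

```python
def predict_transition_oi(sample, grade_models):
    """Average the OI estimates of all transition models in one grade."""
    estimates = [model(sample) for model in grade_models]
    return sum(estimates) / len(estimates)


# Stand-ins for two DPCR models that belong to the same transition grade.
def model_a(s):
    return 2.0 * s[0] + 1.0 * s[1]


def model_b(s):
    return 1.0 * s[0] + 2.0 * s[1] + 1.0


oi_hat = predict_transition_oi([1.0, 2.0], [model_a, model_b])
print(oi_hat)
```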

《5.Mode detection》

5.Mode detection

In order to extend the proposed algorithm to multi-mode systems, a mode detection step should be considered to detect the steady-state modes and transitions, and the methods discussed in Sections 3 and 4 should then be employed for performance assessment. Mode detection consists of labeling the operating modes and building the predictive classifier based on the estimated labels.

《5.1. Operating modes labeling》

5.1. Operating modes labeling

Quiñones-Grueiro et al. [6] recently proposed an offline mode detection method based on a local kernel density estimation for a monitoring application. This method is based on the density-based clustering (DENCLUE) method, adapted to integrate the process sequence information in order to improve the accuracy. In this paper, the abovementioned offline mode detection method is employed and modified for optimality assessment. This algorithm provides an efficient procedure for labeling operating modes without requiring the number of modes to be known a priori. The proposed extensions to the abovementioned method are as follows:

(1)In order to find the exact start and end time of the transitions, each transition part is segmented into shorter length windows to investigate the dynamics more clearly.

(2)The initial and final windows of steady-state modes are compared with each other based on the distance criterion in order to detect final steady-state modes and transitions.

(3)Initial and final steady-state modes of each transition are compared based on the distance criterion. If they are similar to each other and represent the same operating mode, the detected transition is considered to be a noise effect rather than a real transition.

《5.2. Online mode detection》

5.2. Online mode detection

In online assessment, the corresponding operating mode of each new data point is estimated in order to select the proper model. As a result, a predictive classifier is built based on the estimated labels in the offline mode detection step. Fraley and Raftery [11] have integrated the classification method of MDA with model-based clustering (Mclust) in a method that is called Mclust discriminant analysis (MclustDA), which is capable of classifying non-Gaussian classes. The classification model of each operating mode, including steady-state modes and transition grades, is built based on the MclustDA method. Note that in this paper, it is assumed that all the operating modes are known in offline training. However, new modes that have not been studied before may appear in online mode detection. One possible solution to detect new operating modes is to compute the joint probability of the conditional probability of the new data point in each operating mode and the posterior probability of each operating mode. When the joint probability has an insignificant value, this indicates that a new operating mode has appeared [12].
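The new-mode check can be sketched as below. This is only an illustration under strong assumptions: each mode is a one-dimensional Gaussian instead of a MclustDA class model, the mode probability is a fixed weight, and the threshold that counts as "insignificant" is an arbitrary hypothetical value.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def is_new_mode(x, modes, threshold):
    """Flag x as a new mode if every joint probability p(x|mode)P(mode) is tiny."""
    joint = {name: gaussian_pdf(x, m["mean"], m["std"]) * m["weight"]
             for name, m in modes.items()}
    return max(joint.values()) < threshold, joint


# Hypothetical known modes; "weight" stands in for the mode probability.
modes = {
    "mode 1": {"mean": 0.0, "std": 1.0, "weight": 0.5},
    "mode 2": {"mean": 10.0, "std": 1.0, "weight": 0.5},
}
flag_known, _ = is_new_mode(0.3, modes, threshold=1e-6)
flag_new, _ = is_new_mode(25.0, modes, threshold=1e-6)
print(flag_known, flag_new)
```

A point near the center of a known mode yields a large joint probability and is not flagged, whereas a point far from every known mode yields a negligible value for all modes and is flagged as a candidate new mode.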

For online prediction of the operating modes, process knowledge is incorporated in order to increase the accuracy of the prediction [4]. In other words, instead of computing posterior distributions of all operating modes for each data point, the posterior of related operating modes is computed as follows:

(1)If the current operating mode of the data point belongs to steady-state mode i, for the next point, the posterior probability of mode i and all the transitions from mode i are predicted.

(2)If the current data point is in the grade p of transition ij, that is, {ij}p, the posterior probability of {ij}p and steady-state mode j are computed.

Finally, the data point is classified to the operating mode with the highest posterior probability. Note that considering a single data point may lead to an incorrect solution in noisy environments. In that case, it is suggested to evaluate a window of the data points that provides a more robust estimation of the operating mode change.
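The two rules and the window-based evaluation can be sketched as follows. The data representation (integers for steady-state modes, tuples `(i, j, grade)` for transition grades), the precomputed posterior scores, and the majority vote over the window are all assumptions made for illustration; the text does not prescribe how the window is evaluated.

```python
from collections import Counter


def candidate_modes(current, transitions):
    """Modes reachable from `current` under rules (1) and (2).

    Steady-state modes are ints; a transition grade is a tuple (i, j, grade).
    """
    if isinstance(current, tuple):
        _, j, _ = current
        return [current, j]
    return [current] + [t for t in transitions if t[0] == current]


def classify(current, posteriors, transitions):
    """Pick the candidate with the highest posterior score."""
    candidates = candidate_modes(current, transitions)
    return max(candidates, key=lambda m: posteriors.get(m, 0.0))


def windowed_mode(labels):
    """Majority vote over a window of per-sample classifications."""
    return Counter(labels).most_common(1)[0][0]


transitions = [(1, 2, 1), (1, 3, 1), (2, 3, 1)]
# Hypothetical posterior scores for one new data point.
posteriors = {1: 0.2, (1, 2, 1): 0.5, (2, 3, 1): 0.9, 3: 0.1}

next_from_steady = classify(1, posteriors, transitions)
next_from_transition = classify((1, 2, 1), posteriors, transitions)
robust = windowed_mode([1, 1, (1, 2, 1), 1])
```

Restricting the candidates means that transition (2, 3, 1) is never selected while the process is in mode 1, even though its score is the largest, because it does not start from the current mode.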

《6.Tennessee Eastman benchmark process》

6.Tennessee Eastman benchmark process

The TE benchmark process has been broadly used for the evaluation of many methods in process control, soft sensor design, monitoring, and so forth. The model was first developed by Downs and Vogel [13], based on the industrial process of the Eastman Chemical Company. In order to have a stable process, the decentralized control strategy developed by Ricker [14] is applied to the open-loop process.

Three different operating modes are simulated based on the set points summarized in Table 1. In addition, two uncertainties are added in each operating mode, as expressed in Table 2. OI is selected to be the operation cost. The offline training data projected into two variables of the A and C feed (stream 4) and recycle flow (stream 8) are shown in Fig. 4. To clarify, the approximate boundary of each operating mode is shown in the figure.

《Table 1》

Table 1 Properties of stable operating modes.

 “G” and “H” are the main products of the TE process.

《Table 2》

Table 2 Process uncertainties.

《Fig. 4》

Fig. 4. Two-dimensional plot of the offline training data. kscmh: kilo standard cubic meters per hour.

The defined levels for the OI values are stated in Table 3. Note that optimality levels are defined as being worse at a higher level. The local OI values and levels are given in Tables 4–6.

《Table 3》

Table 3 Defined OI levels.

《Table 4》

Table 4 Local OI levels (mode 1).

《Table 5》

Table 5 Local OI levels (mode 2).

《Table 6》

Table 6 Local OI levels (mode 3).

《Online assessment》

Online assessment

The computed classification error for online mode detection is 0.0104, which indicates a high accuracy of mode detection. OI values are predicted, and the comparison plot of predicted and real values for the OI is given in Fig. 5. Since the employed models vary along the process, the corresponding models are stated in the figure. In addition, root mean square error (RMSE) and $R^2$ values are computed as 0.3723 and 0.8475, respectively, indicating a high accuracy in predicting the OI values. The offline mode detection, online mode detection, and prediction results are summarized in Table 7.
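For reference, the two prediction metrics can be computed as in the sketch below, assuming that $R^2$ denotes the usual coefficient of determination. The numbers are made up for illustration and are not the TE results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between real and predicted OI values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, not taken from the case study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
```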

The estimated OI levels are given in Fig. 6. According to Fig. 6, the process starts with optimal operation and then jumps to level 2 optimality. The 1219th sampling data point is selected as an example to find the cause of non-optimality. Based on the previous estimations, this data point belongs to operating region 1 of operating mode 1. Based on Table 4, operating region 3 has the lowest OI level in mode 1; therefore, it is selected as the reference region or benchmark for non-optimum cause detection. The distance of this data point from the reference region is 195.11, that is, greater than the 0.95-fractile of the chi-square distribution with 22 (the number of process variables) degrees of freedom ($\chi_{22}^2(0.95) = 33.924$). Nine causal variables that can be validated based on process knowledge are detected, and their contribution percentage is given in Fig. 7. When these nine variables are assumed to be missing, the distance from the reference region becomes 33.66, that is, less than $\chi_{22}^2(0.95)$, which indicates that these variables account for the departure from optimum performance.

《Fig. 5》

Fig. 5. Comparison of predicted and real values of OI.

《Table 7》

Table 7 Summary of the results.

ARI: adjusted Rand index; FM index: Fowlkes-Mallows index.

《Fig. 6》

Fig. 6. Estimated OI levels.

《Fig. 7》

Fig. 7. Contribution percentage of the causal variables at sample 1219.

《7. Conclusions》

7. Conclusions

In this paper, a novel framework for operating optimality assessment in non-Gaussian multi-mode processes is established. The proposed method is capable of detecting operating modes, transitions, and regions, and of providing a model for predicting process operation performance. In addition, a causality detection method is introduced for diagnosing poor or non-optimum behavior. An application on the TE benchmark process is presented, and confirms the applicability of the proposed method.

《Acknowledgements》

Acknowledgements

This work is supported in part by the Natural Sciences and Engineering Research Council of Canada and by Alberta Innovates Technology Futures.

《Compliance with ethics guidelines》

Compliance with ethics guidelines

Shabnam Sedghi and Biao Huang declare that they have no conflict of interest or financial conflicts to disclose.