《1. Introduction》

1. Introduction

The global industry has been experiencing accelerating changes during the recent transformation of traditional manufacturing into smart manufacturing [1,2]. During the conversion process, industries face a number of challenges posed by smart manufacturing, which have attracted great attention in both academic and practitioner communities [3], particularly in the process industry [4]. Some of the challenges to be covered in this work include:

• The use and analysis of data, with a particular focus on the development of data-driven surrogate/metamodels to simplify complex processes and to enable manufacturing intelligence;

• The implementation of multiscale modeling and optimization to integrate strategic and planning decisions with operations in order to support enterprise-wide coordination and optimization;

• The development of computationally efficient models, algorithms, and tools in order to find global optimal solutions for smart manufacturing decision-making and to enable large-scale optimization.

In this work, we aim to develop optimization-based decisionmaking models for optimal purification strategies in the manufacturing process of an antibody product based on simple data-driven models, in an attempt to cope with the above challenges in the biopharmaceutical industry. In order to achieve better control of the processes and improve production efficiency, biopharmaceutical manufacturing process optimization problems have been investigated using different modeling and solution techniques, such as metaheuristic [5], dynamic optimization[6], evolutionary algorithm [7–9], Markov decision method [10], and mixed-integer programming [11–22]. Data-driven models—also known as surrogate models or metamodels—refer to models that are built on the basis of data, but are not dependent on theoretical knowledge of the concerned processes or systems. Data-driven models of complex processes and systems provide model simplicity and computational efficiency [23], and their integration with optimization requires less computational effort and has a broad application in the engineering field [24,25]. In particular, such models have demonstrated research benefits in the modeling and optimization of chromatography purification operations [26–29]. However, only a few attempts have been made to integrate data-driven models into optimization models for biopharmaceutical purification processes. Nagrath et al. [30] developed an artificial neural network (ANN)-based hybrid model for the optimization of preparative chromatographic processes. Pirrung et al. [31] developed ANNs from detailed mechanistic models and integrated them into the optimization of biopharmaceutical downstream processes for the maximum yield of a process with three different chromatographic columns.

Antigen-binding fragment (Fab) products are regarded as the next generation of protein-based biotherapeutics after monoclonal antibody (mAb) products, and offer many advantages due to their simpler and smaller structure [32]. There is a need to develop a cost-efficient process for Fab production in industrial practice [33], but few works on this topic exist in the literature. In this work, the multiscale optimization of the purification process of a Fab product is addressed, using microscale column chromatography experiment data for manufacturing-scale chromatography optimization. In order to achieve the most cost-efficient process, besides considering chromatography column-sizing strategies at the design level, operational decisions are taken into account— particularly the flow velocity at chromatography steps. Datadriven models are developed to estimate the chromatography throughput. When incorporating the developed data-driven models, a number of mixed-integer programming models are proposed to find the optimal Fab purification strategies, and are examined through case studies. To the best of our knowledge, this is the first work in the literature on the multiscale optimization of the purification process of Fab using data-driven models.

The rest of this paper is organized as follows: The optimization problem is described in Section 2. The data-driven models of the chromatography operations are developed in Section 3, while the proposed mathematical programming optimization models are described in Section 4. Section 5 presents industrially relevant case studies, followed by the computational results and discussion in Section 6. Finally, concluding remarks are provided in Section 7.

《2. Problem statement》

2. Problem statement

This work investigates the optimization of the manufacturing processes of a Fab product. Fig. 1 shows the Fab manufacturing process flowsheet studied in this work. Initially, the mammalian cells expressing the Fab are cultured in bioreactors in the upstream processing (USP) before entering into the downstream processing (DSP). In the DSP, the Fab protein product is purified by a number of operations, including centrifugation, homogenization, filtration, ultrafiltration/diafiltration (UF/DF), and three packed-bed chromatography steps, which include affinity, cation-exchange, and anion-exchange chromatography steps.

《Fig. 1》

**Fig. 1. A Fab manufacturing process. Orange boxes represent chromatography operations. USP: upstream processing; DSP: downstream processing; UF/DF: ultrafiltration/diafiltration.**

The chromatography column-sizing strategies are important to the efficiency of the whole purification process. These strategies include the number of parallel columns at each step, diameter and bed height of the columns, and number of cycles per batch, which significantly affect the cost, time, and output of the whole manufacturing process. In real practice, there are standard columns with different diameter sizes, and the bed height are set to a range of typical integer values. Thus, in this work, the column diameter and bed height are optimized from a given set of discrete alternative values.

Chromatography operations are complex process with challenges in modeling their behaviors. To optimize the chromatography operational strategies, metamodeling techniques are used in this work to mimic and predict the chromatography process performance, particularly the chromatography throughput, indicating the rate of the product output of one column within a given period, which is an important metric of the chromatography operation. To develop manufacturing-scale data-driven models, data from a microscale column chromatography laboratory experiment are collected and then fitted to obtain isotherm parameters. First-principle chromatography models are also created with scale-related parameters to capture challenges on scaling [34]. First-principle models with isotherm parameters are solved, followed by simulation runs to generate manufacturing-scale datasets using COMSOL Multiphysics simulation software [35]. The datasets include the throughput output under different input conditions of loaded mass, flow velocity, and column bed height at the two chromatography steps in the binding and elution mode—namely, the affinity and cation-exchange chromatography steps. The datasets are then used to derive data-driven models, which are incorporated into the proposed optimization models in order to reach optimal manufacturing-scale chromatography decisions. The whole procedure of the multiscale optimization approach is presented in Fig. 2; the steps presented in the last three tan-colored boxes in the figure will be described in detail in the following sections. Note that it is assumed that the considered input conditions do not affect other chromatography parameters, such as the resin’s yield, binding capacity, and lifetime, which are known parameters in this problem.

《Fig. 2》

**Fig. 2. Procedure for the multiscale optimization of chromatographic strategies.**

In summary, the optimization problem considered in this work is described as follows:

**Given:**

• The process flowsheet of a Fab product;

•The number of bioreactors and their volumes, along with the bioreactor titer;

• The chromatography operation parameters, including the yield, buffer and eluent usage, dynamic binding capacity, lifetime, etc.;

• The non-chromatography operation parameters, including the yield, processing rate, buffer usage, etc.;

• Time-related data, including the processing rate, bioreaction time, annual operating time, etc.;

• Cost-related data, including the labor wage and the costs of the resin, buffer, media, equipment, etc.;

• Chromatography data from simulations based on firstprinciple models and microscale column experiment data;

• The candidate column diameter and bed height, and the maximum number of cycles and columns.

**Determine:**

• Chromatography column-sizing strategies, such as the column diameter, bed height, and number of columns at each chromatography step;

• Operational strategies, such as the liner velocities, loaded product mass, and number of cycles at the affinity and cationexchange chromatography steps;

• The number of total completed batches;

• The annual total processing time;

• The annual total production output;

• The annual total cost.

**So as to** minimize the cost of goods (COG) per gram of the Fab product—that is, the ratio of the annual total cost to the annual total production output.

《3. Data-driven models》

3. Data-driven models

In this section, simple data-driven models are developed for chromatography performance based on simulation datasets. In this work, as a key performance criterion, the chromatography throughput is considered as the output of the models. Only the affinity and cation-exchange chromatography steps are modeled, due to the limited available data. Both the datasets for the affinity and cation-exchange chromatography steps are based on a single chromatography column with a diameter of 1 m. Three key variables influencing the chromatography throughput are considered in the datasets as the inputs of the data-driven models, namely: the loaded mass, flow velocity, and column bed height.

A number of widely used methods are implemented in order to achieve accurate and simple models, including linear regression, support vector regression (SVR) [36], kriging [37], pace regression [38], response surface methodology (RSM) [39], and piecewise linear regression [40]. To estimate the prediction accuracy of these methods, cross-validation is performed. Given a dataset, *n*-fold cross-validation randomly splits the samples into *n* subsets of equal size. Then (*n* – 1) subsets of samples are used in the training, and the remaining set is used to validate the prediction accuracy of the obtained data-driven models. In this work, 10 rounds of fivefold cross-validation are performed by generating random sample splits, and the mean absolute error (MAE) over all 50 testing sets is used as the final error metric for comparison of the prediction accuracy. Linear regression, SVR, kriging, and pace regression are implemented in WEKA machine learning software [41] with default settings, while RSM and piecewise linear regression are run in GAMS [42] using the CPLEX mixed-integer linear programming (MILP) solver. Table 1 presents the prediction error results obtained after running the cross-validation on the datasets for affinity chromatography with 3081 samples, and for cationexchange chromatography with 2847 samples.

Table 1 shows that the piecewise linear regression method gives the best prediction accuracy among all the tested methods. The piecewise linear regression method creates a model to separate samples into multiple complementary intervals on one input variable, with the flexibility of each interval being fitted by its own linear regression function. Considering its ease of modeling and understanding, the piecewise linear regression method is chosen to create the final data-driven model for chromatography throughput estimation, where all samples are used in the training process. In the procedure of the piecewise linear regression [40], each input variable in turn serves once as the partition variable. For each partition variable, an MILP model is solved to determine the breakpoint of the partition variable between only two intervals, and the variable corresponding to the minimum training error is kept as the partition variable. Until the termination criterion is met, the number of intervals is increased and MILP models are solved iteratively with the same partition variable. Following this procedure, the following two models are obtained to estimate throughput, , at chromatography step s (af for affinity and ce for cation-exchange):

《Table 1》

**Table 1 MAE comparison of different methods.**

where the superscript 1 of the variables refers to the column with a diameter of 1 m. In the obtained models, the loaded mass, , at chromatography step *s* is determined to separate three intervals by the procedure given in Ref. [40], which provides a smaller prediction error than the other two variables, and , which are the linear velocity and the column bed height, respectively.

In order to incorporate the above obtained data-driven models into the optimization models for decision-making, they need to be reformulated by introducing a binary variable. The binary variable, , is defined to be equal to 1 if the loaded mass at chromatography step s lies within interval *r*; the throughput output is then obtained from the corresponding liner function. As there is only one interval to be selected, the binary variable should satisfy Eq. (3):

The value of the separate input, , in this model should be between the two breakpoints () of the selected interval, which can be formulated as the following linear equation:

where is a small number to separate two successive intervals at the breakpoints. The general expression of the throughput output is as follows:

where are parameters in the function. When interval is selected at step *s*, that is, = 1, Eq. (4) becomes In this case, the loaded mass lies in the interval ; Eq. (5) then ensures that the throughput is equal to the output of the linear function on the selected interval :

The above throughput regression model will be incorporated into the optimization models of Fab manufacturing processes in the next section.

《4. Optimization models》

4. Optimization models

In this section, we will use the above data-driven models to develop two mixed-integer nonlinear programming (MINLP) models of the purification processes of a Fab product, using different modeling methods of alternative column sizes. These MINLP models are then reformulated as MILP models using exact linearization techniques and multiparametric disaggregation.

《4.1. MINLP model A》

4.1. MINLP model A

MINLP model A is formulated based on the model provided in Ref. [16] for a mAb manufacturing process. In this model, a number of alternative column volume sizes are first generated from combinations of the given discrete column diameter and bed height. The column volume size *i* at chromatography step s () corresponds to a specific diameter size () and bed height () among the given alternatives.

4.1.1. Column volume

The total column volume (*TCV _{s}*) at chromatography step

*s*

*CS*(

*CS*is the set of chromatography steps,

*CS*={af, ce, ae}, ae stands for anion-exchange) is determined by the number of columns (

*CN*) in the selected size multiplied by the corresponding column volume:

_{s,i}By introducing a binary variable, *X _{s,i}*, for the selection of column size

*i*at chromatography step

*s*, the following constraints can ensure that only one column size can be selected:

where max*CN _{s}* refers to the maximum allowed number of columns.

In each batch, the available resin volume at a chromatography step s—that is, the total column volume (*TCV _{s}*) multiplied by the number of cycles per batch (

*CYN*)—must be sufficient to process all protein mass entering into that step (

_{s}*M*

_{s}_{–1}), and the required resin volume (

*RV*) is determined by the dynamic binding capacity (

_{s}*dbc*) and resin utilization factor (

_{s}*μ*).

4.1.2. Product mass

The initial product mass (*M*_{0}) entering into the DSP process is equal to the bioreaction titer (*titer*) multiplied by the bioreactor working volume—that is, the bioreactor volume (*brv*) times the working volume ratio (*α*).

The product protein mass going out from step *s* is equal to the mass from the previous one, step (*s* – 1), multiplied by the corresponding yield of step *s*, *yd _{s}*.

The annual product output (*AP*) is the product mass after the bulk fill step (*s* = bf):

where *BN* is the number of completed batches, upper bounded by the maximum allowed batch number, and is the batch success rate.

4.1.3. Product volume

The initial product volume entering into the DSP (*PV*_{0}) is equal to the working volume of the bioreactor, formulated as follows:

For the first four steps of the process, including the first centrifugation (*s* = ct_{1}), homogenization (*s* = ho), second centrifugation (*s* = ct_{2}), and filtration (*s* = f i) steps, the product volume remaining after step s (*PV _{s}*) is equivalent to the product volume entering into this step:

At the affinity (*s* = af) and cation-exchange chromatography (*s* = ce) steps, the product volume is equal to the eluent volume, while at the anion-exchange chromatography (*s* = ae) step, the product volume does not change.

where* ecv _{s} *is the eluent volume to column volume ratio.

At the first UF/DF step (*s* = uf_{1}), the flush volume is added to the product volume entering the step:

where *fvr* is the flush volume ratio at this step. At the second UF/DF step (*s* = uf_{2}), the remaining product volume is the mass divided by the filling concentration, *fconc*:

4.1.4. Buffer volume

The buffer volume added in each step (*BV _{s}*) is defined as follows:

where *bvr _{s}* is the buffer volume ratio at centrifugation step

*s*and

*bcv*is the buffer volume to column volume ratio at chromatography step

_{s}*s*, and

*f vr*and

*dvr*are the flush volume ratio and diafiltration volume ratio at the first and second UF/DF steps, respectively.

The total required buffer volume in each batch (*BBV*) is the sum of the buffer volume at all steps, and the annual buffer volume (*ABV*) is the total buffer volume of all the completed batches.

4.1.5. Processing time

The processing times (*T _{s}*) at the affinity and cation-exchange chromatography steps are determined by the mass output of each column divided by its throughput (

*TP*):

_{s}At the anion-exchange chromatography step, the processing times for the loading product (*PLT*) and adding buffer (*BAT*) are calculated separately using the volumetric flow rate (*VFR*) to obtain the processing time at the step.

where *vel* is the linear velocity of flow at the anion-exchange chromatography step. The processing time at the bulk fill step is assumed to be constant, while at the other non-chromatography steps, the process time is equal to the corresponding product volume divided by the processing rate (*pr _{s}*):

The processing time of one batch, *BT*, is the total processing time for all steps, divided by the shift duration (*sfd*) and the number of shifts per day (*sfn*):

The annual processing time (*AT*) is the total processing time of all batches:

which is limited by the annual operating time (*aot*) minus the seed train bioreaction time (*st*) and the bioreaction time (*brt*) of a single batch, *aot – st – brt*.

4.1.6. Data-driven model

The throughput of the 1 m-diameter column is calculated from the piecewise linear regression model obtained in the previous section, including Eqs. (3) and (4). As the selected bed height at chromatography step *s* is expressed as in this model, Eq. (5) is modified as follows:

Considering that the throughput of a chromatography column could be regarded as the product of the protein density, linear velocity of flow, and column area, it is assumed that the throughput is proportional to the column area. Thus, it is also proportional to the column diameter squared. In this case, the relationship between the throughput of the selected column and that of the 1 m-diameter column ( ) is formulated as follows:

where ref*DM* refers to the reference diameter, which is 100 cm in this work.

In the data-driven regression model, the mass loaded to the 1 m-diameter column is used to proportionally calculate the actual loaded mass to the selected column (*LM _{s}*).

where the loaded mass,* LM _{s}*, is defined as the product mass entering each column in each cycle, defined as follows:

4.1.7. Objective function

In this work, the objective is to minimize COG per gram, which equals the annual total cost (*AC*) divided by the annual total production output (*AP*). All cost terms included in the annual total cost calculation and the related constraints are presented in Supplementary data. The objective function is as follows:

Thus, the proposed MINLP model A minimizes Eq. (39), subject to the constraints of Eqs. (3), (4), and (7–38), and Eqs. (S1–S19) in Supplementary data.

《4.2. MILP model A*》

4.2. MILP model A^{*}

Next, the obtained MINLP model is reformulated as an MILP model for ease of computation. The new linear constraints are presented below.

4.2.1. Integer variable discretization

Similar to the work in Refs. [17–19], in order to facilitate the linearization of nonlinear terms in other constraints, the integer variables *CN _{s,i}*,

*CYN*, and

_{s}*BN*are discretized and expressed by binary variables, as shown in Eqs. (40–44).

where are binary variables introduced for discretisation of the above integer variables, and *j* and *k* are the indices of column number and cycle number, respectively.

4.2.2. Column volume linearization

Based on Eqs. (42) and (43), the nonlinear constraint Eq. (10) can be reformulated as follows:

where an auxiliary variable, , is introduced and defined, and max*TCV _{s}* is the maximum total column volume at chromatography step

*s*.

4.2.3. Annual production linearization

By introducing another auxiliary variable, , to express the bilinear term in Eq. (14), this constraint can be reformulated by the following equations [43]:

4.2.4. Product and buffer volume linearization

According to Eqs. (46) and (47), Eqs. (17) and (22) can be rewritten as the following two linear equations, respectively:

4.2.5. Processing time linearization

Eq. (27) can be reformulated to include a nonlinear term of the product of one integer variable, , and two continuous variables,* T _{s}* and

*TP*. Two auxiliary variables are introduced: and can be determined by using the following equations:

_{s}Next, is reformulated using multiparametric disaggregation [44–46], where the throughput, *TP _{s}*, is expressed as a multiparametric sum of active decimal powers determined by binary variable

*BTP*and continuous variable

_{s,d,q}*CTP*[0; 1] , where d is the digit position ranging from

_{s,q}*p*to maxp, and

*q*is the number for power

*d*.

Based on the above equations, variable is disaggregated into a number of non-negative continuous variables and :

Note that multiparametric disaggregation is a relaxation method, and the above reformulation provides a close approximation of the original equations, and generates a lower bound of the optimum.

By defining using the following constraints:

The following linear constraint replaces Eq. (29):

Using another auxiliary variable, , and the following constraints are equivalent to Eq. (30):

Based on Eq. (44), Eq. (34) can be rewritten as follows:

4.2.6. Data-driven model reformulation

The bilinear terms of the regression models in Eqs. (4) and (35) can be rewritten as the following linear constraints [47]:

Eq. (36) can be linearized using a new auxiliary variable, , and the following constraints:

Similarly, using the auxiliary variable , Eq. (37) is equivalent to:

Eq. (38) includes a nonlinear term involving two integer variables and one continuous variable, which can be linearized as follows:

where there are two auxiliary variables, , and .

4.2.7. Objective function linearization

The calculation of consumables cost in Eq. (S9) in Supplementary data involves nonlinearities. By introducing auxiliary variable , Eq. (S9) can be reformulated as follows:

where of refers to the resin overpacking factor.

max*TCV* _{s} •*Z _{n}* ;

*CS, k=*1;...;max

*CYN*;

_{s}where* rpc _{s}* and

*l*are the resin price and resin lifetime at chromatography step

_{s}*s*, respectively.

In the objective function Eq. (39), the annual cost can be expressed as the product of COG per gram and the annual production, which can be written as follows:

where auxiliary variable *COG AP*. Using multiparametric disaggregation and introducing new auxiliary variables *COG* · *BAP _{d,q}* and

*COG ·CAP*, the variables

_{q}*AP*and can be disaggregated as follows:

Thus, the reformulated MILP model A* includes the constraints shown in Eqs. (3), (7–9), (11–13), (15), (16), (18–21), (23–25), (28), (32), (33), and (40–104); and Eqs. (S1–S8) and (S10–S19) in Supplementary data, with the variable COG as the objective.

《4.3. MINLP model B》

4.3. MINLP model B

In this section, an alternative MINLP model B is introduced. By generating a set of column volume sizes, the models A and A* result in a large number of variables and equations, which hinder the computation of the proposed models. To overcome this shortcoming, in the new MINLP model B, we get rid of the discrete column volume sizes, and introduce an integer variable, *H*_{s }, for the bed height, and a binary variable, *E _{s,m }*, to indicate whether diameter size, , is selected. In addition, the subscript

*i*is removed from the chromatography column number variable, and variable expresses the number of columns at chromatography step s, which is upper bounded by maxCNs. Thus, the number of discrete variables in model B is reduced with improved computational efficiency. Based on the proposed MINLP model A presented above, a number of new variables and constraints are developed for the MINLP model B, as introduced below.

4.3.1. Column volume

In model B, the total column volume is calculated using the variables of bed height (*H*_{s }), diameter selection ( *E _{s,m }*), and number of columns ().

In addition, only one diameter size can be selected at each chromatography step.

4.3.2. Processing time

By replacing in Eqs. (27) and (29), the following constraint can be obtained:

Similarly, *BAT* and *VFR* are formulated using the new integer variable, *H*_{s }, and the binary variable, *E _{s,m }*:

4.3.3. Data-driven model

Eq. (38) can be written using the new column number variable as follows:

Similar to Eqs. (36) and (37), the throughput and loaded mass of the selected columns can be calculated from those of the 1 mdiameter columns.

4.3.4. Cost

In the cost calculation, Eq. (S13) for fixed capital investment (*FCI *) in Supplementary data should be replaced by the following nonlinear constraint:

where* lang* is the Lang factor; *gef* is the general equipment factor; *brn* is the number of bioreactors; *brc* is the cost of one bioreactor; is the ratio of other equipment cost to the bioreactor cost; is the chromatography column cost of diameter size *m* at chromatography step *s*.

Thus, the proposed MINLP model B minimizes Eq. (39), subject to the constraints shown in Eqs. (3–5), (10–26), (28), (32–34), and (105–114); and Eqs.(S1–S12) and (S14–S19)in Supplementary data.

《4.4. MILP model B*》

4.4. MILP model B^{*}

In MILP model B*, all nonlinear constraints of MINLP model B are linearized. Besides the linear constraints presented in MILP model A^{*}, the newly developed ones are given below.

4.4.1. Integer variable discretization

At first, integer variable is discretized using binary variable , which represents whether or not *j* columns are used at chromatography step *s*, as follows:

4.4.2. Column volume linearization

In order to linearize Eq. (105), an auxiliary variable, , is introduced, with the following constraints:

Therefore, Eq. (105) is reformulated to the following linear constraint:

4.4.3. Processing time linearization

According to Eqs. (61–68), the term *T _{s} ·TP_{s}* can be expressed by . Therefore, is introduced to express , with the following constraints:

Based on the introduction of auxiliary variable ·PLT and Eqs. (119) and (120), Eq. (108) can be linearized as follows:

In addition, with , the following constraints can replace Eq. (109) in the MILP model:

4.4.4. Data-driven model reformulation

Similar to Eq. (80), the data-driven model constraint given in Eq. (5) can be reformulated to linear constraints [47], as follows:

Eq. (111) is linearized to Eq. (132) by introducing auxiliary variables and .

In Eqs. (112) and (113), the nonlinearities involve the product of *E _{s,m}* and a continuous variable. With the introduction of the auxiliary variables, , the following linear equations are used instead as linear constraints:

4.4.5. Cost linearization

At last, in order to reformulate the fixed capital investment formula,can be used to linearize Eq. (114) as follows:

Overall, MILP model B* includes constraints shown in Eqs. (3), (11–13), (15), (16), (18–21), (23–25), (28), (32), (33), (42–57), (61–68), (75–79), (92–104), (106), and (115–143); and Eqs. (S1–S8), (S10–S12), and (S14–S19) in Supplementary data.

In summary, the equations of all four proposed models are summarized in Table 2.

《Table 2》

**Table 2 Model summary.**

《5. Case study》

5. Case study

In this section, the four proposed optimization models are applied to industrially relevant case studies to examine their performances. The process flowsheet of this example is shown in Fig. 1, where there are one bioreactor and three chromatography steps: affinity, cation-exchange, and anion-exchange chromatography steps. For the chromatography column-sizing decisions, the number of columns utilized at each chromatography step is limited to four, while a maximum of 10 cycles are allowed in each batch. Two cases with different alternatives of chromatography column diameter and bed height are considered. Case 1 includes 10 alternative discrete diameters varying from 50 to 200 cm and 11 alternative discrete bed height between 15 and 25 cm, while Case 2 has 26 alternative discrete diameters between 50 and 300 cm and 21 alternative discrete bed height taking integer values ranging from 10 to 30 cm. In models A and A^{*}, where discrete column volume sizes are used, there are 110 alternatives in Case 1 and 546 alternatives in Case 2. The detailed alternative column diameters and bed height are given in Table 3.

《Table 3》

**Table 3 Alternatives of chromatography column sizes in two cases.**

The chromatography resin utilization factor *μ *is 0.95. The linear velocity of the flows at the affinity and cation-exchange chromatography steps is limited between 200 and 600 cm·h^{–1} , while the linear velocity of the flows at the anion-exchange chromatography step is fixed at 300 cm·h^{–1} . Other parameters for the three chromatography steps are given in Table 4.

《Table 4》

**Table 4 Parameters of chromatographic operations.**

The *aot* of the process is 340 d, and the batch success rate is 90%. Process parameters of the non-chromatography steps are provided in Table 5. The cost-relevant data can be found in Supplementary data (Table S1).

《Table 5》

**Table 5 Parameters of non-chromatographic unit operations.**

The proposed optimization models were implemented in GAMS 24.7 [42] on a 64-bit Windows 7-based machine with an Intel Core i5-3330 3.00 GHz processor and 8.0 GB RAM, using BARON as the MINLP solver and CPLEX as the MILP solver. The central processing unit (CPU) time for each model was limited to 1 h.

《6. Results and discussion》

6. Results and discussion

The proposed models were applied to the above two cases with different column size alternatives. The computational results are presented and discussed in this section.

《6.1. Case 1》

6.1. Case 1

The model statistics and computational results of the four proposed models for Case 1 are presented in Table 6, in which all four models reach their global optimal solutions. Note that although the reported optimal objectives of all four models are the same to the first decimal place, the solutions of the MILP models are actually very close approximations of the global optima of the corresponding MINLP models with the same column sizing and key operational decisions. They are about 1 ×10^{-4} lower than the optimal objective values of the MINLP models, being lower bounds as indicated by the theory of multiparametric disaggregation. The MINLP model A is able to achieve the optimum of 201.7 GBP·g^{-1} within 494 s. After linearization, the MILP model A^{*} takes a slightly shorter time (381 s) to find the optimal solution, despite having an increased number of equations and variables in the model. Meanwhile, as shown in Table 6, the MINLP model B has the same continuous variables as the MINLP model A, but significantly reduces the number of equations and discrete variables by one order of magnitude. As a result, the model is able to find the optimal solution within 47 s. The MINLP model B^{*}, which results from linearizing the MINLP model B, involves much fewer equations and variables than the MILP model A*, and takes only 5 s to find the optimal solution, saving CPU by two orders of magnitude. From the comparison, it can be concluded that among the four proposed models, the models B and B^{*} show obvious computational advantages over the other two models. In particular, the MILP model B^{*} requires the smallest computational effort and has the most potential for larger scale problems, which also becomes evident from the larger example (Case 2) discussed in the next subsection. Note that the predication errors of the resulting piecewise linear regression models have very minor effects on the optimal objective values of the optimization models; for example, a 17% shift of the estimated throughput output results in only a 0.1% difference in the optimal objective value. A similar observation is made for Case 2.

《Table 6》

**Table 6 Statistics and computational performance of the proposed models for Case 1.**

Next, the detailed optimal solutions of Case 1 are discussed. The optimal chromatography column-sizing strategies are given in Fig. 3, where the diameters of the chromatography columns are proportional to the widths of the shapes plotted, while the bed heights are proportional to the shapes’ heights. The dashed-line shapes represent the cycles needed in each batch. At each chromatography step, only one column is utilized. The column at the affinity chromatography step has a diameter of 180 cm and a bed height of 15 cm, while the cation-exchange chromatography step uses a smaller column with a diameter of 90 cm and a bed height of 21 cm. The chromatography column at the anion-exchange chromatography step has the smallest diameter, 80 cm, but the largest bed height, 25 cm.

Here, the performance of the throughput regression models and operational decisions are examined. In the optimal solution, the throughput output of the metamodel is 1869.8 g·h^{-1} at the affinity chromatography step and 1823.9 g·h^{-1} at the cation-exchange chromatography step. As shown in Fig. 3, five and four cycles per batch are required at these two chromatography steps, respectively. The resulting product masses loaded to each column in each cycle are 5306.7 and 6301.7 g, receptively. After converting to the values for a 1 m-diameter column, the loaded mass falls into the first interval in the piecewise linear regression model at the affinity chromatography step, and the corresponding function is used to estimate the throughput, as given below:

《Fig. 3》

**Fig. 3. Optimal chromatography column-sizing strategies of Case 1. AFF: affinity chromatography; CEX: cation-exchange chromatography; AEX: anion-exchange chromatography; D: diameter; H: bed height.**

where 1.8^{2} is added to convert the performance of the 1 m-diameter column to that of the selected 1.8 m-diameter column, and , according to Eq. (37). At the cation-exchange chromatography step, the first interval in the regression model is also selected. Similarly, the regression model used is as follows:

In both of the above functions, the linear velocity variable contributes to the throughput positively; therefore, the linear velocity of the flows at both steps reaches its upper bound, 600 cm·h^{-1} .

At last, from the optimal cost distribution shown in Fig. 4, the consumables cost—that is, the resin cost in this problem—represents the largest portion, over 30%, due to the high price of the affinity resin used. Also, the capital cost of the equipment, chemical reagents (buffer and media) cost, and labor cost all make a significant contribution to the total cost.

《Fig. 4》

**Fig. 4. Optimal cost distribution of Case 1.**

《6.2. Case 2》

6.2. Case 2

In Case 2, a larger number of alternative diameter and bed height are considered, resulting in larger scale optimization models, as indicated in Table 7. Compared with Case 1, MINLP model B has slightly more discrete variables, while the other models involve an increased number of both equations and variables. It is worth noting that models A and A^{*} fail to terminate before the computational time limit of 3600 s, although the gaps to the best possible solutions are only 0.6% and 0.4%, respectively. According to the solution process of the two models shown in Fig. 5, the MILP model A^{*} finds a good feasible solution at around 220 s, and actually achieves the optimum within 10 min. However, the lower bound of the solution in the branch and bound process converges so slowly that the optimum of the obtained objective, 200.3 GBP·g^{-1} cannot be proven within the given time limit. Meanwhile, the MINLP model A is relatively much slower, only reaches the first feasible solution after around 1000 s, and obtains a good feasible solution at nearly 30 min. Compared with the models A and A^{*}, models B and B^{*} show significantly improved computational performance and achieve the optimal solutions within 4 min. The MINLP model B takes around 1 min for a close feasible solution and 192 s for the optimum. The MILP model B* achieves a CPU saving of one order of magnitude, with only 24 s for its optimum, which is also a very close lower bound of the MINLP model B* due to multiparametric disaggregation. For a good feasible solution within 1% of the optimum, only 6 s is needed for the model B*. Thus, the computational advantage of the MILP model B* is demonstrated.

《Table 7》

**Table 7 Statistics and computational performance of the proposed models for Case 2.**

^{a }Obtained solution has an optimility gap of 0.6% when the CPU limit is reached.

^{b} Obtained solution has an optimility gap of 0.4% when the CPU limit is reached.

《Fig. 5》

**Fig. 5. Solution process of the four proposed models for Case 2.**

Regarding the column-sizing decisions, Case 2 has more possible column size options. Fig. 6 shows that in comparison with the optimal decisions of Case 1, a column with a larger diameter but smaller bed height is installed at the first two chromatography steps. The selected bed heights (11 and 14 cm) are beyond the bed height range allowed in Case 1 (15–25 cm). In addition, the selected diameters (190 and 110 cm) are not available in Case 1. With more possible alternatives, the optimal solution of Case 1 is only a feasible solution of Case 2, and there is an improvement of 1.4 GBP·g^{-1} in the optimal COG per gram of Case 2. Meanwhile, the same column (80 cm diameter and 25 cm bed height) is selected at the anion-exchange chromatography step. The selected larger diameter columns result in higher cost in equipment investment, while the smaller bed heights lead to less resin and relevant cost. The differences in these costs are relatively very small; therefore, the cost distribution is very similar to that of Case 1, which is not discussed further here.

《Fig. 6》

**Fig. 6. Optimal chromatography column-sizing strategies for Case 2.**

At both the affinity and cation chromatography steps, the highest allowed flow velocity, 600 cm·h^{-1} , is implemented. The actual throughput regression models at the two steps are slightly different from those in Case 1, due to the differences in the selected diameter sizes:

As shown in Fig. 6, one more cycle is used at the affinity chromatography step than in Case 1, and therefore less mass (4422.3 g) is loaded in each cycle. However, due to the smaller bed height and larger diameter, the throughput increases to 1972.0 g·h^{-1} . For the cation-exchange chromatography step, although the same number of cycles and loaded mass as Case 1 are obtained, a higher throughput of 2760.2 g·h^{-1} is achieved, due to the chosen larger diameter and smaller bed height.

《7. Concluding remarks》

7. Concluding remarks

In this work, the multiscale optimization of an antibody manufacturing process has been investigated. At the operational level, to mimic the complex behavior of the chromatography process, datadriven models were developed to estimate the chromatography throughput, using manufacturing-scale simulated datasets based on microscale experimental data. Through a comparison of a number of methods for metamodeling, piecewise linear regression models were developed based on the simulated datasets.

At the process design level, in order to determine the optimal chromatography column-sizing strategies, two alterative MINLP models were proposed to minimize the COG per gram. Adopting linearization techniques, two MILP models were developed. The throughput regression models were incorporated into the optimization models to determine the optimal operational decisions—that is, the flow velocity and the number of cycles per batch.

By studying two industrially relevant cases with different column size alternatives, the computational performance of the four proposed optimization models were compared. In conclusion, models B and B^{*} demonstrated more efficient computational performances. In particular, the second MILP model was shown to be the most computationally efficient, and can thus be recommended for large-scale optimization studies. In addition, optimal solution details were discussed, and the data-driven models were shown to work well to achieve optimal throughput.

A future research direction of this work could be the development of data-driven metamodels for anion-exchange chromatography, and for more chromatography parameters, such as yield and binding capacity, with more input variables, such as pH value and temperature. In addition, other performance criteria of the purification process, such as impurity removal capacity, could be taken into account for multi-objective optimization [9,21,22]. The uncertainty of the parameters, such as yield and titer, could also be considered [19,22]. Finally, single-use chromatography could be considered in the purification process as an important direction in smart biopharmaceutical manufacturing [48].

《Acknowledgements》

Acknowledgements

Funding from the UK Engineering & Physical Sciences Research Council (EP/I033270/1 and EP/M027856/1) is gratefully acknowledged. The authors would like to thank Dr. Spyridon Gerontas for providing data and useful discussions, and Dr. Lingjian Yang for implementing a preliminary analysis of regression models.

《Compliance with ethics guidelines》

Compliance with ethics guidelines

Songsong Liu and Lazaros G. Papageorgiou declare that they have no conflicts of interest or financial conflicts to disclose.

《Nomenclature》

Nomenclature

*Indices*

*d* Position in multiparametric disaggregation = *p*, ..., max*p*

*i* Column volume size

*j* Column number = 1, ...,max*CN _{s}*

*k *Cycle number = 1, ...,max*CYN _{s}*

*m* Diameter size

*n* Digit of the binary representation = 1,...,

*q* Integer number in multiparametric disaggregation

*r* Interval in piecewise regression function

*s* Downstream step = ct_{1} (centrifugation 1), ho(homogenization), ct_{2} (centrifugation 2), fi(filtration), af (affinity chromatography), ce (cation exchange chromatography), uf_{1} (UF/DF 1), ae (anion exchange chromatography), uf_{2} (UF/DF 2), bf (bulk fill)

*Sets*

*CS* Set of chromatography steps = {af, ce, ae}

*Parameters*

*a, b, c* Utilities cost coefficients

*aot* Annual operating time, d

*bcv _{s}*

_{ }Buffer volume to column volume ratio at chromatography step

*s*

*bp _{s,r}* Breakpoint of loaded mass between intervals

*r*and (

*r*+ 1) at chromatography step

*s*, g

*bpc* Buffer price, GBPL^{-1}

*brc* Bioreactor cost, GBP

*brn* Number of bioreactors

*brt* Bioreaction time, d

*brv* Bioreactor volume, L

*bvr _{s}*

_{ }Buffer volume ratio at centrifugation step

*s*

*cc _{s,i }*Column cost of size

*i*at chromatography step

*s*, GBP

Column cost of diameter size *m* at chromatography step *s*, GBP

*cv*_{s,i} Volume of column size* i *at chromatography step *s*, L

*dbc _{s}* Dynamic binding capacity at chromatography step

*s*, g·L

^{-1}

*dm _{s,i}* Diameter of column size i at chromatography step

*s*, cm

Diameter of size *m* at chromatography step *s*, cm

*don* Number of operators for downstream processing

*dvr *Diafiltration volume to column volume ratio at the second UF/DF step

*ecv _{s}* Elute volume to column volume ratio at chromatography step

*s*

*el* Equipment lifetime, yeara

*fconc* Final concentration of product, g·L^{-1}

*f vr* Flush volume ratio at the first UF/DF step

*gef* General equipment factor

*gu* General utility unit cost, GBP·L^{-1}

*h _{s,i }*Bed height of column size

*i*at chromatography step

*s*, cm

*ir *Interest rate

Ratio of insurance cost to fixed capital investment

*l _{s} *Lifetime of resin at chromatography step

*s*, cycle

*lang* Lang factor

max*BBV* Maximum buffer volume per batch, L

max*BN* Maximum number of batches

max*CN _{s}* Maximum number of columns at chromatography step

*s*

max*COG* Maximum COG per gram, GBP·L^{-1}

max*CYN _{s}* Maximum number of cycles at chromatography step

*s*

max*H _{s}* Maximum column bed height size at chromatography step

*s*, cm

max*LM _{s}* Maximum product mass loaded at chromatography step

*s*, g

max*p* Maximum position in multiparametric disaggregation

max*T _{s}* Maximum processing time per batch at chromatography step

*s*, h

max*TCV _{s}* Maximum total column volume at chromatography step

*s*, L

max*TP _{s}* Maximum throughput at chromatography step

*s*, g·L

^{-1}

Maintenance cost ratio to the fixed capital investment

*mepc *Media price, GBP·L^{-1}

Miscellaneous material cost ratio to chemical reagent and consumable costs

Management cost ratio to direct labor cost

Other equipment cost ratio to the bioreactor cost

*of* Resin overpacking factor

*pr _{s} *Processing rate of step s, L·h

^{-1}

Ratio of QCQA cost to direct labor cost

*rpc _{s}* Resin price at chromatography step s, GBP·L

^{-1}

ref*CC* Reference cost of a chromatography column, GBP

ref*DM* Reference diameter of a chromatography column, cm

*sfd* Duration per shift, h

*sfn* Number of shifts per day, d^{-1}

*st* Seed train bioreaction time, d

Supervisors cost ratio to direct labor cost

*titer* Upstream product titer, g·L^{-1}

Tax cost ratio to the fixed capital investment

*uon* Number of operators per bioreactor in USP

*uot* USP operating time per day

*vel *Linear velocity of flow at the anion-exchange chromatography step, cm·h^{-1}

*w* Wage of an operator, GBP·L^{-1}

*yd _{s}* Product yield at step

*s*

Bioreactor working volume ratio

Constant coefficient in interval *r *at chromatography step *s*

Coefficient for bed height in interval *r* at chromatography step *s*

Coefficient for loaded mass in interval* r* at chromatography step* s*

Coefficient for velocity in interval *r* at chromatography step *s*

A small number to separate two successive intervals at the breakpoint *s*

Media overfill allowance

Chromatography resin utilization factor

Batch success rate

*Continuous variables*

*ABV* Annual buffer volume, L

*AP* Annual product output, g

*AT* Annual downstream processing time, d

*BAT *Time for adding buffer per batch at anion-exchange chromatography step, h

*BBV* Buffer volume added per batch, L

*BC* Buffer cost, GBP

*BRC* Bioreactor cost, GBP

*BT* Downstream processing time per batch, d

*BV _{s}* Buffer volume per batch in chromatography step

*s*, L

*CAC* Capital cost, GBP

*CC* Consumables cost, GBP

*COG* Annual cost of goods, GBP

*CRC* Chemical reagents cost, GBP

*CAP _{q}* Continuous variable for annual production in multiple disaggregation at digit

*q*

*CTP*_{s,q }Continuous variable for throughput in multiple disaggregation at digit *q*, step* s*

*DLC* Direct labor cost, GBP

*FCI* Fixed capital investment, GBP

*GUC* General utility cost, GBP

*IC* Insurance cost, GBP

*LC* Labor cost, GBP

*LM _{s}* Mass loaded to single column at chromatography step

*s*, g

Mass loaded to single 1 m-diameter column at chromatography step *s*, g

*M*_{0} Initial product mass entering downstream processes per batch, g

*M _{s}* Product mass per batch after step

*s*, g

*MAC* Maintenance cost, GBP

*MC* Management cost, GBP

*MEC* Media cost, GBP

*MIC* Miscellaneous material cost, GBP

*OIC* Other indirect costs, GBP

*PLT* Time for loading product per batch at anion-exchange chromatography step, h

*PV _{0}* Initial product volume entering downstream processes per batch, L

*PV _{s}* Product volume per batch after step

*s*, L

*QC* QCQA cost, GBP

*RV _{s}* Resin volume required at chromatography step

*s*, L

*SC* Supervisors cost, GBP

*T _{s}* Processing time per batch of step

*s*, h

*TC* Tax cost, GBP

*TCV _{s} *Total column volume at chromatography step

*s*, L

*TP _{s}* Throughput of single column at chromatography step

*s*, g·h

^{-1}

Throughput of single 1 m-diameter column at chromatography step *s*, g·h^{-1}

*UC* Utilities cost, GBP

*V _{s}* Linear velocity of flow at chromatography step

*s*, cm·h

^{-1}

*VFR* Volumetric flow rate at anion-exchange chromatography step, L·h^{-1}

*Binary variables*

*BAP _{d,q}* 1 if digit

*q*for power

*d*is selected for annual production output; 0 otherwise

*BTP _{s,d,q}* 1 if digit

*q*for power

*d*is selected for throughput at chromatography step

*s*; 0 otherwise

*E _{s,m} *1 if diameter size

*m*is selected at chromatography step

*s*; 0 otherwise

*F _{s,j}* 1 if there are j columns at chromatography step

*s*; 0 otherwise

*O _{s,r} *1 if the function at interval

*r*is selected at chromatography step

*s*; 0 otherwise

*W _{s,i,j} *1 if there are

*j*columns of size

*i*at chromatography step

*s*; 0 otherwise

*X _{s,i}* 1 if column size

*i*is selected at chromatography step

*s*; 0 otherwise

*Y _{s,k }*1 if there are

*k*cycles at chromatography step

*s*; 0 otherwise

*Z _{n}* 1 if the

*n*th digit of the binary representation of variable

*BN*is equal to 1; 0 otherwise

Integer variables

*BN* Number of completed batches

*CN _{s}*; Number of columns of size

*i*at chromatography step

*s*

Number of columns at chromatography step *s*

*CYN _{s}* Number of cycles per batch at chromatography step

*s*

*H _{s}* Bed height of column at chromatography step

*s*, cm

*Auxiliary variables*

《Appendix A. Supplementary data》

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.eng.2019.10.011.