The correlations among batch process variables are complex, so the variation features of a local abnormality are easily overwhelmed when the whole variable space is monitored. In addition, batch process variables have obviously non-Gaussian distributions. To address these two problems, a new multiple subspace monitoring method called principal component analysis - multiple subspace support vector data description (PCA-MSSVDD) is proposed, which combines the subspace design of latent variables with the SVDD modeling method. Firstly, PCA is introduced to obtain latent variables and remove redundant information. Secondly, the subspace design result is obtained through K-means clustering. Finally, SVDD is introduced to build the monitoring model. A numerical simulation and a simulated penicillin fermentation process show that the proposed PCA-MSSVDD method has better monitoring performance than traditional methods.

To ensure the safe and reliable operation of a batch process, it is necessary to find faults in time. Therefore, it is of great practical significance to apply process monitoring^{[1-3]}. Generally, process monitoring methods can be divided into three types: mechanism-driven approaches, knowledge-driven approaches, and data-driven approaches^{[4,5]}. Many data are recorded and stored in modern industry, and a lot of information is contained in these data, but it is not used effectively. In recent years, in view of the difficulty of establishing mechanism models in complex industrial processes and the difficulty of obtaining expert knowledge in practice, data-driven process methods have attracted a lot of attention^{[6-9]}.

In most batch process monitoring algorithms, the influence of complex correlations among process variables on monitoring effectiveness is not taken into account. In batch processes, the correlation among process variables is very complex; some variables have a strong correlation and some variables have a weak correlation. Variables with a strong correlation have a similar mutation behavior to faults, while those with a weak correlation have a different mutation behavior to faults. When a fault occurs, some process variables may mutate. For the above situation, if the monitoring is carried out in the whole monitoring space, there will be the risk of submergence of mutation features, thus increasing the difficulty of fault detection. In view of the complex correlation among process variables, many monitoring algorithms based on variable subspace have been studied in recent years^{[10-14]}. These algorithms place variables with similar characteristics in a subspace and monitor them, highlighting the local characteristics of process variables. If some variables have mutation characteristics, the mutation characteristics will be more obvious in the subspace than in the whole space, which is conducive to detecting the fault. Meanwhile, meaningful subspace design is conducive to process understanding and learning. In addition, a monitoring method based on the subspace of latent variables is proposed by principal component analysis (PCA). This method can eliminate redundant information in the original process variables through PCA mapping^{[15]}.

The above algorithm based on latent variable subspace design can reduce the risk of local variation characteristics being inundated; however, the calculation method of the control limit of the model still has the assumption that the data need to obey a Gaussian distribution. Since batch process data have obvious non-Gaussian characteristics, the fault detection ability of this algorithm is sometimes reduced. Support vector data description (SVDD) can adapt to the non-Gaussian features of the data^{[16,17]}. To distinguish normal data samples from abnormal data samples, which is the purpose of statistical process monitoring, all normal samples can be used as a category to establish an SVDD monitoring model. Multiple subspaces SVDD (MSSVDD) methods have been proposed by fusing variable subspace design methods with SVDD and applied to non-Gaussian processes^{[18]}. However, the application of subspace monitoring methods for latent variables in non-Gaussian processes still has not been studied.

To address the complex correlations and the non-Gaussian distribution of batch process variables, this paper proposes a batch process monitoring algorithm called PCA-MSSVDD, which combines latent variable subspace design with SVDD. In offline modeling, the three-dimensional matrix of the batch process is first converted into a two-dimensional matrix by a two-step unfolding technique, and the original variables are converted into latent variables by the PCA transformation, which eliminates redundant information. An extension matrix is then defined from the PCA transformation matrix, i.e., the load matrix. Each vector of the extension matrix reflects the influence of the process variables on one latent variable and is defined as the characteristic vector of that latent variable. When process variables that strongly influence a latent variable change, that latent variable also exhibits variation characteristics. Therefore, latent variables with similar characteristic vectors have similar variation characteristics and should be monitored in the same subspace. K-means is introduced to cluster the characteristic vectors, and the clustering result is the design result of the latent variable subspaces. Then, the latent variable time slice matrix is obtained by the sliding time window technique, and the subspace design result is applied to it. Finally, an SVDD monitoring model is established based on the latent variable subspace data. In online monitoring, the PCA mapping is applied to the online samples to obtain their latent variables. Then, the monitoring model is selected according to the sampling time. Finally, a weighted average strategy is used to fuse the subspace monitoring results into the final monitoring result. Combining latent variable subspace design with SVDD can effectively improve fault detection.

The remainder of this paper is structured as follows.

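SVDD is a monitoring algorithm based on pattern recognition. Samples are projected into the feature space by a nonlinear mapping, and a minimum hypersphere is sought in the feature space that, under the condition of minimum structural risk, encloses as many of the samples as possible^{[19]}. The optimization objective of SVDD is:

$$\min_{R,\,a,\,\xi}\; R^{2} + C\sum_{i=1}^{n}\xi_{i} \quad \text{s.t.} \quad \left\|\Phi(x_{i}) - a\right\|^{2} \le R^{2} + \xi_{i},\;\; \xi_{i} \ge 0,\;\; i = 1, 2, \ldots, n$$

where $R^{2}$ represents the square of the radius, $a$ is the center of the hypersphere, $\xi_{i}$ represents the relaxation variable, $C$ is the penalty parameter, and $\Phi$ represents the nonlinear mapping associated with the kernel function. The objective function can also be written in the dual form:

$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_{i}K_{1}(x_{i},x_{i}) - \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}K_{1}(x_{i},x_{j}) \quad \text{s.t.} \quad 0 \le \alpha_{i} \le C,\;\; \sum_{i=1}^{n}\alpha_{i} = 1$$

where $K_{1}(x_{i},x_{j}) = \langle\Phi(x_{i}),\Phi(x_{j})\rangle$.

By solving the above problem, a set of vectors $x_{i}$ and corresponding coefficients $\alpha_{i}$ can be obtained. If $\alpha_{i} > 0$, the corresponding vector is defined as a support vector (SV). For the support vectors $x_{sv}$ on the boundary of the hypersphere, the squared distance to the center equals the squared radius:

$$R^{2} = K_{1}(x_{sv},x_{sv}) - 2\sum_{i=1}^{n}\alpha_{i}K_{1}(x_{i},x_{sv}) + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}K_{1}(x_{i},x_{j})$$

The square of the distance of an online sample $x_{new}$ to the center of the hypersphere is:

$$d^{2} = K_{1}(x_{new},x_{new}) - 2\sum_{i=1}^{n}\alpha_{i}K_{1}(x_{i},x_{new}) + \sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}K_{1}(x_{i},x_{j})$$

A sample with $d^{2} \le R^{2}$ lies inside the hypersphere and is regarded as normal; otherwise, it is regarded as abnormal. Therefore, the SVDD method can be used to distinguish between normal and abnormal samples. In this paper, the SVDD monitoring statistic is defined as:

$$D = \frac{d^{2}}{R^{2}}$$

Meanwhile, the corresponding control limit is $D_{lim} = 1$.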
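As a minimal sketch of how such a statistic could be evaluated once the dual problem has been solved, the following Python snippet computes the squared kernel distance of a sample to the hypersphere center, the squared radius at a support vector, and their ratio, which is compared against the control limit of 1. The Gaussian kernel, its width `sigma`, the two-point training set, the coefficients `alphas`, and all function names are illustrative assumptions rather than settings from this paper; the ratio form of the statistic is also an assumption, chosen because it is consistent with a control limit of 1.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is an assumed width parameter."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def squared_distance(z, samples, alphas, kernel):
    """Squared feature-space distance from z to the hypersphere center."""
    cross = sum(a * kernel(x, z) for a, x in zip(alphas, samples))
    const = sum(ai * aj * kernel(xi, xj)
                for ai, xi in zip(alphas, samples)
                for aj, xj in zip(alphas, samples))
    return kernel(z, z) - 2.0 * cross + const


def svdd_statistic(z, samples, alphas, kernel):
    """Ratio d^2 / R^2, with R^2 evaluated at a support vector (alpha > 0)."""
    support_vector = next(x for a, x in zip(alphas, samples) if a > 0)
    r2 = squared_distance(support_vector, samples, alphas, kernel)
    return squared_distance(z, samples, alphas, kernel) / r2


# Toy model (assumption): two training points. By symmetry, the dual
# solution is alpha = (0.5, 0.5), so both points are support vectors.
samples = [(0.0, 0.0), (2.0, 0.0)]
alphas = [0.5, 0.5]

d_inside = svdd_statistic((1.0, 0.0), samples, alphas, gaussian_kernel)
d_outside = svdd_statistic((5.0, 0.0), samples, alphas, gaussian_kernel)
print(d_inside <= 1.0, d_outside > 1.0)
```

In this toy case, the point midway between the two training samples stays below the limit, while a point far from both exceeds it. A real model would obtain `alphas` from a quadratic programming solver.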

K-means^{[20]} uses Euclidean distance as the similarity index: the closer two samples are, the higher their similarity. The algorithm assumes that each class consists of samples that are close to one another in Euclidean distance, so the clustering objective is to minimize the sum of squared errors between the samples of each class and the center of that class.

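The K-means clustering algorithm clusters the column vectors of a two-dimensional matrix $X = [x_{1}, x_{2}, \ldots, x_{m}]$, that is, it decides to which of the classes $S_{1}, S_{2}, \ldots, S_{C}$ each of $x_{1}, x_{2}, \ldots, x_{m}$ should belong. $S_{c}$ corresponds to the $c$-th category, represented by its center $\theta_{c}$, where $c = 1, 2, \ldots, C$. The algorithm proceeds as follows:

(1) Firstly, set the iteration index $t = 0$; the initial centers $\theta_{1}(0), \theta_{2}(0), \ldots, \theta_{C}(0)$ are randomly selected from the column vectors of $X$.

(2) Compute the Euclidean distance between each column vector $x_{j}$ and each center $\theta_{c}(t)$, and assign $x_{j}$ to the class whose center is nearest.

(3) The samples of the same category form the set $S_{c}(t)$ ($c = 1, 2, \ldots, C$).

(4) The center $\theta_{c}(t+1)$ of $S_{c}(t)$ is updated as the mean of its members: $\theta_{c}(t+1) = \frac{1}{|S_{c}(t)|}\sum_{x_{j} \in S_{c}(t)} x_{j}$.

(5) If $\theta_{c}(t+1) = \theta_{c}(t)$ for all $c$, the iteration stops; otherwise, set $t = t + 1$ and return to step (2).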

In this paper, the formula of the matrix column vector clustering is expressed as follows:

When the K-means clustering algorithm is applied, it is necessary to determine the number of classifications. When the number of classifications is known, K-means can re-classify unreasonably classified samples through its own optimization iteration steps. Therefore, in the case of a small number of data samples, it can achieve satisfactory results. In the case of an uncertain number of classifications, it is necessary to determine the number of classifications by other analysis methods.
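The iteration described above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the code used in this paper: the function names, the `seed` and `max_iter` parameters, and the five toy column vectors are hypothetical, and an empty class simply keeps its previous center.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_columns(columns, n_clusters, seed=0, max_iter=100):
    """Cluster column vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    # Step 1: pick the initial centers at random among the column vectors.
    centers = [list(c) for c in rng.sample(columns, n_clusters)]
    labels = []
    for _ in range(max_iter):
        # Steps 2-3: assign every column to the class with the nearest center.
        labels = [min(range(n_clusters),
                      key=lambda c: squared_distance(col, centers[c]))
                  for col in columns]
        # Step 4: recompute each center as the mean of its members.
        new_centers = []
        for c in range(n_clusters):
            members = [col for col, lab in zip(columns, labels) if lab == c]
            if members:
                new_centers.append([sum(v) / len(members)
                                    for v in zip(*members)])
            else:
                new_centers.append(centers[c])  # empty class keeps its center
        # Step 5: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Toy column vectors (assumption): two well-separated groups.
columns = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
labels, centers = kmeans_columns(columns, n_clusters=2)
print(labels)
```

In the proposed method, the clustered columns would be the characteristic vectors of the latent variables, and the resulting labels would define the subspaces. The number of clusters still has to be supplied by the user.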

The two-dimensional matrix ^{[21]}:

Where

Where λ_{i} (

Where _{c} represents the number of latent variables retained. Therefore, when there is a lot of redundant information in the original data, it should be _{c} ≤

Formula

Based on the data transformation matrix, i.e. the load matrix, an extension matrix

Where _{j} is the _{i,j} is the _{j}. The size of the numerical value indicates the significance of the latent variable characteristics, and the larger is the numerical value, the more obvious are the characteristics. The column vectors of the extension matrix _{c} × 1) represents the results of clustering. The expression of clustering results is as follows:

According to the results of

Where _{j} is the _{c} is the latent variable matrix of the

Latent variable subspace design flow diagram based on PCA-K-means. PCA: Principal component analysis.

The batch process historical data comprise a three-dimensional data matrix _{B} (_{B} (_{V} (_{V} (_{V} (_{c} represents the monitoring results of the

The flow chart of PCA-MSSVDD is shown in the figure below.

Batch process monitoring flow diagram based on PCA-MSSVDD. PCA-MSSVDD: Principal component analysis - multiple subspace support vector data description; SVDD: support vector data description; PCA: principal component analysis.

The three-dimensional matrix is transformed into the two-dimensional matrix of variable expansion by two expansion techniques.

PCA transform is applied to the two-dimensional matrix of variable expansion.

The sliding time window technique is used to obtain the time slice matrix for the latent variable of the two-dimensional matrix.

The subspace of the latent variable two-dimensional matrix is designed to obtain the results of subspace design.

The SVDD monitoring model for the subspace matrix of latent variable time slice is established.

The latent variables of online samples are obtained by transforming the online samples according to the second PCA model.

The online sample latent variables select the monitoring model according to time.

The subspace monitoring results are fused.
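The data preparation steps of the offline stage can be sketched as follows. This is only an illustration under assumptions: the unfolding is taken to be the common two-step scheme (batch-wise unfolding for standardization, then rearrangement into a variable-wise matrix), the window is assumed to end at the current sampling time, and all function names and the synthetic data are hypothetical. The PCA step that the method applies before taking time slices is omitted here for brevity.

```python
def unfold_batchwise(data):
    """I x K x J data (batch, time, variable) -> I x (K*J), one row per batch."""
    return [[value for time_point in batch for value in time_point]
            for batch in data]


def standardize_columns(matrix):
    """Scale every column to zero mean and unit variance across batches."""
    n_rows = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(col) / n_rows for col in columns]
    # A column with zero spread is only centered (divisor falls back to 1.0).
    stds = [(sum((v - m) ** 2 for v in col) / (n_rows - 1)) ** 0.5 or 1.0
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in matrix]


def refold_variablewise(matrix, n_vars):
    """I x (K*J) -> (I*K) x J, one row per (batch, time) pair."""
    return [row[start:start + n_vars]
            for row in matrix
            for start in range(0, len(row), n_vars)]


def time_slice(rows, n_batches, n_times, k, width):
    """Rows of every batch whose time index lies in the window ending at k."""
    first = max(0, k - width + 1)
    return [rows[i * n_times + t]
            for i in range(n_batches)
            for t in range(first, k + 1)]


# Synthetic data (assumption): 3 batches, 4 sampling times, 2 variables.
n_batches, n_times, n_vars = 3, 4, 2
data = [[[100.0 * i + 10.0 * k + j for j in range(n_vars)]
         for k in range(n_times)]
        for i in range(n_batches)]

unfolded = unfold_batchwise(data)
rows = refold_variablewise(standardize_columns(unfolded), n_vars)
window = time_slice(rows, n_batches, n_times, k=2, width=2)
print(len(unfolded), len(unfolded[0]), len(rows), len(rows[0]), len(window))
```

Each time slice collects the rows of all batches within the window, so one monitoring model per subspace can be trained for every sampling time and selected online by time.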

This paper designs the following numerical simulation models with two subsystems:

Here, _{i,j} is the _{i} is the _{i,j} is the

The gray-scale diagram of the extension indicator matrix of the load matrix in the numerical simulation process is shown in the figure below. Latent Variables 1, 4, and 11-16 form subspace 1, while Latent Variables 2, 3, and 5-10 form subspace 2.

Gray-scale diagram of the extension indicator matrix in the numerical process.

Latent variable subspace design result in the numerical process

| Subspace number | Latent variable number |
|---|---|
| 1 | 1, 4, 11, 12, 13, 14, 15, 16 |
| 2 | 2, 3, 5, 6, 7, 8, 9, 10 |

Based on the numerical simulation model, a cross-subsystem local fault is designed, that is, fault signals are introduced into both subsystems to analyze and compare the monitoring performance of SVDD, MSSVDD, and PCA-MSSVDD. The fault parameters are shown in the table below.

The parameters of the test case in the numerical process

| Variable into which the fault signal is introduced | Size of step fault | Time of fault (sample) |
|---|---|---|
| 1 | 16 | 201-800 |
| 13 | 3 | 201-800 |
| 15 | 4 | 201-800 |

The comparison of SVDD, MSSVDD, and PCA-MSSVDD in monitoring the numerical process is shown in the table below.

The comparison of SVDD, MSSVDD, and PCA-MSSVDD in monitoring the numerical process

| Metric | SVDD | MSSVDD | PCA-MSSVDD |
|---|---|---|---|
| False alarm rate (%) | 0 | 0 | 1.5 |
| Missed alarm rate (%) | 47.0 | 68.5 | 15.3 |
| Error rate (%) | 35.3 | 51.5 | 11.8 |
| First time to detect fault (sample) | 201 | 201 | 201 |

SVDD: Support vector data description; MSSVDD: multiple subspaces SVDD; PCA-MSSVDD: principal component analysis - multiple subspace support vector data description.
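The four indices in the table can be computed from a sequence of alarm decisions as sketched below. The paper does not state the formulas, so the definitions used here are assumptions: false alarms are counted over normal samples, missed alarms over faulty samples, and errors over all samples. The alarm sequence is hypothetical, constructed with 3 false alarms and 92 missed alarms so that the rates come out close to the PCA-MSSVDD column.

```python
def monitoring_metrics(alarms, fault_start, fault_end):
    """Return rates in percent and the first sample at which the fault is flagged.

    alarms[t - 1] is True when sample t (1-based) exceeds the control limit;
    the fault is present for samples fault_start..fault_end inclusive.
    Assumes the sequence contains both normal and faulty samples.
    """
    false_alarms = missed = n_fault = 0
    first_detection = None
    for t, alarm in enumerate(alarms, start=1):
        faulty = fault_start <= t <= fault_end
        n_fault += faulty
        if alarm and not faulty:
            false_alarms += 1
        elif faulty and not alarm:
            missed += 1
        elif alarm and faulty and first_detection is None:
            first_detection = t
    n_total = len(alarms)
    n_normal = n_total - n_fault
    return {
        "false_alarm_rate": 100.0 * false_alarms / n_normal,
        "missed_alarm_rate": 100.0 * missed / n_fault,
        "error_rate": 100.0 * (false_alarms + missed) / n_total,
        "first_detection": first_detection,
    }


# Hypothetical alarm sequence: 800 samples, fault present from sample 201,
# 3 false alarms at the start and 92 undetected faulty samples at the end.
alarms = [t <= 3 or 201 <= t <= 708 for t in range(1, 801)]
metrics = monitoring_metrics(alarms, fault_start=201, fault_end=800)
print(metrics)
```

Under these assumed definitions, the error rate is a weighted combination of the other two rates, which is why it always lies between them.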

The comparison charts of SVDD, MSSVDD, and PCA-MSSVDD for the test case in monitoring the numerical process are shown in the figure below.

The comparison charts of SVDD, MSSVDD, and PCA-MSSVDD for the test case in monitoring the numerical process. SVDD: Support vector data description; MSSVDD: multiple subspaces SVDD; PCA-MSSVDD: principal component analysis - multiple subspace support vector data description.

The comparison charts of the PCA-MSSVDD subspaces for the test case in monitoring the numerical process are shown in the figure below.

The comparison charts of the PCA-MSSVDD subspace for the test case in monitoring the numerical process. PCA-MSSVDD: Principal component analysis - multiple subspace support vector data description.

The simulation model of the penicillin fermentation process is designed to provide a standard testing platform for data-driven batch process monitoring methods^{[22]}. Under the normal set values of the process variables, the production cycle of the penicillin fermentation process is set to 400 h, data are recorded once every 0.5 h, and 800 samples are recorded in one simulation run^{[23]}. The simulation model contains random noise, so, under the same initial set values, the data fluctuate randomly between batches. Therefore, 100 batches of simulation data are collected as a historical reference database.

The gray-scale diagram of the extension indicator matrix of the load matrix retained during penicillin fermentation is shown in the figure below. Latent Variables 1, 2, and 11-14 form subspace 1, while Latent Variables 3-10 form subspace 2, as listed in the table below.

Gray-scale diagram of the extension indicator matrix in the penicillin fermentation process.

Latent variable subspace design result in the penicillin fermentation process

| Subspace number | Latent variable number |
|---|---|
| 1 | 1, 2, 11, 12, 13, 14 |
| 2 | 3, 4, 5, 6, 7, 8, 9, 10 |

The simulation of the penicillin fermentation process provides three types of faults: (1) the fault of the ventilation rate variable; (2) the fault of the stirring power variable; and (3) the fault of the glucose flow rate variable. In this paper, six test faults are designed through the simulation test platform to simulate abnormal operation behavior in actual production. The sizes and types of the faults used to test the monitoring algorithm are shown in the table below.

Test cases of the fed-batch penicillin fermentation process

| Fault serial number | Fault variable | Variable number | Fault type | Size | Time of fault (h) |
|---|---|---|---|---|---|
| 1 | Ventilation rate | 1 | Step | -1.5% | 100-400 |
| 2 | Stirring power | 2 | Step | -1.5% | 100-400 |
| 3 | Glucose flow rate | 3 | Step | -2.0% | 100-400 |
| 4 | Ventilation rate | 1 | Slope | -0.1 | 100-400 |
| 5 | Stirring power | 2 | Slope | -0.1 | 100-400 |
| 6 | Glucose flow rate | 3 | Slope | -0.001 | 100-400 |

The above six faults are used to compare the monitoring performance of SVDD, MSSVDD, and PCA-MSSVDD. The comparison of the false alarm rate, missed alarm rate, error rate, and first time to detect fault using SVDD, MSSVDD, and PCA-MSSVDD in monitoring the penicillin fermentation process is shown in the table below.

The comparison of the false alarm rate, missed alarm rate, error rate, and first time to detect fault using SVDD, MSSVDD, and PCA-MSSVDD in monitoring the penicillin fermentation process

| Metric | Fault serial number | SVDD | MSSVDD | PCA-MSSVDD |
|---|---|---|---|---|
| False alarm rate (%) | 1 | 0 | 0 | 0.5 |
| | 2 | 0.5 | 0 | 2.5 |
| | 3 | 1.5 | 0.5 | 2.0 |
| | 4 | 0 | 0.5 | 0 |
| | 5 | 0 | 0 | 0 |
| | 6 | 0 | 0 | 0 |
| Missed alarm rate (%) | 1 | 71.7 | 25.4 | 27.6 |
| | 2 | 70.7 | 39.9 | 39.9 |
| | 3 | 24.4 | 33.2 | 33.2 |
| | 4 | 51.9 | 38.1 | 42.2 |
| | 5 | 98.8 | 94.0 | 94.5 |
| | 6 | 22.4 | 23.1 | 27.9 |
| Error rate (%) | 1 | 53.8 | 19.1 | 20.8 |
| | 2 | 53.2 | 30.0 | 30.6 |
| | 3 | 18.7 | 25.1 | 25.5 |
| | 4 | 39.0 | 28.7 | 31.7 |
| | 5 | 74.2 | 70.6 | 71.0 |
| | 6 | 16.8 | 17.3 | 21.0 |
| First time to detect fault (h) | 1 | 122 | 100 | 100 |
| | 2 | 111 | 101 | 105 |
| | 3 | 142 | 137 | 121 |
| | 4 | 141 | 138 | 125 |
| | 5 | 397 | 171 | 106 |
| | 6 | 164 | 159 | 161 |

SVDD: Support vector data description; MSSVDD: multiple subspaces SVDD; PCA-MSSVDD: principal component analysis - multiple subspace support vector data description.

The comparison charts of SVDD, MSSVDD, and PCA-MSSVDD for Fault 3 in monitoring the penicillin fermentation process are shown in the figure below.

The comparison charts of SVDD, MSSVDD, and PCA-MSSVDD for Fault 3 in monitoring the penicillin fermentation process. SVDD: Support vector data description; MSSVDD: multiple subspaces SVDD; PCA-MSSVDD: principal component analysis - multiple subspace support vector data description.

The comparison charts of SVDD, MSSVDD, and PCA-MSSVDD for Fault 6 in monitoring the penicillin fermentation process are shown in the figure below.

The comparison charts of SVDD, MSSVDD, and PCA-MSSVDD for Fault 6 in monitoring the penicillin fermentation process. SVDD: Support vector data description; MSSVDD: multiple subspaces SVDD; PCA-MSSVDD: principal component analysis - multiple subspace support vector data description.

Combining the above two monitoring comparison charts,

The comparison charts of the PCA-MSSVDD subspaces for Faults 1-6 in monitoring the penicillin fermentation process are shown in the figures below.

The comparison charts of the PCA-MSSVDD subspace for Fault 1 in monitoring the penicillin fermentation process. PCA-MSSVDD: Principal component analysis - multiple subspace support vector data description.

The comparison charts of the PCA-MSSVDD subspace for Fault 2 in monitoring the penicillin fermentation process. PCA-MSSVDD: Principal component analysis - multiple subspace support vector data description.

The comparison charts of the PCA-MSSVDD subspace for Fault 3 in monitoring the penicillin fermentation process. PCA-MSSVDD: Principal component analysis - multiple subspace support vector data description.

The comparison charts of the PCA-MSSVDD subspace for Fault 4 in monitoring the penicillin fermentation process. PCA-MSSVDD: Principal component analysis - multiple subspace support vector data description.

The comparison charts of the PCA-MSSVDD subspace for Fault 5 in monitoring the penicillin fermentation process. PCA-MSSVDD: Principal component analysis - multiple subspace support vector data description.

The comparison charts of the PCA-MSSVDD subspace for Fault 6 in monitoring the penicillin fermentation process. PCA-MSSVDD: Principal component analysis - multiple subspace support vector data description.

Using the above six test faults of the penicillin fermentation process simulation, the comparison of the false alarm rate of the penicillin fermentation process monitoring based on multi-way principal component analysis (MPCA)^{[24]}, multi-way independent component analysis (MICA)^{[25]}, batch dynamic principal component analysis (BDPCA)^{[26]}, mixture probabilistic principal component analysis (MPPCA)^{[27]}, and PCA-MSSVDD is shown in the table below.

The comparison of the false alarm rate using MPCA, MICA, BDPCA, MPPCA, and PCA-MSSVDD in monitoring the penicillin fermentation process

| Fault serial number | MPCA | | MICA | | BDPCA | | MPPCA | | PCA-MSSVDD |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0.5 | 3.5 | 1.0 | 1.0 | 6.5 | 0 | 0 | 0.5 |
| 2 | 0 | 12.0 | 1.5 | 0 | 2.5 | 18.5 | 2.5 | 0 | 2.5 |
| 3 | 0 | 5.0 | 3.5 | 17.0 | 0.5 | 11.5 | 0.5 | 0.5 | 2.0 |
| 4 | 0 | 0 | 1.5 | 0.5 | 0 | 6.5 | 0 | 0 | 0 |
| 5 | 0 | 0 | 1.5 | 0 | 0 | 7.0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 2.0 | 0 | 0 | 5.5 | 0 | 0 | 0 |

MPCA: Multi-way principal component analysis; MICA: multi-way independent component analysis; BDPCA: batch dynamic principal component analysis; MPPCA: mixture probabilistic principal component analysis; PCA-MSSVDD: principal component analysis - multiple subspace support vector data description.

The comparison of the missed alarm rate using MPCA, MICA, BDPCA, MPPCA, and PCA-MSSVDD in monitoring the penicillin fermentation process

| Fault serial number | MPCA | | MICA | | BDPCA | | MPPCA | | PCA-MSSVDD |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 54.2 | 99.0 | 40.5 | 33.6 | 71.5 | 48.9 | 95.1 | 27.6 |
| 2 | 100 | 68.2 | 98.6 | 88.8 | 44.8 | 72.1 | 75.2 | 95.8 | 39.9 |
| 3 | 88.3 | 51.5 | 92.8 | 97.5 | 49.0 | 77.6 | 42.5 | 42.5 | 33.2 |
| 4 | 100 | 48.4 | 98.1 | 39.9 | 47.1 | 66.3 | 47.5 | 96.5 | 42.2 |
| 5 | 100 | 96.6 | 99.1 | 86.8 | 98.3 | 91.0 | 98.6 | 99.3 | 94.5 |
| 6 | 77.8 | 25.4 | 86.0 | 90.3 | 31.8 | 76.8 | 23.2 | 22.1 | 27.9 |

MPCA: Multi-way principal component analysis; MICA: multi-way independent component analysis; BDPCA: batch dynamic principal component analysis; MPPCA: mixture probabilistic principal component analysis; PCA-MSSVDD: principal component analysis - multiple subspace support vector data description.

The comparison of the error rate using MPCA, MICA, BDPCA, MPPCA, and PCA-MSSVDD in monitoring the penicillin fermentation process

| Fault serial number | MPCA | | MICA | | BDPCA | | MPPCA | | PCA-MSSVDD |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 75.1 | 40.8 | 75.2 | 30.7 | 25.5 | 55.4 | 36.7 | 71.5 | 20.8 |
| 2 | 75.1 | 54.2 | 74.5 | 66.7 | 34.3 | 58.9 | 57.1 | 72.0 | 30.6 |
| 3 | 66.3 | 40.0 | 70.6 | 77.5 | 36.9 | 61.2 | 32.1 | 32.1 | 25.5 |
| 4 | 75.1 | 36.3 | 74.1 | 30.1 | 35.4 | 51.5 | 35.7 | 72.5 | 31.7 |
| 5 | 75.1 | 72.6 | 74.8 | 65.2 | 73.9 | 70.2 | 74.1 | 74.6 | 71.0 |
| 6 | 58.5 | 19.1 | 65.1 | 67.8 | 23.9 | 59.1 | 17.5 | 16.6 | 21 |

MPCA: Multi-way principal component analysis; MICA: multi-way independent component analysis; BDPCA: batch dynamic principal component analysis; MPPCA: mixture probabilistic principal component analysis; PCA-MSSVDD: principal component analysis - multiple subspace support vector data description.

The comparison of the first time to detect fault using MPCA, MICA, BDPCA, MPPCA, and PCA-MSSVDD in monitoring the penicillin fermentation process

| Fault serial number | MPCA | | MICA | | BDPCA | | MPPCA | | PCA-MSSVDD |
|---|---|---|---|---|---|---|---|---|---|
| 1 | / | 100 | 397 | 100 | 100 | 100 | 100 | 127 | 100 |
| 2 | 111 | 101 | 105 | 101 | 104 | 101 | 100 | 111 | 105 |
| 3 | 142 | 137 | 121 | 137 | 106 | 106 | 104 | 142 | 121 |
| 4 | / | 142 | 395 | 163 | 166 | 112 | 142 | 352 | 125 |
| 5 | 397 | 171 | 106 | 102 | 102 | 108 | 102 | 397 | 106 |
| 6 | 334 | 164 | 328 | 194 | 171 | 148 | 167 | 166 | 161 |

In this paper, a batch process monitoring algorithm based on PCA-MSSVDD is proposed by combining latent variable subspace design with SVDD. Subspace monitoring by PCA and K-means can effectively reduce the risk that variation features are submerged, and using SVDD to establish the subspace monitoring models makes the proposed method applicable to non-Gaussian processes.

In the tests on the numerical simulation process and the penicillin fermentation simulation process, the comparison between PCA-MSSVDD and SVDD shows that the subspace monitoring algorithm can effectively reduce the risk of variation characteristics being submerged and improve the monitoring performance. The comparison between PCA-MSSVDD and MSSVDD shows that the fault detection capability of PCA-MSSVDD can be either higher or lower than that of MSSVDD, depending on the fault. For local faults of weakly correlated variables, the proposed PCA-MSSVDD method gives better results, while, for strongly correlated variables, the MSSVDD method gives better results; both methods generally perform better than SVDD.

The author contributed solely to the article.

Opening Project of Shanghai Trusted Industrial Control Platform (TICPSH202103003-ZC).

The author declared that there are no conflicts of interest.

© The Author(s) 2021.