A sensitized variable selection control chart based on a classification algorithm for monitoring high‐dimensional processes

During the last decade, variable‐selection‐based (VS) control charts have gained much popularity for process monitoring and diagnosis. These charts have been proven efficient for the detection of sparse mean shifts in high‐dimensional processes. VS charts usually assume that in‐control (IC) data are the only information used to determine the control limits. In modern industrial processes, however, out‐of‐control (OC) data can be easily collected. Detecting a specific shift in a data‐rich environment without utilizing OC data information will limit the development of a process monitoring scheme. In this paper, a novel variable selection control chart that is combined with a classification algorithm is proposed, which is expected to benefit from both the classification and variable selection approaches. In contrast to alternative charts, the proposed sensitized variable selection chart can capture the potential shifted variables using both IC and OC information, which can improve the sensitivity of the chart in a specific direction. Extensive Monte Carlo simulations demonstrate that the proposed chart outperforms the alternatives in a data‐rich and high‐dimensional environment. A real‐life example of cellular localization is also included to support the findings of our study.


INTRODUCTION
With the development of data acquisition techniques, a great amount of quality-related data in industrial processes can be obtained from a range of systems. 1 Although considering diverse information can improve the accuracy of process monitoring and diagnosis, the complexity of the data flow becomes challenging and restricts the efficiency of quality improvement techniques. 2 For example, high-dimensional data degrade the performance of traditional process control methods, a problem usually called the "curse of dimensionality". 3 In addition, diverse types of data increase the cost of extracting information from mass data. Considering the complicated data collected from a modern industrial process, statistical approaches would benefit industrial practice immensely if information regarding monitoring and diagnosis could be processed suitably. 4-7 The primary task of statistical process control (SPC) is to quickly recognize potential shifts in location or scale so that process variability can be eliminated promptly through corrective actions. To ensure that process control is maintained, control charts can be utilized to detect shifts in the target process. 8
The Shewhart control chart, the first chart proposed, can be applied to monitor univariate processes. Because industrial processes often involve highly correlated monitoring variables, this chart is not suited to such situations. Even if multivariate problems could be handled by applying multiple charts in parallel, this approach is usually inefficient and unsatisfactory because it ignores the correlations between variables. To address multivariate problems, the Hotelling $T^2$ control chart was proposed. By considering the correlations of monitored variables, the $T^2$ chart is more efficient in detecting shifts. However, it is rare in a modern industrial process for all the monitored variables to shift at once; far more often, only a few of the variables deviate, which is called the "sparse mean shift" phenomenon. This is challenging for traditional SPC techniques because the critical information for detecting shifts is hidden among the noise, which causes the $T^2$ chart to fail. To improve the sensitivity of the scheme, refs. 3 and 9 independently proposed variable-selection-based control charts for screening out suspicious variables in a high-dimensional process. Without prior knowledge of the potential shifted variables, these charts can easily detect a shift occurring in a minority of variables. In addition, once an alarm is triggered, the charts can automatically provide evidence of the potential shifted variables. 10 After that, further variable selection control charts were proposed for screening out suspicious variables in high-dimensional processes. 11
Traditional control charts, including variable selection charts, always assume that only IC data can be collected to calculate the control limits. This assumption implies that the collection of OC data from the process is unnecessary. However, fault information is hidden in the historical OC data. 12,13 Neglecting the information extracted from OC data may result in decreased sensitivity in detecting a specific shift. Hence, utilizing prior knowledge from OC data in process monitoring has become a critical topic for improving the sensitivity of variable selection control charts. To take advantage of OC information, ref. 14 developed the probability of classification (PoC) chart using available OC information from historical processes. The monitoring statistic of the PoC chart is the probability based on the logistic classification model, which indicates the similarity of an observation to abnormal patterns. Although only a portion of the OC data is used, the PoC chart can capture the fault information from the OC data to improve the sensitivity. Consequently, the PoC chart performs well only when the OC data can fully describe the abnormal operation. Otherwise, it tends to fail. Refs. 5 and 15 proposed a sensitized $T^2$ chart ($S^2$) that can benefit from the advantages of both the $T^2$ and logistic regression approaches. However, the overreliance of the chart on the Hotelling $T^2$ statistic results in all the variables being monitored equally, which may degrade the efficiency significantly in a high-dimensional situation.
In a modern industrial process, only a few monitored variables shift at the same time, which poses the issue of sparse shift detection. Moreover, the OC data containing variability information should be utilized to improve the sensitivity of the scheme. Taking quality detection in the wafer manufacturing process as an example, the flatness of each produced wafer is an important quality index to be monitored. After being sliced from a silicon ingot, a wafer can be divided into different sites, which can be regarded as monitored variables. This is a typical sparse mean detection problem because only a few of the sites will affect the process yield. Moreover, once a defect appears at a site, the same defect may occur again. Therefore, to monitor the wafer manufacturing process effectively, abnormal information should be collected and utilized. Motivated by this example, a novel variable selection control chart is proposed to address the sparse shift detection problem in data-rich and high-dimensional processes. It is composed of two kinds of statistics. One is obtained from the variable selection algorithm, which can realize increased detection power in high-dimensional situations. The other indicates how similar an incoming observation is to the abnormal class, which can retain useful information from OC data. Different from the existing $S^2$ chart, the proposed chart can adaptively adjust the importance of the statistics using a maximum function instead of a tuning parameter. To the best of our knowledge, this is the first study to combine variable selection and classification algorithms in the SPC domain. Benefiting from the variable selection control chart, the proposed chart can provide evidence for both process monitoring and fault diagnosis. This advantage of the chart can help reduce process variability.
This paper is organized into five sections. The novel sensitized variable selection chart is proposed in Section 2. In the next section, various scenarios are designed for evaluating the performance of the sensitized variable selection chart. In Section 4, a cellular localization example is used to demonstrate the capability of the monitoring scheme. Finally, the conclusions of this study are summarized and further research is discussed in the last section.

Traditional control chart for a high-dimensional process
The primary task of MSPC is to identify whether a mean vector $\mu$ has changed. We let $x_t = (x_{t1}, x_{t2}, \ldots, x_{tp})$ be a $p$-dimensional observation collected from a process at time $t$. We assume the observed vectors $x_t$ are distributed according to $N_p(\mu, \Sigma)$ for $t = 1, 2, \ldots, n$ with an unknown mean vector $\mu$ and a constant covariance matrix $\Sigma$. The mean shift detection problem can be regarded as an evaluation of the following hypotheses:
$$H_0: \mu \in \Omega_0 \quad \text{versus} \quad H_1: \mu \in \Omega_1, \tag{1}$$
where $\Omega_0 = \{0\}$ indicates the parameter set when the process is in control, and $\Omega_1 = \{\mu : \mu = \delta d, \delta \neq 0\}$ indicates the parameter set when the process is out of control. Here, $d$ is a unit vector with $\|d\| = \sqrt{d^{T} \Sigma^{-1} d} = 1$, and $\delta$ is an unknown constant. Therefore, $\mu = \delta d$ can represent a mean vector shifted in any direction. To test the hypotheses, the monitoring statistic can be derived by the generalized likelihood ratio test (GLRT) procedure. 16,17

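$$G(x_t) = \frac{\max_{\mu \in \Omega_0} L(x_t, \mu)}{\max_{\mu \in \Omega_1} L(x_t, \mu)}, \tag{2}$$
where $L(x_t, \mu)$ is the likelihood function of the distribution of $x_t$. When $G(x_t) < c$, the alternative hypothesis should be favored, where $c > 0$ is a constant that establishes the control boundary. In addition, the likelihood function of $x_t$ can be expressed under the normal distribution assumption as
$$L(x_t, \mu) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2} (x_t - \mu)^{T} \Sigma^{-1} (x_t - \mu)\right]. \tag{3}$$
According to the GLRT procedure, the alternative hypothesis is not accepted unless
$$x_t^{T} \Sigma^{-1} x_t - \min_{\mu \in \Omega_1} (x_t - \mu)^{T} \Sigma^{-1} (x_t - \mu) > c_1 \tag{4}$$
holds, where $c_1$ is a threshold corresponding to the type I error $\alpha$.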
We let $Q(x_t) = \min_{\mu \in \Omega_1} (x_t - \mu)^{T} \Sigma^{-1} (x_t - \mu)$, whose minimizer is the maximum likelihood estimator of the mean vector. It can be easily found that the estimator $\mu^* = x_t$ is one solution of $Q(x_t)$. 18 In that case, the GLRT statistic becomes Hotelling's $T^2$ statistic, which indicates that all the variables are monitored simultaneously. As we noted earlier, the phenomenon of all the variables changing simultaneously is difficult to observe, so $\mu^* = x_t$ is not a good choice for estimating $\mu$. 9,19 If a shift occurs in only a few of the monitoring variables, a penalty term should naturally be added when estimating the mean vector. Thus, $Q(x_t)$ can be rewritten in a penalized form as
$$Q^{(1)}(x_t) = \min_{\mu} \left\{ (x_t - \mu)^{T} \Sigma^{-1} (x_t - \mu) + \sum_{j=1}^{p} p_{\lambda_j}(|\mu_j|) \right\}, \tag{5}$$
where $p_{\lambda_j}(|\mu_j|)$ is a penalty term, which enables the estimator to describe the kinds of patterns in $\mu$. For example, for $p_{\lambda_j}(|\mu_j|) = \lambda_1 |\mu_j| + \lambda_2 |\mu_j - \mu_{j-1}|$, the estimator of Equation (5) becomes an advanced statistic considering spatially correlated information. 10 In this study, an extreme situation where only one or two variables shift in the process is considered because modern manufacturing processes are relatively stable. Under this assumption, it is better to apply an $L_0$ penalty so that the number of potential shifted variables can be selected exactly. After adding the $L_0$ penalty, the optimization problem can be expressed as follows:
$$\min_{\mu} (x_t - \mu)^{T} \Sigma^{-1} (x_t - \mu) \quad \text{s.t.} \quad \sum_{j=1}^{p} I(|\mu_j| \neq 0) \le s, \tag{6}$$
where $I(\cdot)$ is the indicator function and $s$ indicates the number of variables screened out. Applying the Cholesky decomposition, $\Sigma^{-1} = R^{T} R$. Equation (6) is equivalent to the following model:
$$\min_{\mu} \| y_t - R\mu \|^2 \quad \text{s.t.} \quad \sum_{j=1}^{p} I(|\mu_j| \neq 0) \le s. \tag{7}$$
In this model, $y_t = R x_t$ is the prediction variable, and $R$ is regarded as the standardized observation matrix. Once an observation arrives, the optimal estimator $\mu^*$ can be obtained by solving the optimization problem. In effect, the estimation of $\mu^*$ can be regarded as a variable selection procedure. The nonzero components of $\mu^*$ indicate the potential abnormal variables, which are responsible for the process deviating from the normal condition. To detect a sparse mean shift, a variable-selection-based statistic has been defined to be
$$VS_s(x_t) = x_t^{T} \Sigma^{-1} x_t - (x_t - \mu^*)^{T} \Sigma^{-1} (x_t - \mu^*) = 2 x_t^{T} \Sigma^{-1} \mu^* - \mu^{*T} \Sigma^{-1} \mu^*. \tag{8}$$
As ref. 20 noted, the statistic $T^2$ is a special form of the statistic $VS_s$, and they are equivalent only when all the variables are suspicious. Meanwhile, the statistic $VS_s$ can also be regarded as a Mahalanobis distance that takes the sparseness property into account during shift detection. This procedure assumes that the covariance matrix does not change. Replacing $\Sigma$ with the sample covariance matrix $\hat{\Sigma}$ yields the variable-selection-based statistic used in practice.
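To make the variable-selection-based statistic concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a greedy forward selection of $s$ variables and uses the identity that, for a fixed support $S$, the reduction $x^{T}\Sigma^{-1}x - \min_{\mu}(x-\mu)^{T}\Sigma^{-1}(x-\mu)$ equals $z_S^{T} A_{SS}^{-1} z_S$ with $A = \Sigma^{-1}$ and $z = Ax$. The function names (`solve`, `inverse`, `reduction`, `vs_statistic`) are hypothetical helpers introduced for illustration.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def inverse(a):
    """Invert a small matrix column by column (illustration only)."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def reduction(prec, z, subset):
    """Drop in the quadratic form when mu is free on `subset`, zero elsewhere."""
    if not subset:
        return 0.0
    a_ss = [[prec[i][j] for j in subset] for i in subset]
    z_s = [z[i] for i in subset]
    w = solve(a_ss, z_s)
    return sum(zi * wi for zi, wi in zip(z_s, w))


def vs_statistic(x, sigma, s):
    """Greedy forward selection of s variables; returns (statistic, support)."""
    p = len(x)
    prec = inverse(sigma)  # A = Sigma^{-1}
    z = [sum(prec[i][j] * x[j] for j in range(p)) for i in range(p)]
    selected = []
    for _ in range(s):
        rest = [j for j in range(p) if j not in selected]
        best = max(rest, key=lambda j: reduction(prec, z, selected + [j]))
        selected.append(best)
    return reduction(prec, z, selected), sorted(selected)
```

With `s` equal to the dimension, the returned value coincides with Hotelling's $T^2 = x^{T}\Sigma^{-1}x$, which mirrors the remark that the two statistics agree only when all variables are treated as suspicious.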

Classification-based chart for shift detection
A variable-selection-based control chart can capture suspicious variables in a high-dimensional situation; however, it only utilizes information from IC data. In a modern industrial process, once a shift occurs, the same shift tends to occur repeatedly. In other words, the use of predefined OC information extracted from the historical process may improve the sensitivity, particularly in detecting a specific shift that has occurred before. To utilize the information from both IC and predefined OC data, the PoC chart was designed. 14 The statistic of the PoC chart measures the probability of an observation being assigned a specified label by a classification algorithm. Generally, there are two types of statistics in the PoC chart, PoC-in and PoC-out. Both probabilities can be applied to monitor the process. If PoC-in is used, the chart triggers an alarm when the statistic is smaller than a threshold. If PoC-out is applied, however, the probability will be larger than the limit when the process is out of control. Any classification algorithm, including logistic regression, backpropagation neural networks, k-nearest neighbors and support vector machines, can be applied to design the PoC chart, provided that it can predict the class probability. If the logistic regression model is applied, the PoC-out statistic can be obtained as
$$p_{out}(x) = \frac{\exp\left(\beta_0^* + \sum_{j=1}^{p} \beta_j^* x_j\right)}{1 + \exp\left(\beta_0^* + \sum_{j=1}^{p} \beta_j^* x_j\right)}, \tag{9}$$
where $\beta_j^*$ is the coefficient estimated from the logistic regression model. When a new observation arrives, $p_{out}(x) < 0.5$ indicates that the observation belongs to the IC class. Conversely, $p_{out}(x) > 0.5$ implies that the predictive result provides evidence that a specific shift has occurred again. In fact, the sensitivity of the chart depends on the given OC data. If the given information cannot describe the OC distribution, the PoC chart cannot detect the shift efficiently. Even though the $T^2$ chart utilizes only IC information, it can detect shifts that occur in any direction. To benefit from the advantages of both the $T^2$ and PoC charts, an $S^2$ chart was developed to handle this situation. 15 The following monitoring statistic is applied:
$$S^2(x) = \lambda \frac{T^2(x)}{1 + T^2(x)} + (1 - \lambda)\, p_{out}(x), \tag{10}$$
where $T^2(x)/(1 + T^2(x))$ measures how far the observation is from the normal class, while $p_{out}(x)$ indicates the probability of the observation $x$ belonging to the abnormal patterns. In addition, $\lambda$ is a weight that controls how much information from the predefined OC data is considered. Once $\lambda$ is determined, the $S^2$ chart is determined. It is challenging to determine the tuning parameter $\lambda$ without prior knowledge. Moreover, an overreliance on $T^2$ will degrade the efficiency and accuracy significantly in a high-dimensional process, which calls for further modification.
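As a small illustration of the two classification-based quantities discussed above, the sketch below computes a logistic PoC-out probability and a weighted combination with a given $T^2$ value. It is only a sketch under stated assumptions: the coefficients are taken as already estimated, the weight `lam` is assumed to multiply the $T^2$ term (so `lam = 0` reduces to the PoC-out probability), and the names `poc_out` and `sensitized_t2` are hypothetical.

```python
import math


def poc_out(x, beta):
    """Logistic probability that x belongs to the OC class.

    beta[0] is the intercept and beta[1:] are the slopes (assumed to be
    already estimated from IC and predefined OC training data).
    """
    g = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-g))  # equals exp(g) / (1 + exp(g))


def sensitized_t2(t2, p_out, lam):
    """Weighted combination of the scaled T^2 value and the PoC-out value.

    Assumption: lam weights the T^2 part and (1 - lam) the OC part.
    """
    return lam * t2 / (1.0 + t2) + (1.0 - lam) * p_out
```

For example, `poc_out([0.0, 0.0], [0.0, 1.0, 1.0])` sits exactly on the classification boundary, and `sensitized_t2(t2, p, 0.0)` returns `p` unchanged.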

Proposed sensitized variable selection chart for high-dimensional and data-rich processes
Although the $S^2$ chart can benefit from both the $T^2$ and PoC statistics, it tends to fail in high-dimensional processes. In addition, determining the parameter $\lambda$ of the $S^2$ chart is challenging in practice. To improve the sensitivity of the chart, a sensitized variable selection control chart is proposed, which is named the sensitized variable selection (SVS) chart.
To benefit from both variable selection and classification algorithms, two kinds of statistics are utilized in the proposed SVS chart. One statistic, $VS_s(x_t)$, which indicates how far an incoming observation lies from the center of the IC data using a forward variable selection algorithm, can be obtained as follows:
$$VS_s(x_t) = 2 x_t^{T} \Sigma^{-1} \mu^*(x_t \mid s) - \mu^*(x_t \mid s)^{T} \Sigma^{-1} \mu^*(x_t \mid s), \tag{11}$$
where the estimator $\mu^*(x_t \mid s)$ can be calculated using Equation (7).
The other statistic, $E(x_t)$, is the exponential form of the distance between the observation and the classification boundary measured by the logistic regression model, which is expressed as follows:
$$E(x_t) = \exp\left(g(x_t)\right), \tag{12}$$
where $g(x_t) = \beta_0^* + \sum_{j=1}^{p} \beta_j^* x_{tj}$ measures the distance between the observation and the boundary and $\beta_j^*$ ($j = 0, 1, 2, \ldots, p$) is the regression coefficient of the logistic model, which can be obtained by the ELM algorithm.
In Equation (11), $VS_s(x_t)$ can be regarded as a kind of Mahalanobis distance modified by the variable selection method, which gives the estimator a sparseness property for screening out suspicious variables. Although the forward variable selection method is employed to screen out the suspicious variables in this study, other variable selection algorithms could be applied to address different situations. It may be interesting to examine the use of different variable selection algorithms to modify the properties of the estimator. For example, LASSO can be applied to select potential variables automatically if one does not know the exact number of suspicious variables, while fused LASSO can be utilized to identify adjacent shifted variables efficiently when they are correlated spatially.
In addition, the classification-algorithm-based statistic in Equation (12) indicates how far the observation deviates from the classification boundary in a specific direction. Different from the $S^2$ chart, an exponential form is applied in the proposed SVS chart to modify the statistic so that the predefined OC information can be captured to improve the sensitivity in a specific direction without losing sensitivity in the opposite direction. It is easily observed that $g(x_t) > 0$ when the observation belongs to the OC class and $g(x_t) < 0$ when it belongs to the IC class. Moreover, $g(x_t) \to +\infty$ if the observation is far from the classification boundary in a specific direction, while $g(x_t) \to 0$ if the observation is near the boundary. Consequently, $E(x_t)$ will dominate the statistic when a shift occurs in a specific direction, especially when the observation is in the OC class. In contrast, $E(x_t)$ is disabled in other directions, and it cannot affect the performance of the statistic in the opposite direction.
Naturally, one way to combine these two types of statistics is in the form of Equation (10):
$$\lambda \, VS_s(x_t) + (1 - \lambda) \, E(x_t), \tag{13}$$
where $\lambda$ is a weight controlling how much the OC information is taken into consideration. As the value of $\lambda$ increases from 0 to 1, the importance of abnormal information decreases in the statistic. However, it is difficult to determine the critical parameter $\lambda$. To avoid this difficulty and to optimize the performance of the scheme, a sensitized variable selection control chart is proposed in this study. The monitoring statistic is designed as follows:
$$\Lambda_s(x_t) = \max\left(VS_s(x_t), \; \gamma \cdot E(x_t)\right), \tag{14}$$
where $\gamma$ is a tuning parameter for ensuring that the two statistics $VS_s(x_t)$ and $E(x_t)$ are on the same scale. Once $\gamma$ is determined, the final statistic has a self-adaptive property that makes it sensitive to a specific shift. $\Lambda_s(x_t)$ can detect outliers easily in a specific direction without losing sensitivity in other nonspecific directions. When the observation shifts in this specific direction, $\gamma \cdot E(x_t)$ is usually larger than $VS_s(x_t)$. The monitoring statistic $\Lambda_s(x_t)$ then becomes $\gamma \cdot E(x_t)$, which detects the shift very sensitively by considering the OC information. Otherwise, $\gamma \cdot E(x_t)$ tends to be smaller than $VS_s(x_t)$ when the observation shifts without a specific direction. $\Lambda_s(x_t)$ then becomes $VS_s(x_t)$, so the robustness in detecting sparse mean shifts is maintained. Benefiting from this property, the control limits can narrow in the direction in which a shift occurs. This approach is therefore more effective than charts whose control limits are determined without utilizing the OC information.
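The maximum-based combination described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the classification-based statistic is the exponential of the logistic linear predictor `g`, that the variable-selection statistic `vs` has been computed separately, and the function name `svs_statistic` and parameter name `gamma` are hypothetical.

```python
import math


def svs_statistic(vs, g, gamma=1.0):
    """Maximum of the variable-selection statistic and the scaled
    exponential of the logistic linear predictor g (assumed form)."""
    return max(vs, gamma * math.exp(g))


# An observation deep in the IC region (g << 0): the VS part decides.
ic_like = svs_statistic(vs=5.0, g=-3.0)

# An observation in the predefined OC direction (g >> 0): the
# classification part takes over and inflates the statistic.
oc_like = svs_statistic(vs=5.0, g=3.0)
```

The design choice illustrated here is that no weight has to be traded off between the two parts: whichever statistic is larger for the current observation determines the charted value.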
The decision boundaries of the charts are plotted in Figure 1. In Figure 1, the black and red dots represent IC data and predefined OC data, respectively, while the number of suspicious variables is $s = 2$ and the tuning parameter is $\gamma = 1$. If the form of Equation (13) is applied, the control boundaries are the dashed lines shown. It is easily observed that the boundary ($\lambda = 0.3, 0.5, 0.7$) is tight in the OC direction and becomes wider in other directions, especially in the opposite direction, indicating that the sensitivity of this statistic is improved by sacrificing the detection ability in other directions. Benefiting from the OC information, the boundary of the proposed statistic (in bold) is tight in the OC direction and is equivalent to that of the variable-selection-based statistic in other directions.
Before applying the proposed SVS chart to monitor a process, the control limits should be calculated. It is challenging to handle the theoretical distribution of the monitoring statistic $\Lambda_s(x_t)$ because the estimator $\mu^*$ obtained from the variable selection algorithm varies from observation to observation. In general, the control limits of the proposed chart cannot be determined conventionally without the distribution of $\mu^*$. With the development of computational capabilities, however, the limits can be obtained asymptotically via the following Monte Carlo method:
1. Both IC and predefined OC data are collected to estimate the parameter sets of the IC and predefined OC distributions.
2. 1000 observations are generated from both the IC and predefined OC distributions with the estimated parameters.
3. The final statistic $\Lambda_s(x_t)$ is calculated.
4. Steps 2 and 3 are repeated at least 5000 times and the results are recorded.
5. An appropriate control limit is determined such that the actual ARL is close to the predetermined value.
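The last step of the procedure above can be approximated with a short Python sketch. It is a simplification rather than the authors' exact search: it assumes independent IC observations, so that the in-control run length is geometric and a limit with $\mathrm{ARL}_0 = 1 / P(\text{statistic} > \text{limit})$ can be read off as an empirical quantile. The callables `statistic` and `sample_ic` and the function name `control_limit` are hypothetical placeholders for the chart statistic and the fitted IC sampler.

```python
import random


def control_limit(statistic, sample_ic, arl0=200, n_sim=20000, seed=0):
    """Empirical (1 - 1/arl0) quantile of the statistic under IC sampling."""
    rng = random.Random(seed)
    values = sorted(statistic(sample_ic(rng)) for _ in range(n_sim))
    # Index of the quantile; clipped so that it stays inside the sample.
    k = min(n_sim - 1, int((1.0 - 1.0 / arl0) * n_sim))
    return values[k]


# Toy usage: a one-dimensional squared standard normal as the statistic.
limit = control_limit(
    statistic=lambda x: x[0] ** 2,
    sample_ic=lambda rng: [rng.gauss(0.0, 1.0)],
)
```

In practice the resulting limit would still be checked, and if necessary adjusted, by simulating the actual in-control ARL, as the procedure requires.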

Parameters of the SVS chart
Although the proposed chart can capture both IC and OC information, two parameters should be determined before process monitoring. One is $s$, which is defined as the number of variables to be screened out for monitoring. If prior knowledge of a specific process can be utilized to determine the parameter $s$, the control chart can select the suspicious variables from the high-dimensional process directly. Without enough prior knowledge, various variable selection criteria can also be applied to determine the parameter $s$, such as AIC, BIC and cross-validation. Ref. 3 pointed out that such criteria often produce hardly any meaningful results. However, determining the parameter $s$ is not a serious obstacle in practice because simultaneous changes in many variables are rarely observed in modern processes.
The other parameter, $\gamma$, is defined as the tuning parameter, which is applied to balance the IC and OC information extracted from the process. A small value in the range (0, 1) should be chosen when the scale of $E(x_t)$ is larger than that of $VS_s(x_t)$. Conversely, a large value in the range (1, +∞) should be chosen when the scale of $E(x_t)$ is smaller, to guarantee that the scales of $VS_s(x_t)$ and $E(x_t)$ are comparable. It is appropriate to select $\gamma = 1$ in most situations because it helps the monitoring statistic utilize the IC and OC information equally. If $E(x_t) > VS_s(x_t)$, the logistic-regression-model-based statistic dominates $\Lambda_s(x_t)$, so OC information will be utilized more than IC information. In contrast, if $E(x_t) < VS_s(x_t)$, the variable-selection-algorithm-based statistic tends to capture more sparse information from the monitoring process.

SIMULATION STUDY
To evaluate the performance of the SVS chart, it is compared with two alternative charts, namely, the sensitized $T^2$ chart ($S^2$) proposed by ref. 15 and the variable-selection-based control chart (VS-MSPC) proposed by ref. 9. As the author noted, the $S^2$ chart is designed to improve the sensitivity by utilizing both IC and predefined OC data, and it performs well owing to its sensitive detection in a specific direction in which shifts have usually occurred historically. However, the $S^2$ chart focuses on all the monitoring variables, which may weaken its performance in high-dimensional situations. High-dimensional processes are common manufacturing environments in modern industrial applications, and only a few of the variables shift simultaneously. The VS-MSPC chart can capture the sparse information in a high-dimensional process; however, it utilizes only the IC information for process monitoring, 21 ignoring the OC information. Benefiting from both variable selection and classification algorithms, the proposed SVS chart can detect suspicious variables from a high-dimensional process using both IC and OC information. In this section, various situations are discussed to demonstrate its effectiveness.
The parameters $s$ and $\gamma$ must be determined before applying the proposed SVS chart. Different from the $S^2$ chart, where the parameter $\lambda$ must be selected, the proposed SVS chart can utilize OC information adaptively after determining the parameter $\gamma$. In most scenarios, the scales of $VS_s(x_t)$ and $E(x_t)$ are the same, so $\gamma = 1$ is simply chosen in our simulation. The effect analysis of the parameter $s$ is not our main task in this paper, so $s = 1, 2, 3$ are chosen manually as a simple basis for comparison. In the simulation, 1100 observations are generated, including 1000 IC data points and 100 OC data points. We assume that the 100 predefined OC data points are obtained from an abnormal condition in history and apply them to construct a logistic regression model with the 1000 IC data points. The imbalance between the IC and predefined OC training data is intended to reflect a real situation. In addition, the 1000 IC observations are used to estimate the center of the IC data by the variable selection algorithm. In phase II of the analysis, another 1000 IC data points and 1000 OC data points (500 predefined and 500 undefined) are generated to calculate the actual ARL under the stable condition ($\mathrm{ARL}_0$) and the abnormal condition ($\mathrm{ARL}_1$).
We evaluate the performance of our proposed method by comparing it with two baseline charts: the VS-MSPC chart and the $S^2$ chart. For a fair comparison, $s = 2$ is chosen in the VS-MSPC chart, and $\lambda = 0, 0.7, 1$ are selected in the $S^2$ chart. The $S^2$ chart becomes the PoC chart if $\lambda = 0$, whereas it degenerates into the traditional Hotelling $T^2$ chart if $\lambda = 1$.
Both the robustness and sensitivity of the proposed chart are tested in phase II; hence, the actual $\mathrm{ARL}_0$ and $\mathrm{ARL}_1$ are considered. We set $\mathrm{ARL}_0 = 200$. Once the actual $\mathrm{ARL}_0$ of every chart is calibrated to 200, the charts are comparable and the actual $\mathrm{ARL}_1$ is meaningful. Hence, the smaller $\mathrm{ARL}_1$ is, the better the sensitivity of the chart. Moreover, all the simulations are repeated more than 5000 times to guarantee a reasonable result.
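The ARL evaluation used throughout the simulation can be illustrated with the following Python sketch. It is a generic run-length simulator under the assumption of independent observations, not the authors' code; `statistic`, `sampler`, and `limit` are hypothetical placeholders for a chart statistic, a data generator (IC for $\mathrm{ARL}_0$, OC for $\mathrm{ARL}_1$), and a control limit.

```python
import random


def run_length(statistic, sampler, limit, rng, max_len=100000):
    """Number of observations drawn until the statistic first exceeds limit."""
    for t in range(1, max_len + 1):
        if statistic(sampler(rng)) > limit:
            return t
    return max_len  # censored run, only reached for extremely wide limits


def estimate_arl(statistic, sampler, limit, n_rep=5000, seed=0):
    """Average run length over n_rep independent simulated runs."""
    rng = random.Random(seed)
    total = sum(run_length(statistic, sampler, limit, rng) for _ in range(n_rep))
    return total / n_rep


# Toy check: a uniform(0, 1) statistic with limit 0.9 signals with
# probability 0.1 per observation, so the ARL should be close to 10.
toy_arl = estimate_arl(lambda u: u, lambda rng: rng.random(), 0.9, n_rep=20000)
```

Calling the same function with an IC sampler checks the calibration of $\mathrm{ARL}_0$, and calling it with an OC sampler gives the $\mathrm{ARL}_1$ used to compare charts.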

ARL comparison
To evaluate the performance of the proposed chart, the ARL is applied. For the same $\mathrm{ARL}_0$, a lower $\mathrm{ARL}_1$ indicates better performance. The simulation results are shown in Tables 2 to 7, and the simulation is repeated at least 5000 times for each $\mathrm{ARL}_1$. The $\mathrm{ARL}_1$ of each chart was calculated using an expected $\mathrm{ARL}_0 = 200$ under different dimensional situations. The scenario of $p = 2$ in Table 2 shows that the proposed SVS chart and the $S^2$ chart perform comparably. The $S^2$ chart may outperform the proposed chart with $s = 2$ because the $S^2$ chart can monitor all the variables simultaneously when the dimensionality is very low. As shown in Tables 3 to 7, the SVS chart tends to outperform the alternatives as the dimensionality increases because the proposed chart can capture the suspicious variables by using both IC and OC information. For the $p = 5, 10, 50, 100$ cases, the proposed SVS chart with $s = 2$ has the smallest $\mathrm{ARL}_1$ value compared with the alternative charts, which indicates that considering OC information helps it realize improved sensitivity. The PoC chart ($\lambda = 0$) does not perform well in any of the scenarios because it cannot capture any information about undefined OC observations. The Hotelling $T^2$ chart outperforms the VS-MSPC chart in low-dimensional scenarios, but it is outperformed by the VS-MSPC chart in high-dimensional situations. These results confirm the expectation that the variable selection algorithm performs better as the dimensionality increases. An ARL comparison of different scenarios is also plotted in Figure 2. The most distinctive advantage of the proposed SVS chart is the integration of the variable selection algorithm and the classification algorithm. Consequently, it can utilize sparse information with the variable selection algorithm and use IC and OC data with the classification algorithm simultaneously. As a result, the sensitivity of the proposed chart is improved.
TABLE 3 $\mathrm{ARL}_1$ comparisons with different magnitudes $\delta$ when $p = 5$ and $\mathrm{ARL}_0 = 200$. The boldface entries represent the smallest $\mathrm{ARL}_1$ obtained.

Description of the dataset
To demonstrate the validity in a real example, the E. coli protein dataset from the UCI repository is applied in this section. The dataset was originally created to predict the localization sites of these proteins, which can be applied to test the classification algorithm. There are 336 samples in the dataset. For each sample, seven attribute variables can be regarded as the observations, and the protein localization site can be regarded as the prediction variable. According to the different localization sites, the dataset can be divided into eight groups (143 cp, 77 im, 35 imU, 2 imS, 2 imL, 52 pp, 20 om, and 5 omL). The details can be found in the cited paper. 22 In this case, the points of one group satisfy the normality assumption asymptotically and can be considered IC observations. 23 The points of four of the remaining groups are considered predefined OC observations, and the points of the other three groups are considered undefined OC observations. The multimodality of the observations is shown in Figure 3 after applying the PCA technique.

Experimental analysis of the real example
In the real example, the receiver operating characteristic (ROC) curve is a better way to show the performance than the ARL. The area under the curve is denoted as the AUC index. In general, the control chart that yields the largest AUC has the best performance. In this section, four charts, namely, the proposed SVS chart with parameter $s = 2$, the $S^2$ chart with parameter $\lambda = 0.7$, the Hotelling $T^2$ chart, and the VS-MSPC chart, are compared. The ROC curves of the charts are plotted in Figure 4. From Figure 4, the proposed SVS chart outperforms the alternatives because it has the largest AUC (0.9083). It is clear that the $S^2$ and SVS charts, which capture the information from both IC and OC data, perform better than the $T^2$ and VS-MSPC charts, which only utilize IC data. Furthermore, the proposed SVS chart outperforms the $S^2$ chart because it can screen out suspicious variables to improve the sensitivity.
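For readers who want to reproduce an AUC comparison of this kind, the following Python sketch computes the area under the ROC curve directly from charting statistics via the rank (Mann-Whitney) formulation. It is a generic illustration: `ic_scores` and `oc_scores` are hypothetical lists of monitoring-statistic values for IC and OC observations, and larger values are assumed to indicate an out-of-control state.

```python
def auc(ic_scores, oc_scores):
    """Probability that a random OC score exceeds a random IC score,
    counting ties as one half (equals the area under the ROC curve)."""
    wins = 0.0
    for oc in oc_scores:
        for ic in ic_scores:
            if oc > ic:
                wins += 1.0
            elif oc == ic:
                wins += 0.5
    return wins / (len(ic_scores) * len(oc_scores))


perfect = auc([1.0, 2.0, 3.0], [4.0, 5.0])  # complete separation
partial = auc([1.0, 3.0], [2.0, 4.0])       # one of four pairs misordered
```

An AUC of 0.5 corresponds to a chart that cannot distinguish IC from OC observations, while values close to 1 indicate nearly perfect separation.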

CONCLUSIONS
Motivated by the $S^2$ chart, a novel scheme that utilizes both IC and OC information was proposed. The proposed SVS chart can benefit from both variable selection and classification algorithms. It uses the variable selection algorithm to screen out potential shifted variables. Meanwhile, it utilizes a logistic regression model to extract abnormal variations from OC data. Moreover, the proposed SVS chart can adaptively adjust the importance of the variable selection and classification algorithms using a maximum function. Because both sparse characteristics and OC information can be considered, the false-alarm rate is expected to decrease. Both simulation and real-life examples show that the proposed SVS chart outperforms all the alternatives in high-dimensional and data-rich environments. In this study, the primary task was to improve the sensitivity of the variable selection chart by using OC information. The logistic regression model is one of the simplest methods for utilizing OC information. Hence, it will be interesting to extend other classification methodologies to the proposed SVS framework and examine the performance of variable selection charts integrated with other advanced classification algorithms. Indeed, it is a challenging and interesting task to select an appropriate classification algorithm for mining the valuable information in a high-dimensional and data-rich environment. Furthermore, we assumed that the observations follow a normal distribution; however, the normality assumption is usually violated in practice. We believe that the use of novel variable selection algorithms to address non-normal situations will become an interesting issue. The application of distribution-free variable selection algorithms to detect sparse mean shifts in high-dimensional processes will also be a research direction in our further studies.

FIGURE 3 A two-dimensional plot of the proteins data.
FIGURE 4 The performance comparison in the real example.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in E. coli protein dataset at http://archive.ics.uci.edu/ml/datasets/Ecoli.

engineering, and operations management. So far, he has published over 100 research papers, many of which appeared in top journals.
Yumin Liu is a full professor in the School of Business at Zhengzhou University, China. She received her PhD degree from Nankai University in 2003. Her research interests include quality engineering and corporate governance.
Zhongyuan Xin received his BSc degree in civil engineering from the University of Limerick, Ireland, and the MSc degree in environmental design and engineering from the University College London (UCL). His research interests include statistical process control and distribution theory.
This work is supported by the National Natural Science Foundation of China Grant (No. 71701188, No. 71902138, No. U1904211, No. 72261147706 and No. 72231005), Humanities and Social Sciences Research Program of the Ministry of Education of China Grant (No. 21YJC630151), Key Scientific and Technological Project of Henan Province (No. 232102211040), Natural Science Foundation of Henan Province under Grant (No. 232300420125), and the Henan University of Engineering Research Fund of 2020 (No. Dsk2020002).

The covariance structure $\rho^{|i-j|}$ considers the correlation between variables. $\rho = 0.6$ is set, which indicates a medium degree of correlation among the variables in the simulated data. If the process is in control, the mean vector is $\mu_{IC} = [0]$. Otherwise, two different patterns are considered in our simulation: predefined OC data and undefined OC data. $x_t \sim N(\mu_{pre}, \Sigma)$ is assumed if the OC data have predefined patterns, while $x_t \sim N(\mu_{und}, \Sigma)$ is assumed if the observations have undefined OC patterns. For the predefined patterns, we assume that the shift occurs only in the first two variables based on our prior knowledge, which means $\mu_{pre} = [\delta, \delta, 0, \ldots, 0]$. For the undefined patterns, however, a shift can occur in any two variables at random. Among the simulations, five shift magnitudes are considered, namely, $\delta \in \{1, 2, 3, 4, 5\}$. A summary of the simulation experiments is shown in Table 1.
TABLE 1 Simulation experiments design.
TABLE 2 $\mathrm{ARL}_1$ comparisons with different magnitudes $\delta$ when $p = 2$ and $\mathrm{ARL}_0 = 200$. The boldface entries represent the smallest $\mathrm{ARL}_1$ obtained.
TABLE 6 $\mathrm{ARL}_1$ comparisons with different magnitudes $\delta$ when $p = 50$ and $\mathrm{ARL}_0 = 200$. The boldface entries represent the smallest $\mathrm{ARL}_1$ obtained.
TABLE 7 $\mathrm{ARL}_1$ comparisons with different magnitudes $\delta$ when $p = 100$ and $\mathrm{ARL}_0 = 200$. The boldface entries represent the smallest $\mathrm{ARL}_1$ obtained.
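The simulation design summarized in this document (a covariance with entries $\rho^{|i-j|}$, a predefined shift in the first two variables, and an undefined shift in two randomly chosen variables) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' generator: the symbol $\rho$, the helper names (`ar1_covariance`, `cholesky`, `shift_vector`, `sample_mvn`), and the use of a Cholesky factor for sampling are choices made here for illustration.

```python
import math
import random


def ar1_covariance(p, rho=0.6):
    """Covariance matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def shift_vector(p, delta, positions):
    """Mean vector that equals delta at `positions` and zero elsewhere."""
    mu = [0.0] * p
    for j in positions:
        mu[j] = delta
    return mu


def sample_mvn(mu, chol, rng):
    """One draw from N(mu, L @ L.T) using independent standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mu)]


rng = random.Random(0)
p, delta = 10, 2.0
chol = cholesky(ar1_covariance(p))
predefined = shift_vector(p, delta, [0, 1])                 # first two variables
undefined = shift_vector(p, delta, rng.sample(range(p), 2))  # two random variables
x_oc = sample_mvn(predefined, chol, rng)
```

Feeding such draws into a run-length simulation with the chosen statistic and control limit gives the $\mathrm{ARL}_1$ values for the predefined and undefined patterns.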