Three‐level designs: Evaluation and comparison for screening purposes

Since their introduction by Box and Hunter, resolution criteria have been widely used when comparing regular fractional factorial designs. In this article, we investigate how a generalized resolution criterion can be used to assess some recently developed three‐level screening designs, such as definitive screening designs (DSDs) and screening designs from weighing matrices. The aim of this paper is to capture the projection properties of those three‐level screening designs, complementing the work of Deng and Tang, who used generalized resolution and minimum aberration criteria for ranking different two‐level designs, particularly Plackett‐Burman and other nonregular factorial designs. An advantage of generalized resolution, extended here to work on three‐level designs, is that it offers a useful criterion for ranking three‐level screening designs, whereas the Deng and Tang resolution is used mainly for the assessment of two‐level designs. In addition, we apply a projection estimation capacity (PEC) criterion to select three‐level screening designs with desirable properties. Practical examples and the best projections of the designs are presented in tables.


| INTRODUCTION
Full factorial designs allow all factorial effects to be estimated independently. Because a full factorial is often too costly to run, fractional factorial designs, which are subsets of full factorial designs, are used instead. Fractional factorial designs are therefore widely used in industrial settings to identify the most active factors that affect responses or processes. A two-level design is orthogonal when the number of +'s and −'s in each design column is equal and the four level combinations (+ +), (− −), (+ −), and (− +) occur with the same frequency for each pair of factors.
In this paper, we capture the projection properties of fractional factorial designs with three levels, where we use −, 0, and + for the low, intermediate, and high levels, respectively, following Montgomery. 1 Orthogonal fractional factorial designs and screening designs are commonly used in industrial research, bio-medical engineering, drug discovery, computer simulation experiments, and machine learning (see Dean and Lewis 2 ). A desirable property of screening designs is a limited number of runs, because screening designs aim to identify the most important factors among a large number of factors that may affect a response. Thus, it is important to investigate the projection properties of these designs. Because it is not known in advance which factors will be active, the projected subdesigns onto small subsets of factors are generated and studied. Once the active factors are identified and the non-active factors are deleted, designs with good projections can reveal additional valuable information.
Fractional factorial designs are categorized as either regular or nonregular. A regular fractional factorial design is determined by defining relations and has a simple aliasing structure, so any two effects are either orthogonal or fully aliased. A nonregular fractional factorial design has a more complicated aliasing structure: some effects are neither orthogonal nor fully aliased, which makes their significance difficult to interpret. The Plackett-Burman 3 design is an example of a nonregular fractional factorial design that is extensively used in screening experiments because of its flexible and economic run size (see Wu and Hamada 4 ). Consequently, nonregular factorials have received more attention in the past decade. More work on two-level fractional factorial designs and their projection properties can be found in Hamada and Wu 5 ; Lin and Draper 6 ; Wang and Wu 7 and Cheng. 8,9 Deng and Tang 10 and Tang and Deng 11 studied Plackett-Burman and other nonregular factorial designs. They provided two criteria, generalized resolution and generalized minimum aberration, for ranking two-level nonregular designs.
There is less literature on the projection properties of designs with more than two levels. For instance, Wang and Wu 7 and Cheng and Wu 12 worked on the hidden projection properties when the designs were projected onto three and four factors using the OA(18, 3^7, 2) design. Cheng and Wu 12 also included the orthogonal arrays OA(36, 3^12, 2) and OA(27, 3^8, 2) in their projection investigation. For more studies on projection properties of three-level designs, we refer to Xu, Cheng, and Wu 13 ; Tsai, Gilmour, and Mead 14,15 ; Evangelaras, Koukouvinos, and Lappas 16 ; and Dey. 17 The problem of assessing three-level factorial designs, especially screening designs, deserves more attention now that more screening designs with three levels are appearing in the literature. Deng and Tang 10 used generalized resolution and minimum aberration criteria for comparing and ranking screening designs. The criteria we use in this paper are natural generalizations of the criteria they applied to two-level nonregular factorials.
Following Deng and Tang, 10 a factorial design, regular or nonregular, is denoted by D and is regarded either as a set of m columns $D = \{d_1, \ldots, d_m\}$ or as an n × m matrix $D = (d_{ij})$, whichever is more convenient. For 1 ≤ r ≤ m and any r-subset $T = \{d_{j_1}, \ldots, d_{j_r}\}$ of D, define

$$J_r(T) = \left| \sum_{i=1}^{n} d_{ij_1} \cdots d_{ij_r} \right|. \qquad (1)$$

Deng and Tang 10 showed that for two-level orthogonal designs, $J_1(T) = J_2(T) = 0$. The value of $J_r(T)$ can be used to develop the generalized resolution and minimum aberration criteria.
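As a minimal sketch of the J-characteristic, assuming the Deng and Tang form (the absolute value of the sum, over runs, of the product of the entries in the chosen columns), the quantity can be computed directly from a design stored as a list of runs. The function name `j_characteristic` and the small regular half fraction used as input are illustrative choices, not taken from the paper.

```python
from math import prod

def j_characteristic(design, cols):
    """Return |sum over runs of the product of the entries in columns `cols`|."""
    return abs(sum(prod(run[c] for c in cols) for run in design))

# Illustrative input: the regular 2^(3-1) fraction with C = AB (4 runs, 3 factors).
half_fraction = [
    (-1, -1, +1),
    (+1, -1, -1),
    (-1, +1, -1),
    (+1, +1, +1),
]

j1 = j_characteristic(half_fraction, (0,))       # one column: balanced, so 0
j2 = j_characteristic(half_fraction, (0, 1))     # two columns: orthogonal, so 0
j3 = j_characteristic(half_fraction, (0, 1, 2))  # the defining word ABC: equals n = 4
```

For this orthogonal two-level example the one- and two-column values vanish, in line with the statement above, and only the three-column subset gives a nonzero value.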
The paper is organized as follows. In Section 2, we recall the construction of some definitive screening designs (DSDs) that Jones and Nachtsheim 18 suggested. These are presented in Table 1. In the same section, we present some alternative three-level designs that have recently appeared in the literature. These use weighing matrices and a fold-over structure for their constructions. In Section 3, we introduce generalized resolution and give some examples of how it can be applied for design comparisons. The results on design evaluation are presented and discussed. In Section 4, we define the confounding frequency vector (CFV) of a design, based on the generalized minimum aberration criterion, to discriminate between designs. In Section 6, we introduce the projection estimation capacity (PEC) criterion and use some examples to select three-level screening designs with desirable properties.
TABLE 1 Jones and Nachtsheim definitive screening design (DSD) structure for m factors

| CONSTRUCTION OF THREE-LEVEL SCREENING DESIGNS
One of the advantages of DSDs is their small number of runs. For example, a design with four or more factors (m ≥ 4) requires only one more run than twice the number of factors, that is, n = 2m + 1 (see Jones and Nachtsheim 18 ). In this article, the focus is on three-level screening designs, and we include the DSDs, WSD-2, and WSD-3, whose constructions are described below. DSDs can be constructed as suggested by Jones and Nachtsheim 18 :

$$D = \begin{pmatrix} C \\ -C \\ 0 \end{pmatrix},$$

where C is an m × m matrix and 0 is a 1 × m zero matrix. With s such zero (center) runs, D will have 2m + s runs. The construction of DSDs was introduced with s = 1 and can be considered as a special case of designs constructed from weighing matrices with one zero per row and column (conference matrices). W(6,5) is an example of a conference matrix (see Xiao et al. 19 ) and can be used to generate a DSD of m = 6 factors.
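The fold-over construction can be sketched as follows, assuming the design is obtained by stacking C, −C, and a number of all-zero center runs. The particular order-6 conference matrix below is an illustrative choice (it is not copied from the paper), and the helper names `build_dsd` and `gram` are hypothetical.

```python
# Symmetric conference matrix of order 6: zero diagonal, entries +/-1 elsewhere.
# Illustrative choice; any W(6,5) with one zero per row and column would do.
C = [
    [0,  1,  1,  1,  1,  1],
    [1,  0,  1, -1, -1,  1],
    [1,  1,  0,  1, -1, -1],
    [1, -1,  1,  0,  1, -1],
    [1, -1, -1,  1,  0,  1],
    [1,  1, -1, -1,  1,  0],
]

def build_dsd(conference, n_center=1):
    """Stack C, its fold-over -C, and `n_center` all-zero center runs."""
    m = len(conference)
    folded = [[-x for x in row] for row in conference]
    centers = [[0] * m for _ in range(n_center)]
    return [list(row) for row in conference] + folded + centers

def gram(design):
    """Return the m x m matrix D^T D of column inner products."""
    m = len(design[0])
    return [[sum(run[i] * run[j] for run in design) for j in range(m)]
            for i in range(m)]

dsd = build_dsd(C)                         # m = 6 factors, s = 1 center run
n_runs, n_factors = len(dsd), len(dsd[0])  # 2m + 1 = 13 runs, 6 columns
column_sums = [sum(run[j] for run in dsd) for j in range(n_factors)]
```

Because the columns of C are mutually orthogonal and the second block is the mirror image of the first, every column of the resulting design sums to zero and distinct columns have zero inner product.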
Moreover, a weighing matrix W = W(m,w) is a square matrix of order m with entries from the set {0, ±1} that satisfies $W W^T = W^T W = w I_m$, where w is the number of nonzero entries per row and column. The parameter w is called the weight of W. The designs WSD-s, for s = 1, 2, and 3, were used by Georgiou et al. 20 and were constructed by using a fold-over structure with a W(m,m − s) and by adding a selected number of center points. For example, the matrices W(8,6) and W(8,5) are used to construct a WSD-2 and a WSD-3, respectively. For more details on the constructions and properties of WSD-s, we refer to Georgiou et al. 20
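The defining property of a weighing matrix is easy to check numerically. The sketch below tests whether a square matrix with entries in {0, ±1} satisfies W Wᵀ = w I. The function name `is_weighing_matrix` and the small W(4,3) used as input are illustrative assumptions; this is not one of the matrices used in the paper.

```python
def is_weighing_matrix(W, w):
    """Check that W is square, has entries in {0, +1, -1}, and W W^T = w I."""
    m = len(W)
    if any(len(row) != m for row in W):
        return False
    if any(x not in (-1, 0, 1) for row in W for x in row):
        return False
    for i in range(m):
        for j in range(m):
            dot = sum(W[i][k] * W[j][k] for k in range(m))
            if dot != (w if i == j else 0):
                return False
    # For a square real matrix, W W^T = w I with w > 0 also implies W^T W = w I.
    return True

# Illustrative W(4,3): three nonzero entries in every row and column.
W43 = [
    [0,  1,  1,  1],
    [1,  0,  1, -1],
    [1, -1,  0,  1],
    [1,  1, -1,  0],
]

nonzeros_per_row = [sum(1 for x in row if x != 0) for row in W43]
```

A W(m, m − s) passed through the same check would have s zeros in every row and column, which is the structure the WSD-s designs rely on.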

| A GENERALIZED RESOLUTION CRITERION FOR THREE-LEVEL SCREENING DESIGNS
There are two equivalent ways of defining the resolution of a regular fractional factorial design. From the projection viewpoint, a regular factorial has resolution r if, in the projection of the design onto any (r − 1) factors, all $2^{r-1}$ possible level combinations occur with the same frequency. From the estimability viewpoint, a regular factorial has resolution r (r odd) if the effects involving (r − 1)/2 or fewer factors can be estimated under the assumption that the interaction effects involving (r + 1)/2 or more factors are negligible. Similarly, a regular factorial has resolution r (r even) if the effects involving (r − 2)/2 or fewer factors are estimable under the assumption that the interaction effects involving (r + 2)/2 or more factors are negligible (for more information, see Box and Hunter 21 ).
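The projection viewpoint can be illustrated with a short sketch that checks, for a given k, whether every projection onto k columns contains all level combinations equally often. The function name and the regular half fraction used as input are illustrative assumptions rather than anything taken from the paper.

```python
from collections import Counter
from itertools import combinations, product

def projections_full_and_balanced(design, k, levels=(-1, 1)):
    """True if each k-column projection contains every level combination equally often."""
    n_cols = len(design[0])
    for cols in combinations(range(n_cols), k):
        counts = Counter(tuple(run[c] for c in cols) for run in design)
        freqs = {counts.get(combo, 0) for combo in product(levels, repeat=k)}
        if len(freqs) != 1 or 0 in freqs:
            return False
    return True

# Illustrative input: the regular 2^(3-1) fraction with C = AB.
half_fraction = [
    (-1, -1, +1),
    (+1, -1, -1),
    (-1, +1, -1),
    (+1, +1, +1),
]

balanced_on_pairs = projections_full_and_balanced(half_fraction, 2)
balanced_on_triples = projections_full_and_balanced(half_fraction, 3)
```

Here every two-factor projection is a full 2² factorial, but the three-factor projection is not a full 2³ factorial, which is what resolution III means from the projection viewpoint (r − 1 = 2).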
By using the estimability viewpoint, the resolution can be generalized to any factorial design. Webb 22 gave the definition of resolution that was later applied by Rechtschaffner 23 and Srivastava and Chopra 24 to construct useful designs. However, Deng and Tang 10 noted that this method can only be used as a classification rule, so it is not useful for ranking different designs. For instance, a resolution V design may be less efficient for estimating the main effects than a resolution III design when the experimental error is substantial.
Deng and Tang 10 went beyond that and defined generalized resolution and generalized minimum aberration for ranking designs. However, their method is only applicable to two-level factorial designs. In this paper, we extend the Deng and Tang 10 method, and we apply it to rank three-level screening designs.

| Calculating generalized resolution for screening designs
In this section, we apply the generalized resolution criterion to several three-level designs to evaluate and compare their performance under different models. We evaluate the designs using models that include just the main effects or main effects and interactions:

Model 1 (main effects): $y = \beta_0 + \sum_{i=1}^{m} \beta_i x_i + \epsilon$,

Model 2 (main effects and interactions): $y = \beta_0 + \sum_{i=1}^{m} \beta_i x_i + \sum_{i<j} \beta_{ij} x_{ij} + \epsilon$,

where y is the response vector and $x_i$ and $x_{ij} = x_i x_j$ are the columns that correspond to the main effects and two-factor interactions, respectively. $\beta_0$, $\beta_i$, and $\beta_{ij}$ are unknown constant coefficients corresponding to the intercept, main effects, and two-factor interactions, respectively, whereas $\epsilon$ is the error vector with components $\epsilon_j$ being i.i.d. $N(0, \sigma^2)$. For a screening design D, let r be the smallest integer such that $\max_{|T|=r} J_r(T) > 0$, where $J_r$ is defined in (1) and the maximization is over all possible r-subsets of the design columns.
The generalized resolution criterion is taken from Deng and Tang 10 :

$$R(D) = r + 1 - \frac{\max_{|T|=r} J_r(T)}{n}.$$

Clearly, r ≤ R(D) < r + 1. Designs with higher generalized resolution are better than those with lower generalized resolution.
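A minimal sketch of the criterion, assuming the Deng and Tang form R(D) = r + 1 − max J_r / n, is given below. The function names are illustrative. The final three lines reproduce the generalized resolution values reported for the 12-factor designs, under the assumption that those designs have n = 2m + 1 = 25 runs.

```python
from itertools import combinations
from math import prod

def max_j(design, r):
    """Largest J-characteristic over all r-column subsets of the design."""
    n_cols = len(design[0])
    return max(
        abs(sum(prod(run[c] for c in cols) for run in design))
        for cols in combinations(range(n_cols), r)
    )

def resolution_from_max_j(r, max_j_value, n):
    """Generalized resolution R = r + 1 - max J_r / n."""
    return r + 1 - max_j_value / n

def generalized_resolution(design):
    """Return (r, R), where r is the smallest subset size with max J_r > 0."""
    n, m = len(design), len(design[0])
    for r in range(1, m + 1):
        j = max_j(design, r)
        if j > 0:
            return r, resolution_from_max_j(r, j, n)
    return None  # all J-characteristics vanish, e.g. for a full factorial

# Illustrative regular example: 2^(3-1) with C = AB has resolution exactly 3.
half_fraction = [(-1, -1, 1), (1, -1, -1), (-1, 1, -1), (1, 1, 1)]
r_reg, R_reg = generalized_resolution(half_fraction)

# Reported max J_4 values for the 12-factor designs, assuming n = 25 runs.
R_dsd = round(resolution_from_max_j(4, 8, 25), 2)    # DSD
R_wsd2 = round(resolution_from_max_j(4, 16, 25), 2)  # WSD-2
R_wsd3 = round(resolution_from_max_j(4, 10, 25), 2)  # WSD-3
```

For a regular design the maximum J-characteristic equals n, so the generalized resolution reduces to the usual integer resolution. Smaller values of max J_r move R(D) toward r + 1.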
The generalized resolution values for various examples are presented in Tables 3, 5, 7, and 8. These tables have five columns: n is the number of runs, Design is the type of design, $\max J_r$ is the maximum value of the J-characteristics among all subsets T of r distinct columns of D, R(D) is the generalized resolution, and the Rep column lists some representative sets of columns whose projections give the corresponding results.
TABLE 2 Orthogonal designs used for constructing the 12-factor screening designs: definitive screening design (DSD), WSD-2, and WSD-3

For these three-level screening designs, $J_1(T) = J_2(T) = J_3(T) = 0$. The value of R(D) can be a useful criterion to determine the projection properties when projecting onto r dimensions: the closer R(D) is to r + 1, the better the projection properties of the design. To clarify, as shown in Table 3, generalized resolution is useful for assessing the screening designs based on the values of $\max J_r$ and R(D). Looking at the results, the DSD, with $\max_{|T|=4} J_4(T) = 8$, has better generalized resolution, R(D) = 4.68, than the others. Designs WSD-2 and WSD-3 have $\max_{|T|=4} J_4(T) = 16$ and 10, respectively, and lower generalized resolution, R(D) = 4.36 and 4.6, respectively. However, in some cases, generalized resolution cannot discriminate between designs. When two or more designs have the same R(D), as we illustrate in Example 2, the minimum aberration criterion is used to distinguish the quality of the designs.

Example 2
In this example, we generate and study the DSD, WSD-2, and WSD-3, whose construction was illustrated in the previous section. These designs have three levels, eight factors, and 17 runs. The orthogonal designs that are used in the fold-over structure to construct them are shown in Table 4.

| MINIMUM ABERRATION CRITERION FOR THREE-LEVEL SCREENING DESIGNS
The minimum aberration (MA) criterion was introduced by Fries and Hunter 25 for ranking regular two-level designs. The MA criterion is calculated from the generators of a regular design, which is the main reason that its definition is suitable only for regular designs and not for nonregular designs. Deng and Tang 10 therefore defined a generalized minimum aberration criterion for nonregular designs. The MA criterion can be applied to compare two regular designs with the same resolution. Let the word length patterns of two regular designs $D_1$ and $D_2$ be $W(D_1) = (A_3(D_1), \ldots, A_m(D_1))$ and $W(D_2) = (A_3(D_2), \ldots, A_m(D_2))$, respectively. If r is the smallest integer such that $A_r(D_1) \neq A_r(D_2)$, then $D_1$ has less aberration than $D_2$ when $A_r(D_1) < A_r(D_2)$.
Our criterion is a natural generalization of the criterion Deng and Tang 10 applied to two-level nonregular factorials. Let D be an orthogonal n × m factorial design with n = 4t. Let $f_{rj}$ denote the frequency of r-column combinations that give $J_r = 4(t + 1 - j)$, for j = 1, …, t, t + 1. We have that $\sum_{j=1}^{t+1} f_{rj} = \binom{m}{r}$, and because $f_{1j} = f_{2j} = f_{3j} = 0$ for j = 1, …, t for orthogonal screening designs, the $f_{rj}$ need to be calculated only for r ≥ 4. The notation we use to define the CFV of D is

$$F = [(f_{41}, \ldots, f_{4t}); (f_{51}, \ldots, f_{5t}); \ldots; (f_{m1}, \ldots, f_{mt})].$$

Let $f_i(D_1)$ and $f_i(D_2)$ be the i-th entries of the CFVs of two designs $D_1$ and $D_2$, i = 1, …, (m − 3)t, and let i be the smallest integer such that $f_i(D_1) \neq f_i(D_2)$. If $f_i(D_1) < f_i(D_2)$, then $D_1$ has less generalized aberration and hence is preferred.
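The two ingredients of this comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `j_frequencies` tabulates how many r-column subsets attain each nonzero J value, and `less_aberration` compares two CFVs that have already been flattened in the same order (largest J value first within each r) and have the same length. Both function names and the numeric vectors in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import prod

def j_frequencies(design, r):
    """Count the r-column subsets attaining each nonzero J_r value."""
    n_cols = len(design[0])
    counts = Counter(
        abs(sum(prod(run[c] for c in cols) for run in design))
        for cols in combinations(range(n_cols), r)
    )
    counts.pop(0, None)  # subsets with J_r = 0 do not enter the CFV
    return dict(counts)

def less_aberration(cfv_1, cfv_2):
    """Return 1 or 2 for the design with the smaller first differing entry, 0 if tied."""
    for a, b in zip(cfv_1, cfv_2):
        if a != b:
            return 1 if a < b else 2
    return 0

# Illustrative regular example: 2^(3-1) with C = AB.
half_fraction = [(-1, -1, 1), (1, -1, -1), (-1, 1, -1), (1, 1, 1)]
freq_pairs = j_frequencies(half_fraction, 2)    # no aliased pairs
freq_triples = j_frequencies(half_fraction, 3)  # one fully aliased triple

# Hypothetical flattened CFVs: the first difference is at the second entry (2 < 3).
preferred = less_aberration((0, 2, 5), (0, 3, 1))
```

The comparison is sequential: later entries matter only if all earlier entries agree, so a design can be preferred even when some of its later frequencies are larger.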
The results are presented in Tables 6, 9, and 10. These tables have six columns, where n is the number of runs, Design is the type of design, $J_r$ are the J-characteristics calculated from Equation (1) using any subset T of r distinct columns of D, Freq is the number of different subsets of r distinct columns of D that give the same $J_r$, and CFV is the confounding frequency vector. The CFV can be obtained from the $J_r$ and Freq columns.

| RESULTS
The generalized resolution of each design is presented in Tables 7 and 8. We observe that designs with the same number of runs have the same generalized resolution in all cases with up to 33 runs. This is expected because the three designs have the same $\max J_r$. Thus, we need to distinguish the designs by using the CFV criterion, as shown in Tables 9 and 10. Note that it is not always possible to compare all three designs because their existence depends on the parameters. For example, when n = 29 and 45, some designs do not exist. In this situation, no comparison is needed because there is only one design. Moreover, the generalized resolution criterion is able to rank designs in only a few cases. For example, the DSD has better generalized resolution (R(D) = 4.67) than the WSD-2 when n = 37. When n = 41, the WSD-3 has slightly better generalized resolution than the others.

| PROJECTION ESTIMATION CAPACITY
The generalized resolution and confounding frequency vector approaches are standard ways to compare two-level screening designs, and the extension to three-level designs is useful. However, neither of these diagnostics can tell the practitioner which models are identifiable. Loeppky et al. 29 introduced the projection estimation capacity (PEC) criterion, which is closely related to the estimation capacity (EC) criterion initiated by Sun. 30 The aim of both criteria is to count the number of estimable models of a specific design. The model space in the EC criterion includes models with all k main effects and some selected two-factor interactions. In the PEC criterion, the model space includes all models that have some k of the main effects and all their two-factor interactions. To investigate the number of estimable models for an n × m design matrix D, we need to examine all possible models that include k main effects and all the two-factor interactions of these k effects. To clarify,

$$p_k(D) = \frac{\rho_k(D)}{\binom{m}{k}}, \quad k = 1, \ldots, m,$$

where $\rho_k(D)$ is the number of estimable models that contain k main effects with their associated two-factor interactions. The vector $(p_1, p_2, \ldots, p_m)$ is known as the PEC sequence of D. Loeppky et al. 29 listed some key features of the PEC criterion that we summarize in the following points:
(1) PEC criterion can be applied to both regular and nonregular designs.
(2) PEC provides sufficient information about main effects and their two-factor interactions prior to the experiment.
(3) PEC can be used to compare designs with different numbers of factors and different run sizes.
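The PEC sequence can be sketched in a few lines, under the assumptions that $p_k$ is the proportion of k-factor models that are estimable and that a model is estimable when its model matrix (intercept, k linear main effects, and all their two-factor interactions) has full column rank. The rank is computed exactly with rational arithmetic so that no external library is needed. All function names are illustrative, and the small regular design is used only as a check.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def rank(matrix):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            if factor != 0:
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def model_matrix(design, factors):
    """Columns: intercept, main effects of `factors`, all their two-factor interactions."""
    pairs = list(combinations(factors, 2))
    return [
        [1] + [run[f] for f in factors] + [run[a] * run[b] for a, b in pairs]
        for run in design
    ]

def pec_sequence(design):
    """Return (p_1, ..., p_m) as exact fractions."""
    m = len(design[0])
    sequence = []
    for k in range(1, m + 1):
        estimable = 0
        for factors in combinations(range(m), k):
            X = model_matrix(design, factors)
            if rank(X) == len(X[0]):  # full column rank means the model is estimable
                estimable += 1
        sequence.append(Fraction(estimable, comb(m, k)))
    return sequence

# Illustrative check: 2^(3-1) with C = AB (4 runs, 3 factors).
half_fraction = [(-1, -1, 1), (1, -1, -1), (-1, 1, -1), (1, 1, 1)]
pec = pec_sequence(half_fraction)
```

In this small example every one- and two-factor model is estimable, while the three-factor model has seven parameters and only four runs, so its entry in the sequence is zero. The enumeration grows quickly with m, so for larger designs it would be natural to stop at the largest k for which the parameter count does not exceed n.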
From the definition of PEC, it is easy to see that a design with a large $p_k(D)$ is desirable. Thus, a design with the highest values in the PEC sequence is preferred because it has the maximum projection estimation capacity (MPEC). Trivially, no model is estimable if it has more parameters than degrees of freedom. Cheng et al. 28 observed that the minimum aberration criterion can perform very well under the EC criterion, which provides a statistical justification for MA in terms of model robustness.
The projection estimation capacity (PEC) makes a step towards the evaluation and comparison of the designs based on the number of models they can identify. However, as it was noted by a reviewer, it is rare that all the interactions of a subset of factors are active. So, the PEC is perhaps a bit conservative.
In the next example, we evaluate the designs by using the PEC criterion.

Example 4
Suppose an experimenter wants to investigate a range of effects for six factors and considers a DSD and a WSD-2 to design the experiment. The aim is to estimate the main effects and their two-factor interactions. As shown in Table 11, all models of the DSD and the WSD-2 are estimable when k = 1 and k = 2 factors. Thus, both designs can provide the same information if k = 1 or 2. However, the DSD is preferable to the WSD-2 in terms of PEC, because with the DSD all models with three main effects and their two-factor interactions (k = 3) are estimable, whereas the WSD-2 does not allow the estimation of all models that include three factors and all their corresponding two-factor interactions. Thus, the DSD is the MPEC design. For eight factors, we can construct all three designs under consideration (DSD, WSD-2, WSD-3). The results of this comparison are shown in Table 12. For k = 1, k = 2, and k = 3 factors, the designs have the same PEC. However, if k > 3, then the DSD is the MPEC design based on the PEC sequence. In addition, we observe that WSD-3 is much better than WSD-2 with respect to the PEC sequence. Tables 13 and 14 show that the DSD is the preferred design, but WSD-3 gives more information than WSD-2 when m = 12. The design size is related to the PEC sequence: the differences between the designs in terms of the PEC sequence become smaller as m increases (for m = 6, 8, 10, 12, 16, and 20, see Tables 11 to 16, respectively).