In integrative analysis, the model and regression coefficients have two dimensions. The first is the gene dimension, as in many other studies. The second is the study dimension, which is unique to integrative analysis. To accommodate the two dimensions, composite penalties are needed for marker selection. We adopt MCP as the outer penalty and use different inner penalties under the homogeneity and heterogeneity models.

#### MCP

The MCP was proposed by Zhang (2010). It belongs to the family of quadratic spline penalties. In single-data set analysis, it has been shown to have satisfactory variable selection properties. The penalty is defined as

$$\rho(t;\lambda,\gamma)=\lambda\int_{0}^{|t|}\left(1-\frac{x}{\gamma\lambda}\right)_{+}\,dx, \tag{1}$$

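where *λ* is a penalty parameter, *γ* is a regularization parameter that controls the concavity of *ρ*, and *x*_{+}=*x*I(*x*≥0). The MCP can be easily understood by considering its derivative, which is

$$\rho'(t;\lambda,\gamma)=\lambda\left(1-\frac{|t|}{\gamma\lambda}\right)_{+}\operatorname{sgn}(t),$$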

where sgn(*t*)=−1, 0, or 1 if *t*<0, *t*=0, or *t*>0, respectively. As |*t*| increases from zero, MCP begins by applying the same rate of penalization as Lasso, but continuously relaxes penalization until |*t*|>*γ**λ*, a condition under which the rate of penalization drops to zero. It provides a continuum of penalties where the Lasso penalty corresponds to *γ*=∞ and the hard-thresholding penalty corresponds to *γ*→1+. Compared with other penalties that also enjoy selection consistency, MCP may be preferred because of its computational simplicity (Mazumder *et al.*, 2011). The MCP approach has been developed for single-data set analysis. With multiple data sets, we consider the following MCP-based composite penalties.
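As a small numerical illustration of expression (1) and its derivative, the sketch below evaluates both in closed form. Integrating (1) gives *λ*|*t*|−*t*²/(2*γ*) for |*t*|≤*γ**λ* and *γ**λ*²/2 otherwise. The function names `mcp` and `mcp_derivative` are illustrative and are not taken from the original study.

```python
def mcp(t, lam, gamma):
    """MCP value rho(t; lam, gamma), i.e. expression (1) integrated in closed form."""
    a = abs(t)
    if a <= gamma * lam:
        # Concave quadratic region: Lasso-like near zero, flattening as |t| grows.
        return lam * a - a * a / (2.0 * gamma)
    # Beyond gamma * lam the penalty is constant.
    return 0.5 * gamma * lam * lam


def mcp_derivative(t, lam, gamma):
    """Derivative of MCP: lam * (1 - |t| / (gamma * lam))_+ * sgn(t)."""
    sign = (t > 0) - (t < 0)
    return lam * max(0.0, 1.0 - abs(t) / (gamma * lam)) * sign


lam, gamma = 1.0, 3.0
for t in (0.0, 0.1, 1.5, 3.0, 5.0):
    print(t, mcp(t, lam, gamma), mcp_derivative(t, lam, gamma))
```

Close to zero the derivative is approximately *λ*, the Lasso rate, and it vanishes once |*t*| exceeds *γ**λ*, which is the relaxation described above.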

#### Homogeneity model

Under the homogeneity model, consider the estimate

- (2)

Here *ρ*(·;*λ*,*γ*) is defined in expression (1). *M*_{j} is the size of group *j*. When the *M* studies have matched gene sets, *M*_{j}≡*M*. ‖*β*_{j}‖_{2} is the ℓ_{2} norm of *β*_{j}, which is the square root of a ridge penalty, with the convention that the corresponding component of *β*_{j} is zero if gene *j* is not measured in study *m*. Because of its specific form, the penalty defined above is also referred to as 2-norm group MCP, or 2-norm gMCP hereafter (Huang *et al.*, 2012a,b; Ma *et al.*, 2011b).

Formulation (2) has been motivated by the following considerations. In our study, genes are the basic functional units. Thus, the overall penalty is the sum of *d* individual penalties, with one for each gene. For gene selection, we propose using MCP. For a specific gene, its effects in the *M* studies are represented by a ‘group’ of *M* regression coefficients. Under the homogeneity model, all the *M* studies should identify the same set of genes. Thus, within a group, the ridge penalty is adopted, which encourages shrinkage but does not conduct selection. *M*_{j} is introduced to more easily accommodate partially matched gene sets.
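To make the group structure concrete, the following sketch evaluates a 2-norm gMCP penalty for a small coefficient array. It assumes that the penalty has the form Σ_{j} *ρ*(‖*β*_{j}‖_{2}; √*M*_{j} *λ*, *γ*); the √*M*_{j} scaling of *λ* is an assumption made for illustration, as are the function names and the use of `None` to mark a gene that is not measured in a study.

```python
import math


def mcp(t, lam, gamma):
    """Closed-form MCP value rho(t; lam, gamma)."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def two_norm_gmcp(beta, lam, gamma):
    """Sum over genes of MCP applied to the l2 norm of each gene's coefficients.

    beta[j][m] is the coefficient of gene j in study m, or None if gene j is
    not measured in study m. The sqrt(M_j) scaling is an assumed convention.
    """
    total = 0.0
    for group in beta:
        measured = [b for b in group if b is not None]
        m_j = len(measured)  # group size M_j
        if m_j == 0:
            continue
        l2_norm = math.sqrt(sum(b * b for b in measured))
        total += mcp(l2_norm, math.sqrt(m_j) * lam, gamma)
    return total


# Three genes, three studies; gene 0 is not measured in the third study.
beta = [[0.3, 0.4, None], [0.0, 0.0, 0.0], [2.0, -1.0, 2.0]]
print(two_norm_gmcp(beta, lam=0.5, gamma=6.0))
```

Because the inner norm is ℓ_{2}, a gene contributes zero penalty only when all of its measured coefficients are zero, which matches the all-in or all-out selection intended under the homogeneity model.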

#### Heterogeneity model

Under the heterogeneity model, we first consider the estimate

- (3)

Here ‖*β*_{j}‖_{1} is the Lasso penalty (ℓ_{1} norm of *β*_{j}). Because of its specific form, the penalty defined in (3) is referred to as 1-norm gMCP (Huang *et al.*, 2012a).

Under the heterogeneity model, gene selection is still needed, which is achieved using the MCP outer penalty. In addition, for a selected gene, it is necessary to identify the studies in which it is associated with the responses. Thus, a second level of selection is needed, which is accomplished with the Lasso penalty in (3). This strategy is similar in spirit to the group bridge approach in Huang *et al.* (2009). The difference is that in Huang *et al.* (2009), there is only one data set, and a group is composed of multiple genes. In contrast, in this study, there are multiple data sets, and a group corresponds to only one gene.

The Lasso penalty is adopted in formulation (3) because of its computational simplicity. In single-data set analysis, it has been shown that MCP has better selection properties than Lasso (Zhang & Huang, 2008; Zhang, 2010). Motivated by such a result, we consider

- (4)

We refer to the penalty defined in the above formulation as the composite MCP. Breheny & Huang (2009) suggest that although *a* and *b* can be chosen separately, it is sensible to link them so that the group level penalty attains its maximum if and only if all of its components are at the maximum.
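The linkage between *a* and *b* can be checked numerically. The sketch below assumes that the composite MCP for one gene takes the form *ρ*(Σ_{m} *ρ*(|*β*_{j}^{m}|; *λ*, *b*); *λ*, *a*), following the composite MCP of Breheny & Huang (2009), and that the linkage is *a*=*M*_{j}*b**λ*/2. Under this assumed form, each inner penalty is capped at *b**λ*²/2, so the inner sum is capped at *M*_{j}*b**λ*²/2, which equals *a**λ*, the point at which the outer penalty saturates. The function names are illustrative.

```python
def mcp(t, lam, gamma):
    """Closed-form MCP value rho(t; lam, gamma)."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def composite_mcp_group(beta_j, lam, b):
    """Composite MCP for one gene: outer MCP applied to the sum of inner MCPs.

    Assumed linkage: a = M_j * b * lam / 2, so that the outer penalty reaches
    its maximum exactly when every inner penalty is at its maximum.
    """
    m_j = len(beta_j)
    a = m_j * b * lam / 2.0
    inner_sum = sum(mcp(v, lam, b) for v in beta_j)
    return mcp(inner_sum, lam, a)


lam, b = 1.0, 3.0
saturated = composite_mcp_group([5.0, 5.0, 5.0], lam, b)  # all components beyond b * lam
partial = composite_mcp_group([5.0, 5.0, 0.5], lam, b)    # one component still penalized
print(saturated, partial)
```

With three studies, *λ*=1 and *b*=3, the fully saturated group attains the outer maximum *a**λ*²/2=2.25, whereas the group with one small coefficient stays strictly below it.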

#### Computation

Existing algorithms are not directly applicable to the minimizations in (2), (3) and (4). Below we describe computational algorithms for (2) and (4). Formulation (3) can be solved in a similar manner. We first consider a linear regression problem with *E*(*Y*|*X*)=*X**β*, which has a least squares objective function. Here *Y*, *X* and *β* are defined similarly to those in Section 2. The logistic model can then be transformed into a sequence of least squares problems.

*Least squares with 2-norm gMCP.* Consider the homogeneity model, where the estimate is defined as

- (5)

We adopt a coordinate descent approach (Friedman *et al.*, 2010), which minimizes the objective function with respect to one group of coefficients at a time and cycles through all groups. It transforms a complicated minimization problem to a series of simple ones. With fixed tuning parameters, the coordinate descent algorithm proceeds as follows:

This algorithm starts with a null model. In each iteration, it cycles through all *d* genes. For each gene, as (7) only involves simple computations, the update can be accomplished easily. There are multiple choices for the convergence criterion. In our numerical study, we declare convergence when the ℓ_{2} norm of the difference between two consecutive estimates is smaller than 0.01, which has reasonable performance. In practice, other convergence criteria can be adopted, depending on data characteristics. In objective function (5), the first term is continuously differentiable and regular in the sense of Tseng (2001). The second term, the penalty, is separable. Thus, the coordinate descent algorithm converges to a coordinatewise minimum of the objective function, which is also a stationary point (Tseng, 2001).
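A minimal sketch of a group coordinate descent of this kind is given below. It is not a reproduction of update (7). It assumes an objective of the form (1/(2*n*))‖*Y*−Σ_{j}*X*_{j}*β*_{j}‖² + Σ_{j} *ρ*(‖*β*_{j}‖_{2}; √*M*_{j} *λ*, *γ*), with each group orthonormalized so that *X*_{j}ᵀ*X*_{j}/*n* is the identity and with *γ*>1. Under these assumptions, each group update reduces to a firm-thresholding rule applied to the norm of the group's least squares solution. The names `firm_threshold` and `group_mcp_cd` are hypothetical.

```python
import numpy as np


def firm_threshold(z_norm, lam, gamma):
    """Norm of the minimizer of 0.5 * (z - s)^2 + MCP(s; lam, gamma), gamma > 1."""
    if z_norm <= lam:
        return 0.0
    if z_norm <= gamma * lam:
        return gamma / (gamma - 1.0) * (z_norm - lam)
    return z_norm


def group_mcp_cd(X_groups, y, lam, gamma, tol=0.01, max_iter=1000):
    """Group coordinate descent; assumes X_j' X_j / n = I for every group j."""
    n = y.shape[0]
    betas = [np.zeros(Xj.shape[1]) for Xj in X_groups]  # start from the null model
    resid = y.astype(float).copy()
    for _ in range(max_iter):
        change_sq = 0.0
        for j, Xj in enumerate(X_groups):
            # Least squares solution for group j given all other groups.
            z = Xj.T @ resid / n + betas[j]
            z_norm = np.linalg.norm(z)
            lam_j = np.sqrt(Xj.shape[1]) * lam  # assumed sqrt(M_j) scaling
            if z_norm > 0.0:
                new = firm_threshold(z_norm, lam_j, gamma) / z_norm * z
            else:
                new = np.zeros_like(z)
            resid -= Xj @ (new - betas[j])  # keep the residual in sync
            change_sq += float(np.sum((new - betas[j]) ** 2))
            betas[j] = new
        # Stop when the l2 norm of the change between consecutive estimates is small.
        if np.sqrt(change_sq) < tol:
            break
    return betas


# Toy orthogonal design (4 x 4 Hadamard matrix), split into two groups of size 2.
H = np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]], dtype=float)
X_groups = [H[:, :2], H[:, 2:]]
y = X_groups[0] @ np.array([3.0, 4.0]) + X_groups[1] @ np.array([0.1, 0.0])
betas = group_mcp_cd(X_groups, y, lam=1.0, gamma=3.0)
print(betas)
```

In this toy example the first group has a large signal and is left unshrunk, while the second group has a weak signal and is set to zero as a whole.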

*Least squares with composite MCP*. Even with the simple least squares objective function, composite MCP does not have a convenient form for updating individual groups. We adopt a local coordinate descent approach (LCD; Breheny & Huang, 2011) to compute

- (8)

With fixed tuning parameters, our computational algorithm consists of a sequence of nested loops:

*Outer loop*: Update the majorized quadratic function using the current estimate.

*Inner loop*: Run the algorithm developed for the penalized least squares problem on the resulting objective function.

When the true models are identifiable, under mild regularity conditions, the overall decreasing trend of the objective function and hence convergence of this algorithm can be derived from the convergence of the coordinate descent algorithm following Vaida (2005).

#### Tuning parameter selection

The MCP (1) involves two tuning parameters, *λ* and *γ*. The effect of *λ* is similar to that with other penalization approaches, with larger values leading to sparser estimates. Generally speaking, smaller values of *γ* are better at retaining the unbiasedness of MCP for large coefficients. However, they also risk generating objective functions that have a non-convex region, which may complicate optimization and yield solutions that are discontinuous with respect to *λ*. Loosely speaking, it is advisable to choose a *γ* value that is ‘big enough’ to avoid this problem, but ‘not too big’. Following published studies, we have experimented with a few values for *γ*, particularly including 1.8, 3, 6 and 10. In our simulation and data analysis, *γ*=6 leads to the best performance. We search for optimal *λ* values using V-fold cross validation (V=5 in the numerical study; Hastie *et al.*, 2009). As shown in Breheny & Huang (2011, Fig. 2), when *λ* is too small, the cross validation criterion may not be locally convex. In such a region, the criterion may not be reliable, and the estimates are discontinuous and noisy. To avoid such a problem, we select *λ* where the criterion first goes up.
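One possible reading of this selection rule is sketched below: scan the *λ* grid from the largest value to the smallest and stop at the last value before the cross validation criterion first increases. This interpretation, the function name `select_lambda`, and the toy numbers are assumptions made for illustration and are not results from the study.

```python
def select_lambda(lambdas, cv_errors):
    """Return the lambda just before the CV criterion first goes up.

    lambdas must be sorted from largest to smallest, with cv_errors[k] the
    cross validation criterion at lambdas[k].
    """
    for k in range(1, len(lambdas)):
        if cv_errors[k] > cv_errors[k - 1]:
            return lambdas[k - 1]
    # The criterion never increased; fall back to the smallest lambda.
    return lambdas[-1]


# Toy criterion values: the dip at lambda = 0.2 lies in the unreliable region.
lambdas = [1.0, 0.8, 0.6, 0.4, 0.2]
cv_errors = [0.50, 0.42, 0.38, 0.41, 0.30]
print(select_lambda(lambdas, cv_errors))
```

Under this rule the value 0.6 is selected rather than the global minimizer 0.2, which reflects the intent of avoiding the region of small *λ* where the criterion may be noisy.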