FDR Controlled Multiple Testing for Union Null Hypotheses: A Knockoff-based Approach

False discovery rate (FDR) controlling procedures provide important statistical guarantees for replicability in signal identification based on multiple hypothesis testing. In many fields of study, FDR controlling procedures are used in high-dimensional (HD) analyses to discover features that are truly associated with the outcome. In some recent applications, data on the same set of candidate features are independently collected in multiple different studies. For example, gene expression data are collected at different facilities and with different cohorts, to identify the genetic biomarkers of multiple types of cancers. These studies provide opportunities to identify signals jointly, using information from different sources (with potential heterogeneity). This paper addresses how to provide FDR control guarantees for tests of union null hypotheses of conditional independence. We present a knockoff-based variable selection method (\textit{Simultaneous knockoffs}) that identifies mutual signals from multiple independent data sets, with exact FDR control guarantees in finite sample settings. The method accommodates very general model settings and test statistics. We demonstrate its performance with extensive numerical studies and two real data examples.


Introduction
There is a pressing need to make discoveries by jointly analyzing information from multiple sources. With recent advances in scientific research, data on the same set of candidate features are often collected independently from multiple sources. For example, social scientists collect data on the economic and socioeconomic status of people from different community groups. In genome-wide association studies (GWAS), associations of genome features with multiple different outcomes of interest are studied in multiple experiments (Uffelmann et al., 2021). These data motivate us to identify mutual signals from multiple experiments for purposes such as reproducibility research or mediator identification. This paper focuses on how to identify mutual signals from multiple independent studies and provide variable selection accuracy guarantees under mild design and model assumptions.
Now we formulate the mutual signal identification problem in statistical terms. Assume we have data from K independent experiments and denote $[K] = \{1, \ldots, K\}$. Within the k-th experiment, $(Y_i^k, X_{i1}^k, \ldots, X_{ip}^k) \stackrel{iid}{\sim} \mathcal{D}^k$, $i = 1, \ldots, n_k$. In our setting, the outcome variables $Y^1, \ldots, Y^K$ can be of different data types and $(X_1^k, \ldots, X_p^k)$ can have different distributions across experiments. We treat the $Y^k$s and $X^k$s as continuous variables throughout the paper for simplicity of notation. In practice, they can be of other data types (continuous/count/nominal/ordinal/mixed). For example, the $Y^k$s can be different disease outcomes and the $X^k$s can be gene expression data measured on different scales. Define $H_{0j}^k$ as the null hypothesis that the j-th feature is not a signal in the k-th experiment (i.e., $X_j^k \perp\!\!\!\perp Y^k \mid X_{-j}^k$, where $X_{-j}^k := \{X_1^k, \ldots, X_p^k\} \setminus X_j^k$), and denote $\mathcal{H}^k = \{j \in [p] : H_{0j}^k \text{ is true}\}$, where $[p] := \{1, \ldots, p\}$. Instead of testing the $H_{0j}^k$s, we are interested in testing the union null hypotheses
$$H_{0j} = \bigcup_{k=1}^K H_{0j}^k, \quad j \in [p]. \quad (1)$$
We define $\mathcal{S} = \{j \in [p] : H_{0j} \text{ is false}\}$ and
$$\mathcal{H} = \mathcal{S}^c = \bigcup_{k=1}^K \mathcal{H}^k = \{j \in [p] : H_{0j} \text{ is true}\}. \quad (2)$$
We aim at developing a selection procedure returning a selection set $\hat{\mathcal{S}} \subseteq [p]$ with a controlled false discovery rate (FDR), which is the expected false discovery proportion (FDP):
$$\mathrm{FDR} = E[\mathrm{FDP}], \qquad \mathrm{FDP} = \frac{|\hat{\mathcal{S}} \cap \mathcal{H}|}{\max(|\hat{\mathcal{S}}|, 1)}.$$
To begin, we give some examples to motivate our method.
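As a concrete illustration (not part of the paper's procedure), the FDP of a candidate selection set can be computed as follows, where `null_set` plays the role of the union null set $\mathcal{H}$:

```python
def fdp(selected, null_set):
    """False discovery proportion: the fraction of selected features
    that lie in the null set. The max(..., 1) guard makes the FDP
    zero when nothing is selected."""
    false_discoveries = len(set(selected) & set(null_set))
    return false_discoveries / max(len(selected), 1)

# Example: features 1-3 selected; features 2 and 5 are union nulls,
# so one of the three selections is a false discovery.
print(fdp([1, 2, 3], [2, 5]))
```

The FDR is then the expectation of this quantity over repeated realizations of the data.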

Examples
The problem of testing multiple union null hypotheses is related to many important scientific areas, for example, reproducibility analysis in GWAS (Bogomolov and Heller, 2013; Heller et al., 2014; Heller and Yekutieli, 2014), comparative research in genomics studies (Rittschof et al., 2014), and mediation analysis (Sampson et al., 2018). Below, we give several motivating examples that can be considered as problems of identifying the mutual signal set $\mathcal{S}$.

Repeatability research
In some fields of biology, experimental results are required to agree with each other under conditions that include the same measurement procedure, operators, measuring system, operating conditions, and location, with replicate measurements on the same or similar objects (Plant and Hanisch, 2020; Ioannidis et al., 2009). Repeatability research aims at identifying signals that recur in repeated experiments. Mathematically, K independent data sets $(Y^1, X^1), \ldots, (Y^K, X^K)$ are collected, where $Y^k \in \mathbb{R}^{n_k}$ and $X^k \in \mathbb{R}^{n_k \times p}$ for $k \in [K]$, and $(Y_i^k, X_{i1}^k, \ldots, X_{ip}^k) \stackrel{iid}{\sim} \mathcal{D}$ for all $k \in [K]$ and $i \in [n_k]$. To identify the j-th feature as a signal, we test the union null hypothesis $H_{0j}$ as defined in equation (1).
As a remark, for repeatability research, when we assume $(Y_i^k, X_{i1}^k, \ldots, X_{ip}^k)$ are i.i.d. for all $k \in [K]$ and $i \in [n_k]$, the union null hypothesis set $\mathcal{H}$ is identical to the null hypothesis sets from the individual experiments for all $k \in [K]$. In this case, we can alternatively pool the data and run the analysis on the pooled data set to improve power (NCI-NHGRI, 2007). However, in many cases this is not practical because of privacy and data ownership issues.

Reproducibility research
In reproducibility research, the experiments could be slightly different because they are conducted in different institutions, by different experimenters, or at different times. For example, in genetic studies, the association between single nucleotide polymorphisms (SNPs) and a phenotype is recognized as a scientific finding only if it has been discovered in different independent studies with the same features and different cohorts (Heller et al., 2014).
To form the conditional independence tests for this case, we have K independent studies, where $(Y_i^k, X_{i1}^k, \ldots, X_{ip}^k) \stackrel{iid}{\sim} \mathcal{D}^k$ for $i \in [n_k]$ and $k \in [K]$, and we test the conditional independence $Y^k \perp\!\!\!\perp X_j^k \mid X_{-j}^k$ for $k \in [K]$ and $j \in [p]$. The j-th feature is a mutual signal if and only if the union null hypothesis $H_{0j}$ does not hold.
As a remark, Heller et al. (2014) proposed the repFDR method, which also provides FDR control guarantees for testing multiple union null hypotheses. This method is based on the Benjamini-Hochberg (BHq) procedure (Benjamini and Hochberg, 1995), and it assumes that, within each study, the vector of test statistics is jointly independent or is positive regression dependent on the subset of true null hypotheses (PRDS). This assumption does not hold in general in our settings. There is a modification of this method that allows for arbitrary dependence; however, it is known to be very conservative (Benjamini and Yekutieli, 2001).

High-dimensional mediator selection
In many scientific fields, it is important to identify features that are associated with multiple responses. In particular, mediators can be discovered from simultaneous feature-treatment and feature-outcome associations. For example, suppose we aim at identifying gene expression mediators that are both associated with the treatment and with the risk of a certain disease. To do this, we jointly use information from two independent studies: one on the associations between the gene expressions and the treatment, the other on the associations between the gene expressions and the risk of the disease with the treatment being fixed. The selection of mediators from high-dimensional gene expression features can be framed as a problem of testing the union null hypotheses with K = 2. In particular, $(Y_i^k, X_{i1}^k, \ldots, X_{ip}^k) \stackrel{iid}{\sim} \mathcal{D}^k$ for k = 1, 2, where $Y^1$ and $Y^2$ are the treatment and the outcome respectively. Notice that in this example the true signal sets for $Y^1$ and $Y^2$ are not necessarily identical. We test the conditional independence $Y^k \perp\!\!\!\perp X_j^k \mid X_{-j}^k$ for k = 1, 2 and $j \in [p]$. The j-th feature is a mediator if and only if the union null hypothesis $H_{0j}$ does not hold.

Prior work
Current advances in FDR control for identifying simultaneous signals For reproducibility research, Bogomolov and Heller proposed methods based on the BH procedure, selecting features that are commonly selected among all the experiments (Bogomolov and Heller, 2013, 2018; Heller et al., 2014). There are multiple works based on computing the local FDR as the optimal scalar summary of the multivariate test statistics (Chi, 2008; Heller and Yekutieli, 2014). Recently, Xiang et al. (2019) presented the signal classification problem for multiple sequences of multiple tests, of which the identification of simultaneous signals is a special case, and Zhao and Nguyen (2020) proposed a nonparametric method for asymptotic FDR control in identifying simultaneous signals. However, all the methods above assume not only the independence of the experiments but also the independence (or PRDS) of the p-values for the features within each experiment, which is not realistic in many complex high-dimensional data applications, such as GWAS and other omics data.
Knockoff-based methods For multiple testing problems within a single experiment, there are recent advances in relaxing the assumption of independence among the features. Powerful knockoff-based methods have been developed for exact FDR control in selecting features with conditional associations with the response (Barber and Candès, 2015;Candès et al., 2018). The original knockoff filter proposed by Barber and Candès (2015) works on linear models assuming no knowledge of the design of covariates, the signal amplitude, or the noise level. It achieves exact FDR control under finite sample settings. It is also extended to work with high-dimensional settings (Barber and Candès, 2019). Later Candès et al. (2018) proposed the Model-X knockoff method, extending the knockoff filter to achieve exact FDR control for nonlinear models. This method allows the conditional distribution of the response to be arbitrary and completely unknown but requires the distribution of X to be known. Barber et al. (2020) further showed that the Model-X knockoff method is robust against errors in the estimation of the distribution of X. In addition, Huang and Janson (2020) relaxed the assumptions of the Model-X knockoff method so that the FDR can be controlled as long as the distribution of X is known up to a parametric model. There are also abundant publications on the construction of knockoffs with an approximated distribution of X. Romano et al. (2020) developed a Deep knockoff machine using deep generative models. Liu and Zheng (2019) developed a Model-X generating method using deep latent variable models. More recently, Bates et al. (2020) proposed an efficient general metropolized knockoff sampler. Spector and Janson (2020) proposed to construct knockoffs by minimizing the reconstructability of the features. Knockoff-based methods have also been extended to test the intersection of null hypotheses. 
In this direction, group and multitask knockoff methods (Dai and Barber, 2016) and prototype group knockoff methods (Chen et al., 2019) have been proposed.

Our contributions
In this paper, we propose a knockoff-based procedure to establish exact FDR control in selecting mutual signals from multiple conditional independence tests, assuming very general conditional models. The main contributions of this paper are summarized below:
1. We construct a knockoff-based procedure for testing the union null hypotheses for feature selection, namely the Simultaneous knockoffs. This procedure can work with general conditional dependence models Y|X and data structures in X.
2. We prove that the Simultaneous knockoff method can lead to exact FDR control in testing multiple union null hypotheses for feature selection under finite sample settings.
3. We show that a broad class of filter statistics can be used for this method, and give general recipes for generating different powerful statistics.
4. We demonstrate the FDR control property and the power of our method with extensive simulation settings. We also illustrate the application with two real data examples.
The rest of the paper is organized as follows. In Section 2, we present the Simultaneous knockoff framework. In Section 3, we give the theoretical guarantees for exact FDR control of the Simultaneous knockoff method in finite sample settings and the robustness result for the potential misspecification of the distribution of X. In Section 4, we show the empirical performance of the Simultaneous knockoff method under different model assumptions and data structures. Finally, in Section 5, we apply the Simultaneous knockoff procedure to two real data examples.

Methods
In this section, we present the Simultaneous knockoff procedure. This procedure can be paired with both the Fixed-X knockoffs (Barber and Candès, 2015) and the Model-X knockoffs (Candès et al., 2018) to allow for very general model settings and various data structures in real data applications. Before presenting the Simultaneous knockoff method, we briefly review the Fixed-X and the Model-X knockoff methods.

The Fixed-X and the Model-X knockoff procedures
The high-level idea behind the knockoff methods is to construct a "knockoff" copy of the covariates, retaining their inner structure. Unlike the "true" covariates, the knockoff copies are created independently of the response. These knockoff variables are then mixed into the model to monitor the FDP during the selection. Heuristically speaking, if a variable is a true signal, it is more likely to be selected than its knockoff copy; otherwise, it is equally likely to be selected as its knockoff copy. Therefore, by counting the number of knockoff variables entering the selected set, the FDP can be (over)estimated.
Model settings For the Fixed-X knockoff method, the setup is the Gaussian linear model $y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$. This method makes weak assumptions on the covariates X and on the amplitudes of the unknown regression coefficients $\beta$, and does not require the noise level $\sigma^2$ to be known. The Model-X knockoff method works in more general conditional model settings. It does not require the dependence of Y|X to be known, instead assuming knowledge of the distribution of X (or that the distribution of X can be well approximated). Therefore, it can work with many more models, such as generalized linear models (GLMs) and nonlinear models.
Algorithm There are four main steps in the knockoff procedure, listed below.
• Knockoff construction. A set of knockoff features $\tilde{X} = [\tilde{X}_1 \ldots \tilde{X}_p]$ is constructed in this step. For the Fixed-X knockoff construction, $\tilde{X}$ needs to satisfy, for some vector $s \ge 0$,
$$\tilde{X}^\top \tilde{X} = X^\top X, \qquad X^\top \tilde{X} = X^\top X - \mathrm{diag}(s).$$
For the Model-X knockoff construction, $\tilde{X}$ needs to satisfy the pairwise exchangeability condition:
$$[X\ \tilde{X}]_{\mathrm{Swap}(j)} \stackrel{d}{=} [X\ \tilde{X}] \quad \text{for each } j \in [p],$$
where Swap(j) stands for exchanging the j-th column and the (j + p)-th column of $[X\ \tilde{X}]$, and $A \stackrel{d}{=} B$ indicates that A and B are identical in distribution. $\tilde{X}$ can be generated using various algorithms (Barber and Candès, 2015; Romano et al., 2020; Liu and Zheng, 2019; Bates et al., 2020; Spector and Janson, 2020).
• Summary statistics computation. Summary statistics $Z = (Z_1, \ldots, Z_p)$ and $\tilde{Z} = (\tilde{Z}_1, \ldots, \tilde{Z}_p)$, measuring the importance of each feature and each knockoff, are computed from $([X\ \tilde{X}], Y)$, for example, the magnitudes of the lasso coefficients.
• Filter statistics calculation. We construct the filter statistics $W \in \mathbb{R}^p$ such that $W_j = f(Z_j, \tilde{Z}_j)$, where f is an antisymmetric function, i.e., $f(x, y) = -f(y, x)$. Without loss of generality, we further let $f(x, y) > 0$ when $x > y$. If $X_j$ is a signal, we would expect $P(Z_j > \tilde{Z}_j) > 0.5$, while if $X_j$ is not a signal, we would expect $Z_j$ and $\tilde{Z}_j$ to have the same distribution. Thus, we expect $W_j$ to have a positive sign with probability greater than 0.5 if $X_j$ is a signal and with probability exactly 0.5 if $X_j$ is not a signal. This allows us to estimate the FDP in $\hat{S}(t) := \{j : W_j \ge t\}$ as
$$\widehat{\mathrm{FDP}}(t) = \frac{\#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\} \vee 1}. \quad (6)$$
• Threshold calculation and feature selection. With the knockoff filter, we select $\hat{S} = \{j : W_j \ge \tau\}$, where
$$\tau = \min\left\{t \in \mathcal{W}^+ : \frac{\#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\} \vee 1} \le q\right\}. \quad (7)$$
With the more conservative knockoff+ filter, we select $\hat{S}_+ = \{j : W_j \ge \tau_+\}$, where
$$\tau_+ = \min\left\{t \in \mathcal{W}^+ : \frac{1 + \#\{j : W_j \le -t\}}{\#\{j : W_j \ge t\} \vee 1} \le q\right\}. \quad (8)$$
Here q is the target FDR level and $\mathcal{W}^+ = \{|W_j| : |W_j| > 0\}$.
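The threshold step can be sketched in a few lines, assuming the filter statistics W have already been computed (`offset=0` gives the knockoff filter and `offset=1` the more conservative knockoff+ filter):

```python
import numpy as np

def knockoff_threshold(W, q, offset=0):
    """Smallest t among the nonzero |W_j| whose estimated FDP is at most q;
    np.inf if no threshold qualifies (then nothing is selected)."""
    W = np.asarray(W, dtype=float)
    candidates = np.sort(np.abs(W[W != 0]))  # the candidate set W+
    for t in candidates:
        fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf

def knockoff_select(W, q, offset=0):
    """Indices j with W_j at or above the data-dependent threshold."""
    t = knockoff_threshold(W, q, offset)
    return [j for j, w in enumerate(W) if w >= t]
```

For example, with `W = [3, 2.5, 2, 1.5, -1, 0.5, -0.2]` and `q = 0.3`, the knockoff filter stops at the smallest threshold whose estimated FDP drops to or below 0.3 and selects the features with W at or above it.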

Simultaneous knockoff framework
In this section, we propose the general Simultaneous knockoff framework, which enables us to use the knockoff approach for FDR control in testing the union null hypotheses of conditional independence. This approach enjoys very general model assumptions and exact FDR control guarantees in finite sample settings.

Preliminaries
One naïve idea to identify mutual signals in K experiments is to select the intersection set of the variables selected from the individual experiments. However, this method cannot control the FDR (see more details in Section 4). Therefore, we alternatively aim at constructing valid filter statistics W to allow the estimation of the FDP in our multiple testing of the union null hypotheses using equation (6). We establish a general recipe to construct such Ws with only the summary statistics that can be calculated using the Fixed-X and the Model-X knockoff methods from single experiments. To begin, we give several definitions.

Definition 1. (Swapping) For a set $S \subseteq [p]$, and for a vector $Z = (Z_1, \ldots, Z_p, \tilde{Z}_1, \ldots, \tilde{Z}_p) \in \mathbb{R}^{2p}$, define $Z_{\mathrm{Swap}(S)}$ as the vector obtained from Z by exchanging the entries $Z_j$ and $\tilde{Z}_j$ for each $j \in S$.
As a remark, the definition of an OSCF implicitly requires a swap-equivariance property for any set $S \subset [p]$. An example of an OSCF can be defined as follows. Let $a = (a_1, \ldots, a_K) \in A$ where $A = \{0, 1\}^K$. We separate A into two sets: the even set $A_e = \{a : \mathrm{mod}(\|a\|_1, 2) = \mathrm{mod}(K, 2)\}$ and the odd set $A_o = \{a : \mathrm{mod}(\|a\|_1, 2) = \mathrm{mod}(K+1, 2)\}$. Then we obtain an OSCF
$$Z_j = \max_{a \in A_e} \min_{k \in [K]} Z_{jk}^{(a_k)}, \qquad \tilde{Z}_j = \max_{a \in A_o} \min_{k \in [K]} Z_{jk}^{(a_k)}, \quad (10)$$
where $Z_{jk}^{(0)} = Z_{jk}$ and $Z_{jk}^{(1)} = \tilde{Z}_{jk}$, and $Z_{jk}$ and $\tilde{Z}_{jk}$ are the j-th entries of $Z^k$ and $\tilde{Z}^k$ respectively. In particular, when K = 2, this construction can be written as
$$Z_j = \max\{\min(Z_{j1}, Z_{j2}), \min(\tilde{Z}_{j1}, \tilde{Z}_{j2})\}, \qquad \tilde{Z}_j = \max\{\min(Z_{j1}, \tilde{Z}_{j2}), \min(\tilde{Z}_{j1}, Z_{j2})\}.$$
More OSCF examples are given in Web Appendix A.3.
There are multiple ways to construct OSFFs. As shown in Lemma A1 in Web Appendix A.5, if $f_1 : \mathbb{R}^{2pK} \to \mathbb{R}^{2p}$ is an OSCF and $f_2 : \mathbb{R}^{2p} \to \mathbb{R}^p$ is a flip-sign function, then $f = f_2 \circ f_1$ is an OSFF, where $\circ$ denotes the composition of functions. Using the OSCF defined in (10) together with the flip-sign function defined in (9), we obtain an OSFF with $W_j = Z_j - \tilde{Z}_j$. Alternative ways to construct OSFFs and more examples are provided in Web Appendix A.5.
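As an illustration of the even/odd-set construction, the sketch below implements a max-min combining function over the swap patterns, followed by the difference flip-sign step; treat it as an assumed instantiation consistent with the description above, not necessarily the paper's verbatim definition:

```python
import numpy as np
from itertools import product

def combine_oscf(Z, Zt):
    """Max-min combination over even/odd swap patterns.
    Z, Zt: (K, p) arrays of summary statistics for the features and
    their knockoffs across K experiments."""
    Z, Zt = np.asarray(Z, float), np.asarray(Zt, float)
    K, _ = Z.shape
    even, odd = [], []
    for a in product([0, 1], repeat=K):
        (even if sum(a) % 2 == K % 2 else odd).append(a)
    def max_min(patterns):
        # for each pattern a: pick Z_k where a_k = 0 and Zt_k where a_k = 1,
        # take the min over experiments, then the max over patterns
        vals = [np.min(np.where(np.array(a)[:, None] == 0, Z, Zt), axis=0)
                for a in patterns]
        return np.max(vals, axis=0)
    return max_min(even), max_min(odd)

def simultaneous_W(Z, Zt):
    """Flip-sign step: the difference statistic W_j = Z_j - Z~_j."""
    Zc, Ztc = combine_oscf(Z, Zt)
    return Zc - Ztc
```

Swapping a feature's summary statistic with its knockoff's in any single experiment exchanges the combined $Z_j$ and $\tilde{Z}_j$, which flips the sign of $W_j$, the key property required of an OSFF.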

Algorithm
The Simultaneous knockoff procedure is described below:
• Step 1: Knockoff construction for the individual experiments. Denote the knockoff matrices for $X^1, \ldots, X^K$ as $\tilde{X}^1, \ldots, \tilde{X}^K$. For each $k \in [K]$, select a knockoff construction method (either Fixed-X or Model-X) as described in Web Appendix A.1 that is compatible with the model setting for experiment k, and generate $\tilde{X}^k$.
• Step 2: Summary statistics computation for the individual experiments. For each $k \in [K]$, compute the summary statistics $(Z^k, \tilde{Z}^k)$, measuring the importance of the features and their knockoffs, from $([X^k\ \tilde{X}^k], Y^k)$.
• Step 3: Filter statistics calculation. Combine the summary statistics from all K experiments into the filter statistics $W = f(Z^1, \tilde{Z}^1, \ldots, Z^K, \tilde{Z}^K)$, where f is an OSFF. Examples of OSFFs can be found in Web Appendix A.5.
• Step 4: Threshold calculation and feature selection. Using the filter statistics W from Step 3, we apply the knockoff+ filter (8) to obtain the selection set $\hat{S}_+$ under the Simultaneous knockoff+ procedure, or apply the knockoff filter (7) to obtain $\hat{S}$ under the Simultaneous knockoff procedure.
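A minimal end-to-end sketch of these steps for K = 2, using marginal-correlation magnitudes as summary statistics, a max-min combination of the two experiments' statistics, and the difference flip-sign function. The helper `make_knockoffs` is a hypothetical placeholder for a valid Fixed-X or Model-X knockoff sampler, which this sketch does not implement:

```python
import numpy as np

def simultaneous_knockoff_select(datasets, make_knockoffs, q=0.1, offset=1):
    """Sketch of Steps 1-4 for K = 2 experiments sharing the same p features.
    datasets: list of (X, y) pairs; make_knockoffs: hypothetical knockoff sampler."""
    Zs, Zts = [], []
    for X, y in datasets:
        Xk = make_knockoffs(X)              # Step 1: knockoff construction
        n = len(y)                          # Step 2: summary statistics
        Zs.append(np.abs(X.T @ y) / n)      # (marginal-correlation magnitudes)
        Zts.append(np.abs(Xk.T @ y) / n)
    # Step 3: K = 2 max-min combination, then the difference flip-sign function
    Z = np.maximum(np.minimum(Zs[0], Zs[1]), np.minimum(Zts[0], Zts[1]))
    Zt = np.maximum(np.minimum(Zs[0], Zts[1]), np.minimum(Zts[0], Zs[1]))
    W = Z - Zt
    # Step 4: knockoff (offset=0) or knockoff+ (offset=1) threshold and selection
    tau = np.inf
    for t in np.sort(np.abs(W[W != 0])):
        if (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1) <= q:
            tau = t
            break
    return np.where(W >= tau)[0]
```

In practice, `make_knockoffs` must produce knockoffs satisfying the construction conditions of Step 1, and lasso-based summary statistics are the more common choice than marginal correlations.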

Theoretical Results
The main result for this paper is the theoretical guarantee that Simultaneous knockoff and Simultaneous knockoff + procedures can control the modified FDR (as defined in (11) in Theorem 1) and FDR respectively.
Theorem 1. With the individual experiments satisfying the Fixed-X or the Model-X knockoff model settings, the Simultaneous knockoff procedure (7) controls the modified FDR, defined as
$$\mathrm{mFDR} = E\left[\frac{|\hat{S} \cap \mathcal{H}|}{|\hat{S}| + 1/q}\right] \le q, \quad (11)$$
and the Simultaneous knockoff+ procedure (8) controls the usual FDR,
$$\mathrm{FDR} = E\left[\frac{|\hat{S}_+ \cap \mathcal{H}|}{|\hat{S}_+| \vee 1}\right] \le q,$$
where $\mathcal{H}$ is the union null set as defined in (2).
The Fixed-X and the Model-X knockoff model settings can be found in Web Appendix A.1. The definition of mFDR is close to the FDR, especially when the selection set is relatively large. Although the more conservative Simultaneous knockoff + procedure can achieve exact FDR control, in real data applications, the knockoff filter is more widely used (Barber and Candès, 2015;Dai and Barber, 2016;Candès et al., 2018;Sesia et al., 2018;Romano et al., 2020).
The key step in the proof of Theorem 1 is to show that the signs of the $W_j$s for the union nulls are i.i.d. Bernoulli(1/2) and independent of $|W_j|$ for all $j \in \mathcal{H}$. As with other knockoff-based methods, this property effectively guarantees that for all $j \in \mathcal{H}$, there are equal probabilities of selecting the feature and its knockoff copy, which allows the knockoff procedure to (over)estimate the FDP. We show this in Lemma 1. The details of the proof can be found in Web Appendix B.
For models beyond the linear models, we need to use the Model-X knockoffs in the individual experiments. In real applications, the distribution of the candidate features might not be known exactly. In Candès et al. (2018), the robustness against the misspecification of the X distribution is shown empirically. Barber et al. (2020) and Huang and Janson (2020) further addressed this question theoretically. For the Simultaneous knockoff procedure, it is also very important to establish the robustness results against the misspecification of the distribution of X. The following theorem shows the result.
Theorem 2. Under the definitions in Section 2, for any $\epsilon \ge 0$, consider the null variables $j \in \mathcal{H}$ for which
$$\min_{k \in [K]} \widehat{\mathrm{KL}}_j^k \le \epsilon,$$
where $\widehat{\mathrm{KL}}_j^k$ is the observed Kullback-Leibler divergence between the true distribution P and the misspecified distribution Q used for the knockoff construction in experiment k. If we use the knockoff+ filter, then the fraction of the rejections corresponding to such nulls obeys
$$E\left[\frac{\left|\{j \in \hat{S}_+ \cap \mathcal{H} : \min_{k} \widehat{\mathrm{KL}}_j^k \le \epsilon\}\right|}{|\hat{S}_+| \vee 1}\right] \le q\, e^{\epsilon}.$$
In particular, this implies that the FDR is bounded as
$$\mathrm{FDR} \le q\, e^{\epsilon} + P\left(\max_{j \in \mathcal{H}} \min_{k \in [K]} \widehat{\mathrm{KL}}_j^k > \epsilon\right).$$
Similarly, if we use the knockoff filter, for any $\epsilon \ge 0$, a slightly modified fraction of the rejections corresponding to nulls with $\min_k \widehat{\mathrm{KL}}_j^k \le \epsilon$ is controlled, and from this we obtain a bound on the modified FDR:
$$\mathrm{mFDR} \le q\, e^{\epsilon} + P\left(\max_{j \in \mathcal{H}} \min_{k \in [K]} \widehat{\mathrm{KL}}_j^k > \epsilon\right).$$
In real applications, when we have additional samples of X (for estimating the distribution of X), we will be able to achieve a small enough $\epsilon$. Otherwise, it has been proposed to evaluate the potential inflation of the FDR using simulation (Romano et al., 2020). Theorem 2 gives an FDR upper bound for Simultaneous knockoffs similar to the result in Barber et al. (2020) for Model-X knockoffs, building statistical foundations for such a simulation approach. Below we give an example to demonstrate its application.
Consider the example of the Gaussian knockoffs in Barber et al. (2020): $X^k$ is normally distributed with mean zero and variance-covariance matrix $(\Theta^k)^{-1}$, and we use the Gaussian knockoff construction method with an estimated precision matrix. When the associated knockoff covariance matrix is positive definite, then as shown in Barber et al. (2020), with probability at least $1 - p^{-1}$, the observed KL divergence is bounded by a leading term of order $\delta_{\Theta^k} \sqrt{n_k \log(p)}$ plus a remainder that vanishes as $n_k^{-1}\log(p) \to 0$, where $\delta_{\Theta^k}$ measures the estimation error of $\Theta^k$. The graphical Lasso estimator of $\Theta^k$ with (unlabeled) sample size $N_k$ makes this leading term small provided the unlabeled sample size $N_k$ for each experiment is large enough, in the sense that $N_k \gg n_k s_{\Theta^k} [\log(p)]^2$, where $s_{\Theta^k}$ is the sparsity level of $\Theta^k$. Under the special setting where there exists a subset $\Omega \subset [K]$ such that $\mathcal{H} = \cup_{k \in \Omega} \mathcal{H}^k$, we only need a large enough unlabeled sample size within the experiments indexed by $\Omega$. Our theoretical guarantees focus on the control of the FDR. The power is a monotonically decreasing function of K and a monotonically increasing function of n. Asymptotically, with K fixed and $\log(p)/n \to 0$, the power converges to 1 as $n \to \infty$ (see details in Web Appendix C.3.5).
Since there are no theoretical results on the choice of W for the most powerful test, we compare the power of several choices of W numerically. To understand the power of the proposed statistics, we plot the empirical distributions of the filter statistics $W_j$ for $j \in \mathcal{H}$ and $j \in \mathcal{S}$, assuming the $Z_j$s for $j \in \mathcal{H}$ are i.i.d. normally distributed (Figure 1). We can see that for $j \in \mathcal{H}$, the filter statistic $W_j$ is symmetric around 0, whereas for $j \in \mathcal{S}$, $P(W_j > 0) > 1/2$.

Simulation
Simulation settings We first consider the K = 2 case with three data settings (continuous, binary, and mixed outcomes). We compare our proposed method (simultaneous) with the two alternative methods below: • Pooling. Data are pooled together and tests of the conditional associations are performed using the knockoff methods for a single experiment.
• Intersection. Knockoff methods for single experiments are used for the individual experiments and the intersection of the selected sets is used to construct the selection set of the mutual signals.
We first study the effect of the signal sparsity levels of the mutual signals and the non-mutual signals. We use $s_0$ to denote the number of simultaneous signals among the K experiments, and $s_k$ to denote the number of signals that are present only in the k-th experiment. We study three cases: 1. $s_1 = s_2 = 0$; 2. $s_1 = 0, s_2 \ne 0$; 3. $s_1 = s_2 \ne 0$. Next, we study the effect of the correlations among the covariates. Third, we study the effect of the difference in signal strengths between the two experiments. We consider two scenarios for the signal strengths: Scenario 1, where the directions and the strengths of the mutual signals are identical among the K experiments; and Scenario 2, where only the directions of the mutual signals are the same but the signal strengths are independent among the K experiments. Data generation and algorithm implementation details can be found in Web Appendix C. Additional simulations for the K = 3 case, the power comparison among different choices of the filter statistics W, and the empirical distributions of W showing why the method has power are also provided in Web Appendix C.
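A sketch of one such simulation design for K = 2 with continuous outcomes, under assumed Gaussian covariates and unit-strength signals (the exact generation mechanism is in Web Appendix C; this layout is illustrative only):

```python
import numpy as np

def simulate_two_experiments(n=300, p=50, s0=5, s1=5, s2=5, amp=1.0, seed=0):
    """Generate K = 2 independent data sets sharing s0 mutual signals.
    Features s0..s0+s1-1 are signals only in experiment 1; the next s2
    features are signals only in experiment 2 (an assumed layout)."""
    rng = np.random.default_rng(seed)
    beta1, beta2 = np.zeros(p), np.zeros(p)
    beta1[:s0] = beta2[:s0] = amp          # mutual signals
    beta1[s0:s0 + s1] = amp                # experiment-1-only signals
    beta2[s0 + s1:s0 + s1 + s2] = amp      # experiment-2-only signals
    data = []
    for beta in (beta1, beta2):
        X = rng.standard_normal((n, p))
        y = X @ beta + rng.standard_normal(n)
        data.append((X, y))
    mutual = np.arange(s0)                 # the target mutual signal set
    return data, mutual
```

Setting `s1 = s2 = 0` reproduces case 1 above, and increasing `s1` and `s2` together reproduces case 3, where the intersection method loses FDR control.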
Results Figure 2 shows the power and the FDR for the three methods (simultaneous, pooling, and intersection) on the three data settings (continuous, binary, and mixed) when we vary $s_1 = s_2$. As $s_1 = s_2$ increases, only the simultaneous method controls the FDR. The simultaneous method has slightly lower power than the pooling method, and the power gap remains moderate when the signals in the two experiments have different strengths (Scenario 2, right panel). The simulation results are in agreement with our theoretical expectations. First, in terms of FDR control, the simultaneous method we proposed always controls the FDR across all our designed settings. The pooling method only controls the FDR when all samples from the two experiments are i.i.d. The intersection method controls the FDR when $s_2 = 0$, but it fails when $s_1 = s_2 \ne 0$. In terms of power, there is some gap between the simultaneous and the pooling methods, because the tests of union null hypotheses are more stringent. However, the gap is moderate. More detailed simulation results can be found in Web Appendix C. The simulation results for the K = 3 case are similar to the K = 2 case and are consistent with our theoretical expectations (see Figure C6 in Web Appendix C). The simultaneous method controls the FDR and has good power. The pooling method has high power but also a very high FDR when there are signals present in only one or two of the samples. The intersection method has similar power to the simultaneous method, but it cannot control the FDR when a large number of features are signals in only two of the three samples.

Figure 1: Distributions of the filter statistics $W_j = Z_j - \tilde{Z}_j$, where $Z_j$ and $\tilde{Z}_j$ are as defined in (10) with K = 3, for the cases where feature j is not a signal in any of the experiments (null), is a signal in one experiment (null), in two experiments (null), and in all three experiments (alternative).
The comparison among different W statistics suggests that the Max and Diff (see definitions in Web Appendix C.1.3) Ws have the best performance among the Ws we have explored. More simulation results can be found in Web Appendix C.

Real data analysis
In this section, we demonstrate the application of the Simultaneous knockoff method on two real data examples. For the first data example, we use the Fixed-X knockoffs with linear models for the individual experiments; and for the second data example, we use the Model-X knockoffs with a penalized Cox regression model for each gene expression experiment.

Application to the Communities and Crime data
In a crime rate study, we aim to identify features that are universally associated with the community crime rate, regardless of race distribution in the community. This is potentially useful in guiding unbiased policy-making based on race-blinded findings. To achieve that, we select features that are simultaneously associated with the crime rate in different race distribution groups.
We use the publicly available Communities and Crime data set from the University of California Irvine (UCI) machine learning repository. The data set contains crime information on n = 1994 communities with different race distributions in the US. For the individual communities, it has information on the crime rate, as well as 122 other variables that are potentially related to the crime rate. All continuous variables are normalized to the 0-1 range. Our primary outcome of interest is the normalized crime rate; the feature candidates are the p = 95 features with no missing values that are not directly defined by race. We split the data into two subsets with approximately equal sample sizes based on the proportion of the Caucasian population (high/low) within the community. We fit a linear regression model to each subset of the data, aiming to identify mutual signals from both models with FDR control. We compare the three variable selection procedures (simultaneous, pooling, and intersection) using the knockoff filter (7). We also compare our method with repFDR (Heller et al., 2014). The repFDR method is developed for replicability studies and requires that the Z-scores be normally distributed under the null. More details can be found in Web Appendix D.1. Table 1 shows the features identified by the different algorithms with a targeted FDR level of q = 0.1. Our proposed simultaneous method selected the following variables: "the percentage of households with public assistance income in 1989", "the percentage of kids born to never married", and "the percent of persons in dense housing". The pooling method selected "the percentage of kids born to never married", "the percent of persons in dense housing", and "the number of vacant households". The intersection method selected "the percentage of kids born to never married" and "the percent of persons in dense housing". The repFDR method selected "the percentage of males who have never married".
To verify the robustness of our proposed method, we also added a set of 95 fake features by permutation. The feature selection results are shown in Table 1 (Sensitivity). The variable selection with the simultaneous method is relatively stable.

Application to the TCGA data
In this section, we demonstrate the usage of the Simultaneous knockoffs to identify gene expressions that are associated with glioblastoma multiforme (GBM) for both the male and female sub-populations. GBM is known as a hallmark of the malignant process; however, the molecular mechanisms that dictate the locally invasive progression remain an active research area. In this example, we use the male and female sub-populations to demonstrate variable selection with our proposed simultaneous method for identifying mutual signals from heterogeneous data sets. In real applications, the subpopulations can be much more complicated (e.g., from different sources, collected at different places and times, and with different technologies). Therefore, the fact that the simultaneous method does not require the data to be pooled makes it useful.
Our GBM gene expression data are from The Cancer Genome Atlas (TCGA). The data contains 501 subjects with the overall survival outcome (in days) and 17813 level-3 gene-level expression data. There are 71 censored and 430 death cases. We use the sure independence screening (SIS, see Fan and Lv (2008)) for a marginal screening, leaving d = n/log(n) = 79 genes with the smallest p-values. The SIS method allows dimension reduction from exponentially growing p to a relatively large scale d < n, while the reduced model still contains all the true signals with high probability. It has been widely used in other studies (Zhang et al., 2021;Luo et al., 2020). We apply the Simultaneous knockoffs to identify genes associated with the survival time within both the male and female GBM patient cohorts. We also compare our method with the methods pooling, intersection and repFDR. We perform sensitivity analysis by relaxing the screening procedure to include all genes with p-values smaller than 0.0002, which leads to 111 candidate genes after the pre-screening step. For missing data, a complete case analysis was performed for the main analysis, while a single imputation was performed in the sensitivity analysis.
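For a continuous outcome, the SIS pre-screening step can be sketched as ranking features by absolute marginal correlation with the response and keeping the top d = n/log(n) of them (for the survival outcome in this application, marginal Cox-model p-values would be used for the ranking instead):

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Sure independence screening sketch: keep the indices of the d
    features with the largest |marginal correlation| with y
    (default d = n / log(n), as in the text)."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = Xc.std(axis=0) * yc.std() * n
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    keep = np.argsort(corr)[::-1][:d]      # top-d features by |correlation|
    return np.sort(keep)
```

The screened feature set is then passed to the knockoff procedure, which performs the FDR-controlled selection among the retained candidates.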
All of these genes have been frequently studied for their relationships with cancer, including GBM (Kunadis et al., 2021; Goyal et al., 2021; Heiland et al., 2016). The pooling method selected two genes (CROCC and FMR1NB), and the intersection method selected none. The repFDR method selected two genes (MAP2K4 and ZNF239). All three genes selected by Simultaneous knockoffs are also selected when using the threshold p < 0.0002 for pre-screening, although one additional gene, FMR1NB, is also selected under the relaxed screening scenario. The sensitivity analysis also shows that EID3 and VPS72 are selected when we use single imputation to treat the missing data.

Discussion
The Simultaneous knockoff method is a general framework for testing union null hypotheses on the conditional associations between candidate features and outcomes. It works with very general conditional models and covariate structures, assuming the K experiments are independent. The method provides opportunities to combine information from experiments with heterogeneous X structures, different dependencies of Y | X, and different outcomes Y. The FDR control guarantee is exact in finite samples. The method has even broader applications beyond our motivating examples. For example, when working with electronic health record (EHR) data from multiple data centers, some outcome variables and covariates are recorded differently among the centers (for obesity, say, some centers record the body mass index (BMI) of patients while others record a yes/no indicator), and the demographic distributions differ. The Simultaneous knockoff method can be used to identify mutual signals to confirm the associations. The method also requires only very limited information (the test statistics) to be shared among the data centers, which benefits data collaboration under privacy protections.
One big limitation of the current method is that, in practice, it is hard to work with ultra-high-dimensional data due to the limits of computer memory for the knockoff construction. We use the SIS pre-screening step in our real data example to circumvent this problem. Although, theoretically, the Simultaneous knockoff method does not require the number of variables to be smaller than the number of observations when using the Model-X knockoff construction, the efficient construction of knockoffs for ultra-high-dimensional features is still challenging and worth further research. Another limitation of the work is the lack of a theoretical analysis of power. This problem is difficult in general, and the power of the Model-X knockoff method has only recently been studied (Wang and Janson, 2021). We expect the power of the Simultaneous knockoff method to decrease monotonically as K and p increase. The exact power change with the growth of n, p, and K is still a challenging open question.
There are some extensions of the knockoff methods to group-wise variable selection (Chen et al., 2019; Dai and Barber, 2016), where we are interested in testing whether each specific group of variables is associated with the outcome conditioning on the other groups of variables. The current version of the Simultaneous knockoff method focuses on the selection of individual features; the extension to the selection of groups of features is worth future exploration. There are many open questions in multiple testing that are related to hypothesis testing for union null hypotheses. Although our Simultaneous knockoff method provides solutions for the reproducibility of studies, feature selection across heterogeneous populations, and mediation analysis, there are still more challenges from real applications. For example, we can further explore methods that allow combining information from different data sets with unidentified overlapping samples (as in case-cohort studies).
The Fixed-X knockoff assumptions and construction methods

The Fixed-X knockoff method only works for continuous outcomes following the Gaussian linear model $Y = X\beta + \epsilon$, where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $\epsilon \sim N(0, \sigma^2 I_n)$, and $\beta \in \mathbb{R}^p$. In addition, the Fixed-X knockoff method requires $n \ge 2p$. The constructed Fixed-X knockoff features $\tilde{X}$ need to satisfy, for some vector $s \ge 0$,
\[ \tilde{X}^\top \tilde{X} = X^\top X, \qquad X^\top \tilde{X} = X^\top X - \mathrm{diag}\{s\}. \]
The knockoffs $\tilde{X}$ can be computed using an efficient semidefinite programming (SDP) algorithm, with or without randomization (Barber and Candès, 2015). Since $X$ is treated as a fixed design, this approach allows any type of $X$: continuous, categorical, count, or mixed (i.e., different types for different columns of $X$). For categorical variables, dummy variables are created and used in $X$. One thing we would like to point out is that, under our current notation, this performs a test for each dummy variable separately and computes the FDR by treating different dummy variables as separate features. It would be scientifically more interesting to control the group FDR as in Dai and Barber (2016), so that the test does not depend on which reference level is chosen when creating the dummy variables. We use the R function create.fixed within the R package knockoff to implement this construction method.
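To make the two moment conditions concrete, here is a minimal numpy sketch of a Fixed-X construction. It is a simplification: the SDP algorithm of Barber and Candès optimizes $s$, while we use the equi-correlated choice $s_j = \min(1, 2\lambda_{\min}(\Sigma))$ (slightly shrunk for numerical stability) purely for illustration:

```python
import numpy as np

def fixed_x_knockoffs_equi(X, rng):
    """Equi-correlated Fixed-X knockoffs (requires n >= 2p).

    Returns column-normalized X, its knockoffs Xk, and the vector s, so that
    Xk.T @ Xk == X.T @ X  and  X.T @ Xk == X.T @ X - diag(s)."""
    n, p = X.shape
    assert n >= 2 * p, "Fixed-X knockoffs need n >= 2p"
    X = X / np.linalg.norm(X, axis=0)                 # unit-norm columns
    Sigma = X.T @ X
    lam_min = np.linalg.eigvalsh(Sigma)[0]
    s = np.full(p, 0.99 * min(1.0, 2.0 * lam_min))    # equi-correlated choice
    D = np.diag(s)
    Sinv = np.linalg.inv(Sigma)
    C = np.linalg.cholesky(2 * D - D @ Sinv @ D).T    # C.T @ C = 2D - D Sinv D
    # orthonormal basis orthogonal to col(X), from a QR factorization
    Q, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, p))]))
    U = Q[:, p:2 * p]
    Xk = X @ (np.eye(p) - Sinv @ D) + U @ C
    return X, Xk, s

rng = np.random.default_rng(1)
Xn, Xk, s = fixed_x_knockoffs_equi(rng.standard_normal((60, 8)), rng)
G = Xn.T @ Xn
```

The check below is exactly the defining property above: the knockoffs reproduce the Gram matrix of $X$ while decorrelating each feature from its own knockoff by $s_j$.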
The Model-X knockoff assumptions and construction methods

The general sampling method described in Candès et al. (2018) can be applied to all kinds of data types (continuous, categorical, count, or mixed, with or without missing data) as long as the joint distribution of X is known or can be estimated. However, the current implementation in their R package only allows continuous X, so in the simulations we only consider the Model-X knockoff construction methods for continuous X. Specifically, two Model-X knockoff construction methods are reviewed below:
• Gaussian: When the distribution of $X$ is assumed to be Gaussian with mean $\mu_X$ and variance-covariance matrix $\Sigma$, we can sample $\tilde{X}$ from $\tilde{X} \mid X \sim N(\mu, V)$, where $\mu$ and $V$ are given, row by row, by
\[ \mu = X - (X - \mu_X)\Sigma^{-1}\mathrm{diag}\{s\}, \qquad V = 2\,\mathrm{diag}\{s\} - \mathrm{diag}\{s\}\,\Sigma^{-1}\,\mathrm{diag}\{s\}. \]
We use the R function create.gaussian within the R package knockoff to implement this construction method.
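A sketch of the Gaussian construction, together with an algebraic check of the implied joint covariance (the AR(1) $\Sigma$ and the equi-correlated $s$ below are illustrative choices, not the paper's settings):

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, s, rng):
    """Sample Model-X knockoffs for rows X_i ~ N(mu, Sigma)."""
    D = np.diag(s)
    Sinv = np.linalg.inv(Sigma)
    V = 2 * D - D @ Sinv @ D                 # conditional covariance (PSD for valid s)
    cond_mean = X - (X - mu) @ Sinv @ D      # row-wise conditional mean
    return cond_mean + rng.standard_normal(X.shape) @ np.linalg.cholesky(V).T

# worked example with an AR(1) covariance and an equi-correlated s
p = 4
idx = np.arange(p)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
s = np.full(p, 0.9 * min(1.0, 2 * np.linalg.eigvalsh(Sigma)[0]))
D, Sinv = np.diag(s), np.linalg.inv(Sigma)
B = np.eye(p) - Sinv @ D                     # x_tilde = mu + (x - mu) @ B + noise
V = 2 * D - D @ Sinv @ D

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=5)
Xk = gaussian_knockoffs(X, np.zeros(p), Sigma, s, rng)
```

The two identities checked below say that the sampler yields $\mathrm{cov}(\tilde{X}) = \Sigma$ and $\mathrm{cov}(X, \tilde{X}) = \Sigma - \mathrm{diag}\{s\}$, i.e., the exchangeability conditions for valid Model-X knockoffs.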
• Second order: The second-order Model-X knockoff construction method samples $\tilde{X}$ so that the first two empirical moments of $[X\ \tilde{X}]$ match those of a valid knockoff pair, that is,
\[ \mathrm{cov}([X\ \tilde{X}]) = \begin{pmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{pmatrix}, \]
where the requirements on $s$ are the same as for the Fixed-X knockoff and can be solved using the approximate semidefinite program (ASDP) algorithm given in Candès et al. (2018). We use the R function create.second_order within the R package knockoff to implement this construction method.
The knockoffs X can also be generated using various advanced algorithms (Romano et al., 2020;Liu and Zheng, 2019;Bates et al., 2020;Spector and Janson, 2020). However, we can only find the available implementations of these methods for continuous variables.
Appendix A.2: Statistics compatible with each knockoff construction method

The statistics described in this section are for individual hypothesis testing, for simplicity of notation. They can be extended to statistics for group hypothesis testing, similar to those used in Dai and Barber (2016).

Appendix A.2.1: Statistics compatible with the Fixed-X knockoff construction method

For the Lasso regression of $Y$ on $[X\ \tilde{X}]$, we can run over a range of $\lambda$ values decreasing from $+\infty$ (a fully sparse model) to 0 (a fully dense model) and define $Z_j$ as the maximum $\lambda$ such that $\hat{\beta}_j(\lambda) \ne 0$. If there is no $\lambda$ such that $\hat{\beta}_j(\lambda) \ne 0$, then we simply define $Z_j$ as 0.
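The entry-time statistic can be sketched as follows. For a self-contained illustration we use a small hand-rolled proximal-gradient (ISTA) lasso over a $\lambda$ grid rather than glmnet; the grid, iteration count, and toy data are illustrative choices:

```python
import numpy as np

def lasso_ista(X, y, alpha, iters=500):
    """Lasso via proximal gradient: minimize (1/(2n))||y - Xb||^2 + alpha*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    L = np.linalg.eigvalsh(X.T @ X / n)[-1]      # step size from the largest eigenvalue
    for _ in range(iters):
        b = b - X.T @ (X @ b - y) / (n * L)      # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - alpha / L, 0.0)  # soft-threshold
    return b

def lasso_entry_stats(X, y, n_grid=30):
    """Z_j = largest lambda on the grid at which beta_hat_j(lambda) != 0 (0 if never)."""
    n = X.shape[0]
    alpha_max = np.max(np.abs(X.T @ y)) / n      # smallest alpha giving an all-zero fit
    alphas = alpha_max * np.logspace(-0.02, -3, n_grid)   # decreasing grid
    Z = np.zeros(X.shape[1])
    for a in alphas:
        active = lasso_ista(X, y, a) != 0
        Z[active & (Z == 0)] = a                 # record first alpha where beta_j != 0
    return Z

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X[:, :3] @ np.full(3, 5.0) + 0.1 * rng.standard_normal(200)
Z = lasso_entry_stats(X, y)
```

Strong signals enter the path at large $\lambda$ and so receive large $Z_j$, while null features enter very late (or never), which is what makes $Z_j - \tilde{Z}_j$ informative.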

Appendix A.2.2: Statistics compatible with the Model-X knockoff construction method
For the Model-X knockoffs, very general conditional models (such as generalized linear models or nonlinear models) can be used. In addition to the statistics listed in Appendix A.2.1, we can also use the following statistics.
• Absolute coefficient (glmnet): We can use $|\hat{\beta}_j(\lambda)|$ from the penalized generalized linear regression or the penalized Cox regression model of $Y$ on $[X\ \tilde{X}]$, with either a specific $\lambda$ value or a $\lambda$ estimated from cross-validation.
• Standardized coefficient (glmnet): We can also use the standardized regression coefficients $|\hat{\beta}_j(\lambda)| / \widehat{\mathrm{SE}}(\hat{\beta}_j(\lambda))$ from the penalized generalized linear regression or the penalized Cox regression model of $Y$ on $[X\ \tilde{X}]$, with either a specific $\lambda$ value or a $\lambda$ estimated from cross-validation.
• Order of selection (glmnet): We can use the largest $\lambda$ at which the regression coefficient is nonzero in the penalized generalized linear regression or the penalized Cox regression model of $Y$ on $[X\ \tilde{X}]$, or the reciprocal of the order in which each variable is included in the model as the number of selected variables increases.
• Variable importance factor: We can use the variable importance factors from a random forest fit of $Y$ on $[X\ \tilde{X}]$, with either fixed tuning parameters or tuning parameters selected by cross-validation.

Appendix A.3: Examples of OSCFs
It is obvious that the example given in Section 2.2.1 can be generalized to a broader class of OSCFs, built from arbitrary functions $f_{jk}(\cdot)$, $g_j(\cdot)$, $h_j(\cdot,\cdot)$, and a symmetric function $\psi_j(\cdot)$, where symmetric means that $\psi_j$ is invariant to swapping $Z_{jk}$ and $\tilde{Z}_{jk}$ for any $k$. Here the $f_{jk}(\cdot)$s can be viewed as transformations of the test statistics for features within individual experiments, allowing us to add weights to the features based on prior knowledge. The $g_j(\cdot)$s depict how information from the different experiments is pooled together, which can be different for different features. The $h_j(\cdot,\cdot)$s are final transformations of the statistics, which can also be different for different features. Examples of $\psi_j(\cdot)$s include maximum functions such as $\max_k w_{jk}(Z_{jk} + \tilde{Z}_{jk})$ and sum functions such as $\sum_k w_{jk}(Z_{jk} + \tilde{Z}_{jk})$, with arbitrary weights $w_{jk}$.
Notice that this is a non-exhaustive list and there are still OSCFs beyond the class (17).
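The swap-symmetry required of the $\psi_j$s can be checked numerically. A small sketch, assuming (as above) that symmetry means invariance to swapping $Z_{jk}$ with $\tilde{Z}_{jk}$ in any subset of experiments:

```python
import numpy as np

def psi_sum(z, zt, w):
    """Weighted-sum combining function: sum_k w_k * (Z_k + Zt_k)."""
    return float(np.sum(w * (z + zt)))

def psi_max(z, zt, w):
    """Weighted-max combining function: max_k w_k * (Z_k + Zt_k)."""
    return float(np.max(w * (z + zt)))

rng = np.random.default_rng(0)
z, zt, w = rng.random(3), rng.random(3), rng.random(3)
swap = np.array([True, False, True])   # swap originals and knockoffs in experiments 1, 3
z2 = np.where(swap, zt, z)
zt2 = np.where(swap, z, zt)
```

Because both functions depend on each experiment only through the symmetric pair sum $Z_k + \tilde{Z}_k$, the swap leaves their values unchanged.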

Appendix A.4: Examples of flip sign functions
Examples of antisymmetric functions include the difference function $f(x, y) = x - y$ and the signed max function $f(x, y) = \max(x, y) \cdot (-1)^{\mathbb{1}\{x < y\}}$. In fact, $f$ can be the product of any symmetric function $g(x, y) = g(y, x)$ and the sign term, i.e., $f(x, y) = g(x, y) \cdot (-1)^{\mathbb{1}\{x < y\}}$. The flip sign function can be constructed entry-wise from an antisymmetric function, as introduced in Candès et al. (2018); examples of flip sign functions constructed this way include the entry-wise difference function and the entry-wise signed max function. However, there can be other functional forms allowing $W_j$ to depend not only on $Z_j$ and $\tilde{Z}_j$ but also on the other entries of $(Z, \tilde{Z})$. A broader class can be constructed by noticing that the Hadamard product of a flip sign function and a pairwise-symmetric function, i.e., one satisfying $g(x_1, \cdots, x_p, y_1, \cdots, y_p) = g(x_1, \cdots, y_k, x_{k+1}, \cdots, x_p, y_1, \cdots, x_k, y_{k+1}, \cdots, y_p)$ for any $k \in [p]$, is still a flip sign function.
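Antisymmetry, $f(x, y) = -f(y, x)$, is easy to verify numerically. A small sketch using the sign convention that $W_j > 0$ when the original statistic dominates its knockoff:

```python
import numpy as np

def w_diff(z, zt):
    """Difference function: antisymmetric by construction."""
    return z - zt

def w_signed_max(z, zt):
    """Signed max: magnitude max(z, zt), sign positive when the original wins."""
    return np.maximum(z, zt) * np.sign(z - zt)

rng = np.random.default_rng(0)
z, zt = rng.random(1000), rng.random(1000)
```

Swapping the two arguments flips the sign of both statistics entry-wise, which is exactly the property used to prove the sign-symmetry of the null $W_j$s.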

Appendix A.5: Other examples for W construction
We first show that the composite function of a flip sign function and an OSCF can lead to an OSFF.
Proof. We verify this by checking the definition directly. Besides using the composite function of a flip sign function and an OSCF, we can also create $W$ by combining flip sign functions within each data set. This basically allows us to first decide which flip sign function to use within each data set before combining the information together. The results are summarized in the following lemma.
Lemma. Suppose $f^{(1)}, \cdots, f^{(K)}$ are $K$ flip sign functions and $g_j : \mathbb{R}^K \to \mathbb{R}$, $j \in [p]$, are combining functions such that $g_j(x_1, \cdots, x_K) = -g_j(x_1, \cdots, -x_k, \cdots, x_K)$ for all $k \in [K]$. If we define $f : \mathbb{R}^{2Kp} \to \mathbb{R}^p$ element-wise from the $f^{(k)}$s and $g_j$s, then $f$ is an OSFF.
Proof. To verify this, notice that for each element, we have which finishes the proof.
Still, there are other ways to construct the OSFF function for calculating the filter statistics W beyond the two classes we discussed above.

Web Appendix B: Additional proofs

B.1: Proof of Theorem 1

The proof of Theorem 1 follows the proof idea in Barber and Candès (2015). Let $m = \#\{j : W_j \ne 0\}$ and assume without loss of generality that $|W_1| \ge |W_2| \ge \cdots \ge |W_m| > 0$. Define p-values $p_j = 1$ if $W_j < 0$ and $p_j = 1/2$ if $W_j > 0$; then Lemma 1 implies that the null p-values are i.i.d., stochastically dominate Unif[0, 1], and are independent of the non-nulls. We first show the result for the knockoff+ threshold. Define $V = \#\{j \le k^+ : p_j \le 1/2, j \in H\}$ and $R = \#\{j \le k^+ : p_j \le 1/2\}$, where $k^+$ satisfies $|W_{k^+}| = \tau_+$ with $\tau_+$ defined in Theorem 1. We then have
\[ \mathbb{E}\left[\frac{V}{R \vee 1}\right] = \mathbb{E}\left[\frac{V^+(k^+)}{1 + V^-(k^+)} \cdot \frac{1 + V^-(k^+)}{R \vee 1}\right] \le q \, \mathbb{E}\left[\frac{V^+(k^+)}{1 + V^-(k^+)}\right] \le q, \]
where the first inequality holds by the definition of $k^+$ and the second inequality holds by Lemma B4.
Similarly, for the knockoff threshold, with $V = \#\{j \le k_0 : p_j \le 1/2, j \in H\}$ and $R = \#\{j \le k_0 : p_j \le 1/2\}$, where $k_0$ satisfies $|W_{k_0}| = \tau$ with $\tau$ defined in Theorem 1, we have
\[ \mathbb{E}\left[\frac{V}{R + 1/q}\right] = \mathbb{E}\left[\frac{V^+(k_0)}{1 + V^-(k_0)} \cdot \frac{1 + V^-(k_0)}{R + 1/q}\right] \le q \, \mathbb{E}\left[\frac{V^+(k_0)}{1 + V^-(k_0)}\right] \le q, \]
where the first inequality holds by the definition of $k_0$ and the second inequality holds by Lemma B4.

B.2: Additional proofs for Lemmas related to Theorem 1
Here we first give the proof of Lemma 1.

Proof. For any $S \subseteq H$, we can write it as the union of $K$ subsets, $S = \cup_{k=1}^K S_k$, where $S_k \subseteq H_k$ for $k = 1, \cdots, K$, and $S_{k_1} \cap S_{k_2} = \emptyset$ for all $k_1 \ne k_2$. Using the definition of the OSFF, swapping the features in $S$ with their knockoffs leaves the joint distribution of the data unchanged while flipping the signs of the $W_j$ with $j \in S$, for any $S \subseteq H$. Therefore, for any sign vector $\epsilon \in \{\pm 1\}^p$ with $\epsilon_j = 1$ for all non-null $j$, choosing $S$ as the set $\{j : \epsilon_j = -1\}$ gives $(W_1, \cdots, W_p) \overset{d}{=} (W_1 \epsilon_1, \cdots, W_p \epsilon_p)$, and thus we finish the proof of the lemma.
Lemma B4. For $k = m, m-1, \cdots, 1, 0$, put $V^+(k) = \#\{j : 1 \le j \le k, p_j \le 1/2, j \in H\}$ and $V^-(k) = \#\{j : 1 \le j \le k, p_j > 1/2, j \in H\}$, with the convention $V^{\pm}(0) = 0$. Let $\mathcal{F}_k$ be the filtration defined by knowing all the non-null p-values, as well as $V^{\pm}(k')$ for all $k' \ge k$. Then the process $M(k) = V^+(k)/(1 + V^-(k))$ is a super-martingale running backward in time with respect to $\mathcal{F}_k$. For any fixed $q$, $k = k^+$ or $k = k_0$ as defined in the proof of Theorem 1 is a stopping time, and as a consequence $\mathbb{E}[M(k^+)] \le \mathbb{E}[M(m)] \le 1$ (and similarly for $k_0$).

Proof. The filtration $\mathcal{F}_k$ contains the information of whether index $k$ is null, and the non-null process is known exactly. If $k$ is non-null, then $M(k-1) = M(k)$. If $k$ is null, then since the nulls are i.i.d., exchangeability gives
\[ P\left(p_k \le 1/2 \mid \mathcal{F}_k\right) = \frac{V^+(k)}{V^+(k) + V^-(k)}, \]
so that, when $k$ is null,
\[ \mathbb{E}[M(k-1) \mid \mathcal{F}_k] = \frac{V^+(k)}{V^+(k)+V^-(k)} \cdot \frac{V^+(k)-1}{1+V^-(k)} + \frac{V^-(k)}{V^+(k)+V^-(k)} \cdot \frac{V^+(k)}{V^-(k)} = \frac{V^+(k)}{1+V^-(k)} = M(k). \]
This finishes the proof of the super-martingale property; $k^+$ and $k_0$ are stopping times with respect to $\{\mathcal{F}_k\}$ because the events $\{k^+ \ge k\}$ and $\{k_0 \ge k\}$ are determined by $\mathcal{F}_k$.

B.3: Proof of Theorem 2
The proof of Theorem 2 follows the proof idea in Barber et al. (2020). For the knockoff+ filter with threshold $\tau_+$, the contribution of $\{j : j \in \hat{S} \cap H \text{ and } \min_{k : j \in H_k} \mathrm{KL}_{kj} \le \epsilon\}$ to the FDR is controlled by the construction of knockoff+ at level $q$; for the knockoff filter at threshold $\tau$, we have the analogous result for the modified FDR. Next we show that $\mathbb{E}[R(T)] \le e^{\epsilon}$ for both $T = \tau$ and $T = \tau_+$ to complete the proof.
For the events $E_j = \{\min_{k : j \in H_k} \mathrm{KL}_{kj} \le \epsilon\}$, we have the bound on each term by Lemma B5. The last step holds because, if $W_j \ge -T_j$ for all $j$, the summation within the expectation is 0; otherwise, the summation is 1, using Lemma 6 of Barber et al. (2020).
This completes the proof of Theorem 2.

B.4: Additional proofs for lemmas related to Theorem 2
Lemma B5. For E j := min k:j∈H k KL kj , we have Proof. For any j ∈ H 0 , find k ∈ H 0j such that KL kj = min k :j∈H k KL k j . Define We will be conditioning on X −(k,j) , X −(k,j) , Y and observing the unordered pair {X k j , X k j } = {X k(0) j , X k(1) j } (we do not know which is which). Without loss of generality, let W j ≥ 0 when This finishes the proof of the lemma.
Web Appendix C: Additional simulation details

C.1: Simulation settings

C.1.1: Simulation settings for K = 2

Data generation. We first describe the data generation procedure for a single experiment $(Y^k, X^k)$. We first consider Setting 1 (continuous), sampling outcomes from the linear model
\[ Y_i^k = (X_i^k)^\top \beta^k + \sigma_k \epsilon_i^k, \qquad \epsilon_i^k \sim N(0, 1), \]
for $k = 1, 2$ and $i = 1, \cdots, n_k$.
Here $\sigma_k$ controls the signal-to-noise ratio within each sample. We simulate data with K = 2 independent experiments with sample sizes $n_1 = n_2 = 1000$ and the number of covariates $p = 200$. For each setting, we run $m = 1000$ simulations to obtain the power (the expected proportion of true signals selected) and the FDR. We first generate covariates $X^1_i \sim N(0, \Sigma(\rho_1)), \cdots, X^K_i \sim N(0, \Sigma(\rho_K))$ independently, where $\Sigma(\rho)$ is an auto-correlation matrix whose $(i, j)$-th element equals $\rho^{|i-j|}$. Next we generate coefficients $\beta^1, \cdots, \beta^K$ for the K experiments. We use $s_0$ to denote the number of mutual signals among the K experiments, and $s_k$ to denote the number of signals present only in the k-th experiment. We consider two scenarios for the strengths of the mutual signals: 1. the directions and strengths of the mutual signals are identical among the K experiments; 2. only the directions of the mutual signals are the same, while the signal strengths are independent among the K experiments.
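The data-generation step for one experiment can be sketched as follows (the dimensions and the $\beta$ values below are illustrative placeholders, not the paper's coefficient settings):

```python
import numpy as np

def simulate_experiment(n, p, beta, rho, sigma, rng):
    """One experiment: rows of X are N(0, Sigma(rho)) with AR(1) correlation,
    and Y = X beta + sigma * eps with standard normal noise."""
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # (i,j) entry rho^|i-j|
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y

rng = np.random.default_rng(0)
beta = np.zeros(20)
beta[:5] = 1.5                                            # 5 illustrative signals
X, y = simulate_experiment(4000, 20, beta, rho=0.5, sigma=1.0, rng=rng)
adj_corr = np.corrcoef(X[:, 0], X[:, 1])[0, 1]            # should be near rho
```

Drawing each experiment with its own $(\rho_k, \sigma_k, \beta^k)$ reproduces the heterogeneous designs described above.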

C.1.2: Simulation settings for K = 3
Data generation. We first describe the data generation procedure for a single experiment $(Y^k, X^k)$. We sample outcomes from the linear model
\[ Y_i^k = (X_i^k)^\top \beta^k + \sigma_k \epsilon_i^k, \qquad \epsilon_i^k \sim N(0, 1), \]
for $k = 1, 2, 3$ and $i = 1, \cdots, n_k$.
Here $\sigma_k$ controls the signal-to-noise ratio within each sample. We simulate data with K = 3 independent experiments with sample sizes $n_1 = n_2 = n_3 = 1000$ and the number of covariates $p = 200$. For each setting, we run $m = 1000$ simulations to obtain the power (the expected proportion of true signals selected) and the FDR. We first generate covariates $X^1_i \sim N(0, \Sigma(\rho_1)), \cdots, X^K_i \sim N(0, \Sigma(\rho_K))$ independently, where $\Sigma(\rho)$ is an auto-correlation matrix whose $(i, j)$-th element equals $\rho^{|i-j|}$. Next we generate coefficients $\beta^1, \cdots, \beta^K$ for the K experiments. We use $s_0$ to denote the number of mutual signals among the K experiments, $s_1, s_2, s_3$ the numbers of signals present only in experiments 1, 2, 3, respectively, and $s_{12}, s_{13}, s_{23}$ the numbers of signals present only in experiments 1 and 2, 1 and 3, and 2 and 3, respectively. We consider the scenario in which only the directions of the mutual signals are the same, while the signal strengths are independent among the K experiments.

C.1.3: Simulation settings for power comparison
When using the difference function ($W = Z - \tilde{Z}$) as the flip-sign function, we have $W = \sum_{k=1}^K (Z^k - \tilde{Z}^k)$, which clearly has power: if feature $j$ has effects within each subsample (i.e., $Z_{kj} - \tilde{Z}_{kj} > 0$ with high probability), then we expect $W_j > 0$ with high probability. For other combining functions this is less straightforward, so we plot the distributions, under various nulls and alternatives, of the 8 functions defined below to show that they have power when K = 3 and the test statistics $Z_{jk}$ are drawn from the absolute value of a normal distribution.
where the max between vectors is taken element-wise. Additional simulations were performed for K = 2 under the settings $s_0 = 40$, $q = 0.2$, $\rho_1 = 0.4$, $\rho_2 = 0.6$, $\sigma_1 = 1$, $\sigma_2 = 2$, $A = 1.2$, and $s_1 = s_2 = 0$ or $s_1 = s_2 = s_0$, for Scenarios 1 and 2, to compare the power of these functions. It turns out that the signed max and difference functions have the best performance among the functions we explored when there are sample-specific effects (i.e., $s_1, s_2 \ne 0$).
To see the potential limitations of the proposed method, we consider K = 2 under the settings $s_0 = 40$, $q = 0.2$, $\rho_1 = 0.4$, $\rho_2 = 0.6$, $\sigma_1 = 1$, $\sigma_2 = 2$. We let the fake signals (those present in only one of the two studies) have stronger effects than the true signals, varying the ratio between the magnitudes of the fake signals and the true signals from 1.5 to 5.

C.2 Algorithm implementing details for simulation
• Knockoffs construction: For the simultaneous and intersection knockoffs, we construct the knockoffs for each experiment separately. We use the Fixed-X knockoff method for the continuous outcome setting of sections C.1.1 and C.1.2 and use the second-order Model-X knockoff construction method for the binary outcomes setting of section C.1.1. For the mixed outcome setting of section C.1.1, we use the Fixed-X knockoff method for the first experiment with the continuous outcome and use the second-order Model-X knockoff construction method for the second experiment with binary outcome. For pooling knockoffs, we construct knockoffs for the pooled data. We use the Fixed-X knockoff method for the continuous outcome setting of sections C.1.1 and C.1.2 and use the second-order Model-X knockoff construction method for the binary outcomes setting of section C.1.1.
• Statistics calculation: For each experiment (simultaneous and intersection knockoffs) or the pooled data (pooling knockoffs), we choose the absolute value of the coefficients from the penalized regression as the statistics $Z$ and $\tilde{Z}$.
• Calculating the filter statistics W: For the simultaneous knockoffs, we use the OSFF derived from the composition of the OSCF in equation (10) of the main paper and the flip sign function in equation (9) of the main paper. For the intersection and pooling knockoffs, we use the antisymmetric function $W = Z - \tilde{Z}$.
• Calculating the threshold and the selection set: we control the FDR at level q = 0.2 and use the knockoff+ filter as defined in equation (8) of the main paper.
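For concreteness, a minimal sketch of the knockoff+ thresholding and selection step (this implements the standard knockoff+ filter; equation (8) of the main paper is not reproduced here):

```python
import numpy as np

def knockoff_plus_select(W, q):
    """Knockoff+ filter: tau = min{ t in {|W_j|} :
       (1 + #{j: W_j <= -t}) / max(#{j: W_j >= t}, 1) <= q }."""
    for t in np.sort(np.abs(W[W != 0])):          # candidate thresholds, increasing
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t, np.flatnonzero(W >= t)      # select features with W_j >= tau
    return np.inf, np.array([], dtype=int)        # nothing selected

W = np.array([5.0, 4.0, 3.0, 2.0, -1.0, -3.0])
tau, selected = knockoff_plus_select(W, q=0.5)
# tau == 2.0 and selected == [0, 1, 2, 3] for this toy W
```

The "+1" in the numerator is exactly what distinguishes knockoff+ (exact FDR control) from the plain knockoff filter (modified FDR control).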

C.3 Additional simulation results
C.3.1 Additional simulation results for continuous outcomes with K = 2

In Figures 3 and 4, we show results for Setting 1 (continuous) with Scenario 1 (same signal strengths) and Scenario 2 (different signal strengths). For the continuous settings, when only simultaneous signals are present ($s_1 = s_2 = 0$) and the signal strengths are equal ($\sigma_1 = \sigma_2$), the pooling method also controls the FDR. When $s_1 = s_2 = 0$ but the signal strengths differ ($\sigma_1 \ne \sigma_2$), the pooling method does not control the FDR. When there exist non-mutual signals in one group ($s_1 = 0$, $s_2 \ne 0$), the pooling method fails to control the FDR, and the FDR increases as $s_2$ increases; the intersection method still controls the FDR and has slightly lower power than the proposed simultaneous method. When non-mutual signals are present in both groups ($s_1 = s_2 \ne 0$), we observe that as $s_1 = s_2$ increases, the intersection method does not control the FDR. In this case, only the simultaneous method successfully controls the FDR, with very small power loss compared with the other two methods. The power does not change as the difference between the signal strengths from the two experiments increases. In general, for Scenario 1, where both the strengths and the directions (positive or negative) of the mutual signals are the same, the simultaneous and intersection methods have similar power, only slightly lower than the pooling method. For Scenario 2, there is some power loss with our proposed simultaneous method compared with the pooling method; the loss is small, suggesting that gaining the communication-efficient property does not cost us much.
C.3.2 Additional simulation results for binary outcomes with K = 2

In Figures 5 and 6, we show results for Setting 2 (binary) with Scenario 1 (same signal strengths) and Scenario 2 (different signal strengths).
For the binary settings, we see results similar to the continuous setting. When only mutual signals are present ($s_1 = s_2 = 0$), all three methods control the FDR; the pooling method has the best power, and our proposed simultaneous method has better power than the intersection method. When there exist non-mutual signals in one group ($s_1 = 0$, $s_2 \ne 0$), the pooling method fails to control the FDR; the intersection method still controls the FDR but has lower power than the proposed simultaneous method. When non-mutual signals are present in both groups ($s_1 = s_2 \ne 0$), we observe that as $s_1 = s_2$ increases, the intersection method does not control the FDR. In this case, only the simultaneous method successfully controls the FDR. The power loss of the simultaneous method compared with the pooling method is larger than in the continuous settings but is still acceptable. The performance does not change much with varying $\rho_1, \rho_2$ or $\alpha_1, \alpha_2$.
C.3.3 Additional simulation results for mixed outcomes with K = 2

In Figures 7 and 8, we show results for Setting 3 (mixed) with Scenario 1 (same signal strengths) and Scenario 2 (different signal strengths).
For the mixed setting, the results are similar to the continuous setting. The pooling method can only control the FDR when s 1 = s 2 = 0. The intersection method can control the FDR when s 1 and s 2 are small. However, it does not control the FDR when s 1 and s 2 become large (s 1 = s 2 ≥ 50). The power of the simultaneous method is slightly better than the intersection method when s 1 = s 2 .

C.3.4 Additional simulation results for continuous outcomes with K = 3
In this section, we present simulation results for the K = 3 cases when the outcomes are continuous.
From Figure 9, we can see that when the only signals are mutual signals ($s_1 = s_2 = s_3 = s_{12} = s_{13} = s_{23} = 0$), all methods control the FDR and the pooling method has the highest power, as expected. However, when there are signals that occur only within some of the samples, the pooling method fails and has very high FDR levels. The intersection method works fine when the non-mutual signals appear in only one of the three experiments, but when they appear in two of the three experiments, the intersection method cannot control the FDR. The simultaneous method can always control the FDR, as expected, and its power is similar to that of the intersection method.

Figure 6: The power and the FDR for simulations with Setting 2 (binary, K = 2) and Scenario 2 data. Column 1 includes the settings with $s_1 = s_2 = 0$, column 2 the settings with $s_1 = s_2 \ne 0$, and column 3 the settings with $s_1 = 0$, $s_2 \ne 0$. Row 1 shows the experiments varying $s_0, s_1, s_2$; row 2 experiments varying $\rho_1, \rho_2$; row 3 experiments varying $\alpha_1, \alpha_2$.

C.3.5 Additional simulation results for power comparisons
In this section, we plot the distributions of the filter statistics $W_j$ when all underlying $Z_{kj}$, $k \in [K]$ and $j \in [p]$, are assumed independent. We assume $Z_{kj} = |Z^*_{kj}|$, where $Z^*_{kj} \sim N(0, 1)$ when $j \in H_k$ and $Z^*_{kj} \sim N(\mu, 1)$ otherwise. We vary $\mu$ from 3 to 5 to represent weak and strong signals.
The symmetric distributions of $W_j$ in Figures 10-14 illustrate numerically that when the signals are not present in all three data sets, the sign of $W_j$ is independent of its magnitude, with $P\{W_j > 0\} = 1/2$ no matter how strong the signal is, which is exactly what we need for the FDR control theorem to work. The asymmetric distributions of $W_j$ in Figures 15 and 16 illustrate numerically that when the signals are present in all three data sets, we have a larger chance of observing a positive $W_j$, and thus the test has power (especially when the positive part of the $W_j$ distribution lies far to the right compared with the $W_j$ distribution for features without signals in all three data sets). Comparing these figures, we can see that increasing the number or the strengths of signals in the null features that are present in only one or two groups widens the distribution and thus lowers the power of the proposed method.
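This sign-symmetry can also be illustrated by direct simulation. The statistic below (product of signs times the minimum absolute difference across experiments) is a hypothetical simultaneous-style choice, not necessarily one of the paper's eight functions, but it shares the key property: a single null experiment makes the sign of $W_j$ a fair coin:

```python
import numpy as np

def W_stat(Z, Zt):
    """Product of signs times the minimum absolute difference across experiments."""
    d = Z - Zt
    return np.sign(d).prod(axis=1) * np.abs(d).min(axis=1)

rng = np.random.default_rng(0)
reps = 20000

# partial null: experiments 1 and 2 carry a signal (mu = 3), experiment 3 does not
Z_part = np.abs(rng.standard_normal((reps, 3)) + np.array([3.0, 3.0, 0.0]))
Zt_part = np.abs(rng.standard_normal((reps, 3)))
frac_pos_partial = np.mean(W_stat(Z_part, Zt_part) > 0)   # close to 1/2

# full alternative: all three experiments carry the signal
Z_full = np.abs(rng.standard_normal((reps, 3)) + 3.0)
Zt_full = np.abs(rng.standard_normal((reps, 3)))
frac_pos_full = np.mean(W_stat(Z_full, Zt_full) > 0)      # well above 1/2
```

The null experiment contributes a symmetric sign factor independent of the magnitudes, so $P\{W_j > 0\} = 1/2$ under the union null, while the full alternative pushes the sign strongly positive.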
The simulation results comparing the power of the different filter statistics, from the setting described in Section C.1.3, are given in Table 2. From Table 2, we can see that the signed max function and the difference function have the best performance among the functions we explored.
The power and the FDR for the different methods when varying the ratio of signal strength between mutual and non-mutual signals are shown in Figure 17. We can see that, for both the pooling method and our proposed simultaneous method, the power decreases substantially when the non-mutual signals are strong. However, unlike the pooling method, which has an inflated FDR, our proposed simultaneous method keeps the FDR relatively stable at the nominal level.
Here we provide some insight into why our proposed filter statistics W have power. Without loss of generality, assume all $Z_{jk}$'s and $\tilde{Z}_{jk}$'s are non-negative (such as the absolute value of a standardized coefficient). Then, for fixed K, as $n \to \infty$, $\tilde{Z}_{jk}$ converges to 0 for all $j, k$, while $Z_{jk}$ converges to 0 for $j \in H_k$ and to a positive number otherwise. Thus, if $j \in H$, both $Z_j$ and $\tilde{Z}_j$ converge to 0, while if $j \in S$, $Z_j \approx \sum_{k=1}^K Z_{jk}$ converges to a positive number and $\tilde{Z}_j$ converges to 0. Therefore, the power goes to 1 as $n \to \infty$. The power is a monotonically decreasing function of K and a monotonically increasing function of n. If K grows with the sample size n, the power might not reach 1. The rate of power growth and the requirements (i.e., the growth condition on K(n)) for the power to reach 1 are complicated, beyond the scope of this paper, and worth future investigation.

Table 2: Empirical FDR and power comparisons among different choices of filter statistics W for K = 2 and q = 0.2, with the settings $s_1 = s_2 = 0$ and $s_1 = s_2 = 40$, with or without the same signal strengths between the two data sets (Scenarios 1 and 2).