## Introduction

The genome-wide association study (GWAS), which typically uses hundreds of thousands of single nucleotide polymorphisms (SNPs) across the genome, has become a powerful tool for identifying genes or genetic markers underlying disease susceptibility (Klein et al. 2005; Hunter et al. 2007; Sladek et al. 2007; Yeager et al. 2007; The Wellcome Trust Case Control Consortium (WTCCC) 2007). In a typical GWAS, a panel of 100K–500K SNPs is genotyped on thousands of individuals. Single-marker analysis, which tests the association between the outcome and an individual SNP, is usually used to select a subset of SNPs for further investigation (Hoh & Ott 2003; Marchini et al. 2005; Schaid et al. 2005; Wang et al. 2007; Skol et al. 2006; Yu et al. 2007). For example, in a two-stage GWAS (Skol et al. 2006), SNPs whose first-stage single-marker p-values fall below a given threshold are evaluated further in an independent sample in the second stage.

A typical test statistic used in single-marker analysis for case-control studies is the Cochran-Armitage trend test (CATT), derived under the assumption of an additive mode of inheritance (Sasieni, 1997; Slager & Schaid, 2001; Zheng et al. 2006a). Since the CATT has an asymptotic normal distribution under the null hypothesis, ranking SNPs based on their test statistics is equivalent to ranking them on their p-values. The CATT for the additive model, however, is not very robust under other modes of inheritance, e.g., recessive or dominant models. A search for disease-related SNPs with their risk effects governed by a particular disease model might miss SNPs following other risk patterns. Furthermore, for complex diseases with low penetrance, usually none of the above simplified models is appropriate. Under these circumstances, efficiency robust tests, which retain high power across all scientifically plausible genetic models, are preferable (Sladek et al. 2007; Zheng et al. 2003; Zheng et al. 2006a). The theory of efficiency robust tests was summarized in Gastwirth (1985) and Freidlin et al. (1999). One commonly used robust test is based on the MAX statistic, the maximum of three CATTs derived under the recessive, additive, and dominant models, respectively. Empirical results show the advantages of using the MAX statistic over the CATT, derived for the additive model, to prioritize SNPs or to detect disease-associated SNPs (Zheng et al. 2006a).
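As a concrete illustration of the statistics described above, the following minimal sketch computes the CATT from a 2×3 genotype table under the recessive, additive, and dominant codings and takes the largest absolute value as MAX. It assumes the standard form of the trend test with genotype scores (0, x, 1), where x = 0, 0.5, and 1 correspond to the recessive, additive, and dominant models. The function names `catt` and `max_stat` and the genotype counts are hypothetical and chosen only for illustration.

```python
import numpy as np

def catt(cases, controls, x):
    """Cochran-Armitage trend statistic with genotype scores (0, x, 1).

    cases, controls: genotype counts ordered by number of risk alleles (0, 1, 2).
    x = 0 (recessive), 0.5 (additive), 1 (dominant).
    """
    r = np.asarray(cases, dtype=float)
    s = np.asarray(controls, dtype=float)
    n = r + s                      # pooled genotype counts
    R, S = r.sum(), s.sum()        # numbers of cases and controls
    N = R + S
    scores = np.array([0.0, x, 1.0])
    num = np.sqrt(N) * np.sum(scores * (S * r - R * s))
    den = np.sqrt(R * S * (N * np.sum(scores**2 * n) - np.sum(scores * n) ** 2))
    return num / den

def max_stat(cases, controls):
    """MAX: largest absolute CATT over the three genetic models."""
    return max(abs(catt(cases, controls, x)) for x in (0.0, 0.5, 1.0))

# Illustrative (made-up) genotype counts for one SNP.
cases = (30, 50, 20)
controls = (50, 40, 10)
for name, x in (("recessive", 0.0), ("additive", 0.5), ("dominant", 1.0)):
    print(name, round(catt(cases, controls, x), 3))
print("MAX", round(max_stat(cases, controls), 3))
```

By construction, MAX is never smaller than the absolute additive CATT, which is why it cannot be referred to the standard normal distribution without correction.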

Under the null hypothesis of no association, the MAX statistic does not follow the standard normal distribution asymptotically. Thus a computationally intensive resampling-based procedure is required to estimate its p-value. For example, in a GWAS of type 2 diabetes, Sladek et al. (2007) conducted 10,000 permutations per SNP to estimate p-values of MAX tests. They identified 59 SNPs, based on a p-value threshold around the level of 10^{−4}, for further replication in an independent sample. They then used 10,000,000 permutation steps to estimate the p-values associated with the MAX test on each of the 59 chosen SNPs, based on the replication sample. This extremely large number of permutation steps was needed to ensure reliable estimation of any p-value falling below the level of 10^{−6}. For situations in which the p-value of MAX is not available and a fixed number of SNPs must be selected for the next-stage study, Zheng et al. (2007) proposed using the MAX statistic itself, rather than its p-value, as the basis for the ranking. This approach is easy to carry out and requires no Monte Carlo simulation. However, the asymptotic null distribution of MAX depends on the genotypic distribution of the SNP under study and therefore differs from SNP to SNP. Consequently, the MAX statistics of different SNPs are not on a common scale, and ranking SNPs by MAX is not equivalent to ranking them by p-value. It would be more appropriate to rank SNPs based on their p-values.
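To make the computational burden concrete, here is a minimal sketch of a permutation p-value for MAX at a single SNP. It is not the procedure of Sladek et al. (2007). It assumes individual-level genotypes coded 0/1/2 with all three genotypes observed, a binary case-control label, and the common estimator (1 + hits) / (1 + B) for B permutations. The names `max_stat` and `permutation_pvalue` and the toy data are hypothetical.

```python
import numpy as np

# Genotype scores for the recessive, additive, and dominant codings.
SCORES = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.5, 1.0],
                   [0.0, 1.0, 1.0]])

def max_stat(g, y):
    """MAX of |CATT| over the three codings; g in {0, 1, 2}, y in {0, 1}."""
    g, y = np.asarray(g), np.asarray(y)
    r = np.bincount(g[y == 1], minlength=3).astype(float)  # case counts
    s = np.bincount(g[y == 0], minlength=3).astype(float)  # control counts
    n, R, S = r + s, r.sum(), s.sum()
    N = R + S
    num = np.sqrt(N) * (SCORES @ (S * r - R * s))
    den = np.sqrt(R * S * (N * (SCORES**2 @ n) - (SCORES @ n) ** 2))
    return float(np.max(np.abs(num / den)))

def permutation_pvalue(g, y, n_perm=1000, seed=0):
    """Estimate the p-value of MAX by permuting case-control labels."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = max_stat(g, y)
    hits = sum(max_stat(g, rng.permutation(y)) >= observed
               for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm)

# Toy data: 300 individuals, disease status independent of genotype.
rng = np.random.default_rng(1)
g = rng.integers(0, 3, size=300)
y = rng.integers(0, 2, size=300)
print(permutation_pvalue(g, y, n_perm=500))
```

With this estimator the smallest attainable p-value is 1 / (1 + B), so resolving p-values near 10^{−6} requires millions of permutations for each SNP.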

In many GWAS, logistic regression is commonly used to evaluate each marker's marginal effect while adjusting for the effects of other covariates. A similar MAX statistic can be defined based on three Wald (or score, or likelihood ratio) test statistics, derived under the dominant, recessive, and additive genotypic effect models, respectively. Estimating the p-value for this type of MAX statistic requires an even more computationally intensive resampling procedure.

Although the MAX statistic has various advantages over the CATT derived under an additive model, it is computationally challenging to apply it to a large-scale GWAS. In this article, we propose a simple approach to approximate the p-value of the MAX statistic without Monte Carlo simulation. The approximation formula, called the *Rhombus formula*, is designed to estimate the two-sided test p-value for the MAX statistic. The *Rhombus formula* is an extension of the *W-formula* of Efron (1997), which was originally derived to approximate the one-sided test p-value of the MAX statistic and has been applied to family-based association tests (Yan et al. 2008). To apply the Rhombus formula, we need to estimate the covariance matrix for the three CATT (or Wald) tests corresponding to the additive, recessive, and dominant models. Zheng et al. (2006a) provided an analytic formula to estimate the covariance matrix for CATT-based tests. For Wald tests with adjustment for other covariate effects, we propose to use the approach of Pepe et al. (1999), which is based on the generalized estimating equation (GEE) method (Liang & Zeger, 1986), to estimate their covariance matrix numerically. We conducted extensive simulation studies to evaluate the accuracy of the proposed Rhombus formula in the GWAS setting. To illustrate the application of our methods, we applied them to 17 confirmed disease-associated SNPs from three GWAS and to a real dataset from a GWAS for coronary artery disease (CAD) with about 350K SNPs (WTCCC, 2007).
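The dependence of the three tests on the genotype distribution can be sketched numerically. Under the null hypothesis, the correlation between two trend tests equals the correlation between their genotype score codings in the pooled sample, so the correlation matrix follows directly from the genotype frequencies. The sketch below uses that fact; it is an illustration under this assumption and has not been checked against the exact expression in Zheng et al. (2006a). The function name `null_correlation` is hypothetical.

```python
import numpy as np

def null_correlation(genotype_freqs):
    """Null correlation matrix of the recessive, additive, and dominant CATTs.

    genotype_freqs: pooled frequencies (or counts) of genotypes with
    0, 1, and 2 risk alleles; all three must be positive.
    """
    p = np.asarray(genotype_freqs, dtype=float)
    p = p / p.sum()
    codings = np.array([[0.0, 0.0, 1.0],   # recessive
                        [0.0, 0.5, 1.0],   # additive
                        [0.0, 1.0, 1.0]])  # dominant
    centered = codings - (codings @ p)[:, None]
    cov = (centered * p) @ centered.T      # covariance of the score codings
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Hardy-Weinberg genotype frequencies for two allele frequencies.
for q in (0.5, 0.1):
    freqs = ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)
    print(q)
    print(np.round(null_correlation(freqs), 3))
```

Because the matrix changes with the allele frequency, so does the null distribution of MAX, which is why the statistic itself is not comparable across SNPs.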