Associations, populations, and the truth: Recommendations for genetic association studies in Arthritis & Rheumatism



Genetics represents one of the most powerful and direct approaches to investigating the basis of human disease. In recent years, this approach has been greatly advanced by the recognition that much of the variation in the human genome is based on single-nucleotide polymorphisms (SNPs) (see Appendix A for a glossary of terms). The identification and mapping of these SNPs provide an unprecedented opportunity to understand the relationship between genetic variation and clinical disease. Studies of the association between genetic variants and disease can suggest pathogenetic mechanisms and have the potential for direct clinical application by providing markers of risk, diagnosis, and prognosis and, possibly, therapeutic targets.

Given the rapid technological progress in genotyping, the number of studies describing genotype–phenotype associations has grown quickly. A recent review of the Genomics and Disease Prevention Information System, a literature database maintained at the Centers for Disease Control and Prevention, found that 2,436 primary studies of this type were published in 2001 and 2,922 in 2002 (1). Moreover, as a result of the increasing availability of mapped SNP markers (currently nearly 3,000,000), this trend is expected to accelerate. The large number of submitted articles on genetic association studies has led several scientific journals to propose guidelines for reporting genetic studies (2, 3).

The validity of such guidelines has been questioned because the recommendations reflect the opinions of individual investigators. In the history of science, reliance of an entire field on the recommendations of a few individuals has sometimes proved to be of limited value in moving closer to the truth (4). Arthritis & Rheumatism does not have such formal guidelines. Nevertheless, in order to provide guidance for writers, readers, and reviewers, this article addresses a number of issues relevant to the interpretation of genetic studies and reviews available evidence supporting recommendations for genetic association studies, with a strong emphasis on multifactorial diseases.

Association studies compare genotype frequencies at the gene(s) of interest between well-defined groups that differ in phenotype or in a trait level (such as the extent of joint destruction). This design tests the hypothesis that differences in genotype frequencies are related to the characteristic or outcome measured, so an appropriately constructed control population is essential. Genotype frequencies may vary within populations because social groups within a population have unique genetic and social histories, patterns of migration, mating preferences, reproductive expansion, and stochastic variation (5). These group-specific characteristics can produce different genotype frequencies among subgroups of a given population, so-called “population stratification.” Thus, even when a control group has been chosen using seemingly correct criteria, observed differences in genotype frequency between patients and controls may be due to population stratification rather than to the disease.

In order to limit this source of error, family-based controls have been widely used. In this design, parental alleles are classified as either “transmitted” or “nontransmitted” to affected children, and the nontransmitted alleles serve as controls. Since the test and control chromosomes are derived from the same individuals (the parents), hidden stratification of cases and controls should not be a concern. Additional advantages of family-based association studies are that genotyping accuracy can be checked against Mendelian inheritance and that assigning haplotypes is greatly simplified. Inherent limitations of family-based tests are that homozygous parents are not informative and that larger numbers of study subjects are needed than in a case–control design (6). In addition, association studies using unrelated cases and controls are more appropriate for studying gene–environment interactions, since family members may be overmatched for environmental exposures.
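As an illustration of how transmitted and nontransmitted alleles are compared, the sketch below computes the transmission/disequilibrium test (TDT), one widely used family-based statistic. It assumes the counts have already been tallied from heterozygous parents: b parents who transmitted allele A to an affected child and c who transmitted the other allele. Homozygous parents contribute to neither count, which is why they are uninformative. The function name `tdt` and the example counts are hypothetical.

```python
import math


def tdt(b: int, c: int) -> tuple[float, float]:
    """Transmission/disequilibrium test from heterozygous parents.

    b: heterozygous parents who transmitted allele A to an affected child
    c: heterozygous parents who transmitted the other allele
    Returns the McNemar-type chi-square statistic (1 df) and its p-value.
    """
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need at least one informative (heterozygous) parent")
    stat = (b - c) ** 2 / (b + c)
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 60 transmissions of A versus 40 of the other allele.
stat, p = tdt(60, 40)
print(f"TDT chi-square = {stat:.2f}, p = {p:.4f}")
```

Because each informative parent serves as its own control, the statistic depends only on the imbalance between b and c, not on population allele frequencies.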

The subsequent mathematical and statistical analyses of the relationship between genetic variation and phenotype, whether the phenotype is a disease state (e.g., presence or absence of rheumatoid arthritis) or an outcome over time (e.g., rate of joint destruction), are based on the assumption that the phenotype is at least partly influenced by genetic factors. In general, studies of families with multiple affected members are the most compelling in establishing the relationship between genetics and disease susceptibility (e.g., studies of twins for susceptibility to rheumatoid arthritis).

For studies of the genetic influence on clinical course or response to medication, one assumes that DNA variations responsible for functional effects are related to the underlying biology. Common to all of these approaches is the use of tests of “association” between a given genetic variant, or set of variants, and the clinical manifestations of interest. Studies of complex traits in animal models demonstrate that the phenotypic effects of genetic variants may differ with the environment and with other background genes (“epistasis”) (7). Because patients differ in their genetic and environmental backgrounds, differences in the expression of the phenotype are to be expected, which leads to phenotypic complexity.

Thus, both genetic heterogeneity and phenotype complexity affect the case–control design of genetic association studies. What recommendations, then, are pertinent to promoting the value of articles about genetic association studies that are submitted to Arthritis & Rheumatism?

Recommendations for reproducibility and population stratification

For population-based case–control designs, careful attention must be paid to appropriately matched controls to avoid systematic differences (population stratification). Effect size estimates, including odds ratios with confidence limits, are important, and the statistical power to detect differences should be calculated for the actual sizes of the patient and control groups. In a meta-analysis of 370 studies addressing 36 genetic associations for various disease outcomes (8), discrepancies between the first and subsequent studies of the same genetic association were found in 5 of 7 cases in which the initial publication had a sample size of less than 150, but in only 3 of 29 cases in which the sample size was more than 150. These data underscore the need for an adequate sample size. (For help with power calculations, see Power for Association With Error [PAWE].)
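To make the request for odds ratios with confidence limits and power estimates concrete, the following sketch computes an odds ratio with a Woolf (log-based) confidence interval from a 2 × 2 table, and an approximate power for comparing a frequency between cases and controls using the normal approximation. It is a sketch under stated assumptions, not a substitute for dedicated tools such as PAWE: the function names are hypothetical, the table may hold carrier counts (subjects) or allele counts (in which case the group sizes refer to chromosomes, i.e., twice the number of subjects), and the power formula ignores the far rejection tail and genotyping error.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with a Woolf (log-based) confidence interval.

    Hypothetical 2 x 2 layout:
        a = exposed cases,    b = unexposed cases
        c = exposed controls, d = unexposed controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("empty cell; a continuity correction would be needed")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return or_, math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log)


def approx_power(p_controls, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided test comparing a frequency between
    cases and controls (normal approximation, far rejection tail ignored)."""
    # Frequency in cases implied by the control frequency and the odds ratio.
    odds_cases = odds_ratio * p_controls / (1 - p_controls)
    p_cases = odds_cases / (1 + odds_cases)
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    # Standard error under the null hypothesis uses the pooled frequency.
    p_pooled = (n_cases * p_cases + n_controls * p_controls) / (n_cases + n_controls)
    se_null = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_cases + 1 / n_controls))
    # Standard error under the alternative uses group-specific frequencies.
    se_alt = math.sqrt(p_cases * (1 - p_cases) / n_cases
                       + p_controls * (1 - p_controls) / n_controls)
    return nd.cdf((abs(p_cases - p_controls) - z_alpha * se_null) / se_alt)


# Hypothetical carrier counts: 30/100 cases and 15/100 controls carry the allele.
print(odds_ratio_ci(30, 70, 15, 85))
# Power to detect an odds ratio of 1.5 when 20% of controls are carriers.
for n in (150, 500, 1000):
    print(n, round(approx_power(0.20, 1.5, n, n), 3))
```

Running the loop for several group sizes shows directly how strongly power for a modest odds ratio depends on sample size, which is the point of the meta-analytic comparison above.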

Adequate description of the ethnicity and population origin of cases and controls (e.g., the origin of the grandparents) is likely to reduce the chance that the results are caused by population stratification. Although population stratification is not an issue in linkage studies, adequate description of the population of origin is still highly recommended in order to facilitate replication of the results by other researchers. Another approach to minimizing confounding due to population stratification is to type additional unlinked markers. If stratification is present, the unlinked markers should also show association with the phenotype; if no stratification is present, only markers at the candidate loci should show association. Pritchard and Rosenberg (9) found that with 15–20 unlinked microsatellites, the overall Type I error rate was less than 5%, and with 30 biallelic markers it was approximately 6%. Given these low rates, typing unlinked markers is an efficient way to reduce false-positive results caused by population stratification.
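The logic of the unlinked-marker check can be sketched as follows, under the assumption of biallelic markers: compute an allelic chi-square statistic at each unlinked marker, sum them, and compare the sum with a chi-square distribution with 1 degree of freedom per marker. A significant excess suggests that cases and controls differ genome-wide, i.e., stratification. This illustrates the general idea only; the exact procedure in ref. 9 may differ, and all function names and counts here are hypothetical.

```python
import math


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) for a 2 x 2 table of allele counts at one marker."""
    observed = ((case_a, case_b), (ctrl_a, ctrl_b))
    n = case_a + case_b + ctrl_a + ctrl_b
    row_totals = (case_a + case_b, ctrl_a + ctrl_b)
    col_totals = (case_a + ctrl_a, case_b + ctrl_b)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("marker is monomorphic or a group is empty")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees
    of freedom, via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-15:
        term *= z / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def stratification_check(markers):
    """markers: list of (case_a, case_b, ctrl_a, ctrl_b) at unlinked biallelic
    markers. Returns the summed chi-square, its df, and the p-value."""
    total = sum(allelic_chi2(*m) for m in markers)
    df = len(markers)
    return total, df, chi2_sf(total, df)


# Hypothetical allele counts at 3 unlinked markers (200 case, 200 control chromosomes).
markers = [(90, 110, 95, 105), (120, 80, 112, 88), (60, 140, 70, 130)]
print(stratification_check(markers))
```

In practice many more markers would be typed (on the order discussed above); three are used here only to keep the example readable.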

Replication using independent affected and/or control groups strengthens confidence in any association study. In statistical terms, independent replication decreases the chance of reporting an association when no association actually exists (Type I error). Replication also requires homogeneous data sets, so studies in which the initial findings are replicated in an independent data set are less likely to be false positives due to population stratification. Of course, lack of replication across different ethnic/racial subgroups or across different environmental settings may represent valid observations reflecting different background genes and/or gene–environment interactions.
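The effect of independent replication on the Type I error can be worked out directly, assuming the 2 studies are truly independent and each uses a two-sided test. Under the null hypothesis, both are significant with probability α₁α₂, and requiring the effects to point in the same direction halves this. A shared bias, such as the same hidden stratification in both samples, is not independent and is not reduced this way. The function names below are hypothetical, and the simulation is only a numerical check of the arithmetic.

```python
import random
from statistics import NormalDist


def joint_false_positive_rate(alpha_initial, alpha_replication, same_direction=True):
    """Probability, under the null hypothesis, that an initial study and an
    independent replication are both significant (two-sided tests)."""
    rate = alpha_initial * alpha_replication
    # Under a symmetric null distribution, the 2 significant effects point
    # the same way half of the time.
    return rate / 2 if same_direction else rate


def simulate_joint_false_positive(alpha=0.05, n_trials=200_000, seed=1):
    """Monte Carlo check: draw independent null z-statistics for the 2 studies."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_trials):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if abs(z1) > crit and abs(z2) > crit and (z1 > 0) == (z2 > 0):
            hits += 1
    return hits / n_trials


print(joint_false_positive_rate(0.05, 0.05))
print(simulate_joint_false_positive())
```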

Recommendation for selection of the genetic variant

Selection of the gene(s).

The selection of a candidate gene(s) can be justified on a positional basis (hypothesis generating), a functional basis (hypothesis testing), or both. Genome-wide scans in rheumatic diseases, such as systemic lupus erythematosus, rheumatoid arthritis, and osteoarthritis, have generated a wealth of data (for review, see ref. 10). The arguments for why a given genetic variant is implicated in a disease can readily be presented in the context of regions of genetic effect identified by whole-genome scans.

Selection of the marker(s).

SNPs that alter function, whether through nonsynonymous protein-coding changes, modulation of transcription, or effects on translation, are best suited for hypothesis testing. Nearly 3,000,000 SNPs are available in databases such as the Database of Single Nucleotide Polymorphisms (dbSNP) at the National Center for Biotechnology Information (NCBI). NCBI resource links, such as LocusLink, may assist in assessing the possible functional consequences of a particular gene variant. In some instances, a polymorphism with clear functional consequences, such as the factor V Leiden mutation, enables direct hypothesis testing. Nevertheless, any claim of functional consequences for a given SNP should always be evaluated critically.

The association reported between a SNP in the 3′-untranslated region of the interleukin-12 (IL-12) p40 gene, which putatively affects expression, and type 1 diabetes mellitus is an example of a hypothesis-testing study that followed hypothesis-generating studies (11, 12). The role of IL-12 in the pathogenesis of type 1 diabetes mellitus is biologically plausible, and the gene encoding IL-12 p40 is located in a region with a positive linkage effect. Moreover, positive data were found in 2 groups totaling 422 families (12). Regrettably, a replication study of 2,873 families found no association between the IL-12 p40 SNP and type 1 diabetes mellitus (11). The functional effect of this 3′-untranslated region SNP had initially been inferred from 2 cell lines; further examination of relative allele expression in heterozygous cell lines showed a 1:1 ratio of the 2 allelic messenger RNA species, i.e., no allele-specific difference in expression. This experience underscores the value of replicating not only genetic associations but also functional data.

In hypothesis-generating studies, differences in genotype frequencies between a control population and a patient population can reflect the effect of the SNP itself or the effect of a haplotype for which the SNP serves as a “tag.” The concept of a haplotype tag is important because regions within which markers are nonrandomly associated, i.e., in linkage disequilibrium (LD), are not necessarily limited to small blocks of the genome. This reminds both the investigator and the reader that causality cannot be inferred from association results (13). LD was originally viewed as the stochastic outcome of recombination, mutation, natural selection, and population history (14), but it is now apparent that many regions throughout the genome are in high LD (so-called LD blocks) and are separated by short segments of very low LD. In Caucasian populations, the average length of LD blocks is about 20 kb.
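The degree of LD between 2 biallelic loci is conventionally summarized by D, its normalized form |D′|, and r². The sketch below computes these from the frequency of one haplotype and the 2 allele frequencies. It assumes the haplotype frequency is already known; with unphased genotypes it would first have to be estimated (e.g., by an expectation-maximization algorithm, not shown). The function name is hypothetical. Note that |D′| can equal 1 while r² is low when allele frequencies differ, which is why tagging is usually based on r².

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise LD between 2 biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B
    Returns (D, |D'|, r^2).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D is normalized by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical examples: complete LD versus independence.
print(ld_measures(0.30, 0.30, 0.30))
print(ld_measures(0.06, 0.20, 0.30))
```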

Recently, the HapMap project has developed a genome-wide catalog of common haplotype blocks in multiple human populations and has finished the first complete map of human chromosome 19 (15). The number of available maps is expected to rise rapidly. Such LD blocks are defined by common alleles and, thus, have the disadvantage that the known SNP haplotypes within a block may not include uncommon alleles. Nevertheless, definition of extended haplotypes and use of haplotype tags can be very useful in examining the distribution frequencies of SNPs in patients and controls.
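One simple way to choose haplotype tags from a catalog of pairwise LD is a greedy selection: repeatedly pick the SNP that captures (r² at or above a threshold) the largest number of SNPs not yet captured, until every SNP is captured. This sketch illustrates that general strategy only; it is not presented as the HapMap project's own procedure, and the function name, the r² threshold of 0.8, and the input format are assumptions.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy selection of tag SNPs.

    r2: dict mapping each SNP to a dict of pairwise r^2 values with other SNPs
        (missing pairs are treated as r^2 = 0).
    Every SNP ends up with r^2 >= threshold to at least one chosen tag
    (each SNP trivially tags itself).
    """
    def captured_by(snp, pool):
        return {s for s in pool if s == snp or r2[snp].get(s, 0.0) >= threshold}

    untagged = set(r2)
    tags = []
    while untagged:
        # Pick the untagged SNP that captures the most still-untagged SNPs;
        # iterating over a sorted list makes ties resolve alphabetically.
        best = max(sorted(untagged), key=lambda s: len(captured_by(s, untagged)))
        tags.append(best)
        untagged -= captured_by(best, untagged)
    return tags


# Hypothetical r^2 values for 4 SNPs in one region.
r2 = {
    "rs_a": {"rs_b": 0.9, "rs_c": 0.85, "rs_d": 0.1},
    "rs_b": {"rs_a": 0.9, "rs_c": 0.7, "rs_d": 0.2},
    "rs_c": {"rs_a": 0.85, "rs_b": 0.7, "rs_d": 0.1},
    "rs_d": {"rs_a": 0.1, "rs_b": 0.2, "rs_c": 0.1},
}
print(greedy_tag_snps(r2))
```

Because such catalogs are built from common alleles, a SNP set chosen this way inherits the limitation noted above: uncommon alleles may not be captured by any tag.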

Plausible biologic context.

An understanding of the biology of the variant in the context of known pathophysiology greatly enhances the candidacy of any genetic variant and allows one to assess whether properties such as gene-dose effects should be present and to anticipate whether other aspects of the phenotype might logically be associated with the genotype.

Recommendation for quality measures

A consensus conference of the Centers for Disease Control and Prevention and the National Institutes of Health (16) has made several recommendations that are appropriate for submissions to Arthritis & Rheumatism as well:

“We recommend that authors specify the quality measures used for the genotyping analysis and provide information on the degree of reproducibility between quality control replicates. Quality control measures include 1) internal validation for analytic validity; 2) blinding of laboratory personnel to pertinent characteristics of the samples, donor subjects, and hypotheses being investigated; 3) procedures for establishing duplicates and quality control numbers from blind duplicates; 4) test failure rate, by study group; 5) inspection of whether genotype frequencies conform to Hardy-Weinberg equilibrium (in controls in case–control studies) and are consistent with other reports for the same population (this criterion should not be binding); and 6) blind or automated data entry and third party adjudication. If a large number of samples do not produce acceptable genotyping results, comparability data should be provided with samples that yield acceptable results.”
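Items 3 and 4 of the quoted recommendations (blind duplicates and test failure rate by study group) reduce to simple bookkeeping. The sketch below assumes a hypothetical data layout in which each genotype call is coded as the number of minor alleles (0, 1, or 2) and a failed assay is recorded as `None`; the function names are illustrative. Reporting call rates separately for cases and controls matters because differential failure between groups can itself create spurious associations.

```python
from collections import defaultdict


def call_rate_by_group(calls):
    """calls: iterable of (group, genotype); genotype is None for a failed assay.
    Returns the proportion of successful calls in each study group."""
    attempted = defaultdict(int)
    succeeded = defaultdict(int)
    for group, genotype in calls:
        attempted[group] += 1
        if genotype is not None:
            succeeded[group] += 1
    return {group: succeeded[group] / attempted[group] for group in attempted}


def duplicate_concordance(pairs):
    """pairs: iterable of (first_call, second_call) for blind duplicate samples.
    Pairs with a failed call are skipped. Returns (concordance, pairs compared)."""
    compared = agreeing = 0
    for first, second in pairs:
        if first is None or second is None:
            continue
        compared += 1
        agreeing += int(first == second)
    if compared == 0:
        raise ValueError("no duplicate pair with 2 successful calls")
    return agreeing / compared, compared


# Hypothetical data; genotypes coded as the number of minor alleles.
calls = [("case", 1), ("case", None), ("case", 2), ("control", 0), ("control", 1)]
print(call_rate_by_group(calls))
print(duplicate_concordance([(0, 0), (1, 1), (2, 1), (1, None)]))
```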

A persistent challenge to the field is the selective reporting and publication of positive findings. The recommendations above may limit the number of false-positive results. However, it must be emphasized that well-done negative studies (appropriately powered, with carefully constructed test populations and careful quality control of genotyping) are of interest to the field, especially when they address previously published claims. If negative results are published as well as positive ones, the effects of publication bias in the search for risk genes can be reduced.

The recommendations above (Table 1) are illustrated below with several previously reported genetic association studies.

Table 1. Recommendations for genetic association studies published in Arthritis & Rheumatism
Recommendation for selecting populations for study
 Demonstrate adequacy of statistical power/adequate sample size
 Assure adequate matching of cases and controls to avoid hidden population stratification; consider typing additional unlinked markers to demonstrate the absence of population stratification
 Consider the use of family-based controls (such as parents), if available
 Replicate frequency of single-nucleotide polymorphisms in an independent control group
 Replicate frequency of single-nucleotide polymorphisms in an independent patient group
Recommendation for selecting the genetic variant
 Selection of the gene(s):
  Indicate if the study is hypothesis-testing versus hypothesis-generating
  Provide reference to positional data
 Selection of the marker(s):
  If the study is hypothesis-testing, assess the adequacy of the functional data, including replication by other groups
  If the study is hypothesis-generating, consider additional markers to define haplotypes
 Plausible biologic context:
  Consider gene-dose effects
  Consider properties of the phenotype logically associated with the genotype
Recommendation for quality measures
 Specify the quality measures used for the genotyping analysis
 Provide information on the reproducibility between quality control replicates

Selecting a genetic variant for gene-hypothesis testing.

Macrophage migration inhibitory factor (MIF) is a counterregulator of glucocorticoid action with broad activating properties on the immune system (17). Given the powerful proinflammatory action of MIF in vitro and in vivo and the identification of functional polymorphisms in the promoter region of the MIF gene, investigators hypothesized that these variants would be distributed differently between controls and patients with juvenile and adult inflammatory arthritis. Although the MIF-173*C allele appears to be associated with juvenile idiopathic arthritis in general, the association appears to be strongest for systemic-onset and persistent oligoarticular disease (18). A plausible biologic context is provided by the influence of the MIF-173*C allele on MIF levels in both serum and synovial fluid, as well as on overall outcome and response to glucocorticoid therapy. Thus, the hypothesis was concordant with the phenotypic characteristics of the patients.

Addressing population stratification.

Recently, Yamada et al (19) reported a screening of 112 polymorphisms in 71 candidate genes for myocardial infarction. The candidate genes were chosen on the basis of a putative role in the pathogenesis of myocardial infarction, and the SNPs were chosen because they were expected to change the function or the level of expression of the encoded proteins. To replicate the findings, the phenotypically well-characterized group of patients with myocardial infarction was deliberately split into 2 parts (909 patients for the initial analysis and 4,152 patients for the replication analysis). This large study revealed that SNPs in the genes encoding connexin 37 (a protein involved in gap junction communication between vascular endothelial cells), plasminogen activator inhibitor type 1 (involved in fibrinolysis), and stromelysin 1 (involved in matrix metabolism) are associated with an increased risk of myocardial infarction. Although this study awaits replication in other cohorts (20), its 2-step design incorporated a control for population stratification and provides a solid starting point for the study of genetic factors involved in myocardial infarction.

Using functional data for hypothesis testing.

The kidney is the principal organ regulating phosphate homeostasis. Studies of animal and cell models have strongly indicated that a sodium–phosphate cotransporter, NPT2a, regulates the transport of phosphate in renal tubular cells (21). Sequence analysis of the gene encoding NPT2a in 20 patients with persistent hypophosphatemia, decreased maximal renal phosphate reabsorption, and either bone demineralization or urolithiasis identified mutations in 2 patients (22). Next, wild-type and mutant RNA were injected into cells, and the encoded proteins were shown to differ in function.

The evidence concerning the functional effect of the mutation (based on animal experiments, cell models, and expression studies of the wild-type and mutant alleles) is so convincing that the biologic message (that NPT2a mutations contribute to disturbed phosphate homeostasis) is likely to be true. In a study of such a highly selected, very rare phenotype, it is clearly not reasonable to require replication as a condition of publication.

Evaluating gene-dose effects and the relationship of genotype and phenotype.

Ultimately, the proposed pathophysiologic effects of a genetic variant play an important role in assessing whether gene-dose effects should be present and whether a combination of risk genotypes will confer a higher risk than any single genotype. For example, sustained cardiac adrenergic stimulation has been implicated in the pathogenesis of heart failure. Accordingly, the distribution of 2 functionally distinct adrenergic receptor variants was examined in patients with heart failure and in controls (23, 24). Both patients and controls were divided into 2 ethnic subgroups, because the allele frequencies of the polymorphisms differed between the 2 groups.

The odds ratio for heart failure in black patients homozygous for 1 variant was 5.7. The possibility that disrupted function of 2 different adrenergic receptors might lead to an even higher risk was supported by an odds ratio of 10 in patients homozygous for both variants, making a compelling case that disrupted adrenergic receptor function can lead to congestive heart failure. A similar effect was seen in Caucasian patients, and, consistent with the epidemiology of heart failure in blacks and whites, these polymorphisms were more frequent in blacks (23, 24). These studies also illustrate that an additional level of evidence is supplied by the observation of similar effects in 2 subpopulations of the original set of patients and controls.
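A gene-dose effect can be tested formally by asking whether risk rises with the number of copies of the risk allele. One conventional choice, sketched below, is the Cochran-Armitage test for trend with additive scores (0, 1, 2). The function name and the genotype counts in the example are hypothetical; they are not data from the heart failure studies cited above.

```python
import math


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage test for trend across genotype classes.

    cases, controls: counts of subjects carrying 0, 1, and 2 copies of the
    putative risk allele. scores: weights for the classes (0, 1, 2 = additive).
    Returns (z, two-sided p-value); z > 0 means risk rises with allele dose.
    """
    r = sum(cases)
    s = sum(controls)
    n_total = r + s
    col = [a + b for a, b in zip(cases, controls)]
    t_stat = sum(t * (a * s - b * r) for t, a, b in zip(scores, cases, controls))
    k = len(col)
    variance = (r * s / n_total) * (
        sum(scores[i] ** 2 * col[i] * (n_total - col[i]) for i in range(k))
        - 2 * sum(scores[i] * scores[j] * col[i] * col[j]
                  for i in range(k) for j in range(i + 1, k))
    )
    if variance <= 0:
        raise ValueError("no variation in genotype scores")
    z = t_stat / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts for 0, 1, and 2 copies of the risk allele.
print(trend_test(cases=(10, 20, 30), controls=(30, 20, 10)))
```

The additive scores encode the expectation stated in the glossary that 2 copies have a larger effect than 1 and 1 a larger effect than none; other scorings (e.g., dominant or recessive) would encode different biologic hypotheses.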

Using additional markers to define haplotypes.

Abacavir is an effective drug for treating human immunodeficiency virus infection, but 5% of patients experience a hypersensitivity reaction. Previous data indicated that a haplotype on chromosome 6 is associated with hypersensitivity. More than 100 SNPs from 12 candidate gene families were tested, and 2 markers (the TNF –238 G/A SNP and the HLA-B57 allele) were strongly associated with hypersensitivity (25). These genes lie within a 200-kb region of chromosome 6, raising the possibility that neither is the causal variant. Next, SNPs along this genetic region were tested, and the associations were strongest for the HLA-B57 and TNF –238 alleles, indicating that significant associations can extend well beyond LD blocks (25).


In this article, we have discussed several considerations that influence the reliability of reports of genetic associations. We hope that increasing awareness of these issues will lead to more robust study designs and will minimize the risk of overinterpreting such studies. Moreover, we hope that these comments will stimulate discussion and accelerate our understanding of the genetic basis for the diseases that are of interest to us all.

Appendix A: Glossary of terms

SNP:

Single-nucleotide polymorphism or sequence variant. DNA consists of a strand of nucleotides. The DNA of each person is unique in the order or sequence of the 4 different nucleotides, denoted A, C, G, and T. If one nucleotide at a defined position in the DNA strand in one individual is substituted for another, a sequence variant is present.

Genotype:

Genetic constitution of an individual.

Phenotype:

The measurable characteristics of an individual or group of individuals.


A defined phenotype caused by a specific genotype.

Allele:

An alternative form of a gene. Human somatic cells carry two alleles of each autosomal gene, one on each chromosome of a homologous pair, at the same position (locus).

Population stratification:

The occurrence of different allele frequencies in a healthy population that can be attributed to diversity in the background of the different subgroups in the population.

Recombination:

Process in which DNA molecules are broken and the fragments are rejoined in new combinations.

Linkage:

Co-inheritance of genetic loci that lie near each other on the same chromosome. The greater the linkage, the lower the frequency of recombination between the two loci.

Microsatellite:

A type of genetic marker characterized by repeated nucleotide sequences, usually 2, 3, or 4 nucleotides in length. In general, the number of repeats varies between different individuals, creating unique, highly variable genetic markers.

Nonsynonymous protein-coding changes:

SNPs (sequence variants) that change the amino acid sequence of the protein encoded by the gene.

Linkage disequilibrium:

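The nonrandom association of alleles at different loci, such that certain alleles occur together on the same chromosome more (or less) often than expected from their individual frequencies.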

Haplotype:

Combinations of genetic variants on one chromosome that tend to be inherited together.

Gene-dose effect:

The phenomenon in which the phenotypic effect of an allele (an alternative form of a gene) depends on the number of copies of that allele. Two copies have a larger effect than one copy, which has a larger effect than none.

Hardy-Weinberg equilibrium:

Assuming random mating and the absence of selection, migration, and mutation, allele frequencies p and q (with p + q = 1) in a population lead to a genotype distribution of p² for individuals homozygous for p, q² for individuals homozygous for q, and 2pq for heterozygous individuals carrying both allele p and allele q.
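In practice, conformity to these proportions (recommended above for controls in case–control studies) is often checked by comparing observed genotype counts with the counts expected from p², 2pq, and q². The sketch below uses a Pearson chi-square test with 1 degree of freedom (3 genotype classes, minus 1, minus 1 estimated allele frequency). The function name and counts are hypothetical, and for small samples or rare alleles an exact test is generally preferred.

```python
import math


def hwe_chi2(n_hom_major, n_het, n_hom_minor):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions at a
    biallelic locus. Returns (chi-square statistic, p-value)."""
    n = n_hom_major + n_het + n_hom_minor
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_hom_major + n_het) / (2 * n)  # allele frequency estimated from the sample
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # p^2, 2pq, q^2 scaled by n
    if min(expected) == 0:
        raise ValueError("locus is monomorphic in this sample")
    observed = (n_hom_major, n_het, n_hom_minor)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical control genotype counts showing a heterozygote deficit.
print(hwe_chi2(30, 40, 30))
```

A marked departure in controls more often points to genotyping error or population substructure than to biology, which is why it serves as a quality measure.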