Department of Medicine, Department of Human Genetics, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, David Geffen School of Medicine at UCLA, Los Angeles, California, USA
Molecular Biology Institute, University of California, Los Angeles, California, USA
Recent technological and scientific advances have provided the tools needed to rapidly scan the genome for genetic variants affecting osteoporosis. In the last 2 yr, genome-wide association studies (GWASs) have identified several associations contributing to risk of fracture and related traits. These discoveries promise to illuminate important new pathways in bone metabolism, contribute to the development of novel therapeutics, and possibly harbor prognostic value. The initial GWAS results, however, suggest that alternative strategies may be necessary to comprehensively identify and characterize genetic risk factors. In this perspective, we review the status of GWASs for osteoporosis and discuss systems level approaches that have the potential to provide a much deeper understanding of bone biology.
GENOME-WIDE ASSOCIATION STUDIES OF OSTEOPOROSIS
Candidate gene association and family-based linkage studies in humans and controlled experimental crosses in the mouse have been the methods traditionally used to map osteoporosis genes.(1) Unfortunately, these efforts have been met with limited success. In just the last few years, however, this paradigm has dramatically shifted. Large-scale single nucleotide polymorphism (SNP) discovery efforts, such as the International HapMap Project,(2) in conjunction with the development of massively parallel genotyping platforms, have enabled genetic association studies to be performed on a genome-wide scale.(3) GWASs are performed by genotyping thousands of well-phenotyped individuals for hundreds of thousands (between 100 and 1000 K) of SNPs.(4) The end result is the identification of associated regions that, dependent on patterns of linkage disequilibrium between markers, typically span several kilobase pairs and harbor none, one, or a small number of candidate genes.
Since May 2008, 10 GWASs have identified nearly 30 independent loci affecting BMD and/or fracture.(5–14) Four studies have identified >20 loci affecting various aspects of stature and bone size.(15–18) Strong associations for BMD have been confirmed in or near many previously suspected candidate genes, such as the estrogen receptor (ESR1),(9,11) TNF receptor superfamily, member 11a (TNFRSF11A; RANK),(10,11) TNF (ligand) superfamily, member 11 (TNFSF11; RANKL),(9,11) SP7 transcription factor (SP7),(10–12) and low-density lipoprotein receptor-related protein 5 (LRP5).(8,11) However, most of the associations exceeding stringent genome-wide significance thresholds have been with novel genes, such as family with sequence similarity 3, member C (FAM3C)(5) and MAP/microtubule affinity-regulating kinase 3 (MARK3),(10) among many others. These novel genes have no known connection to bone and, once validated, their discovery should highlight important new biological mechanisms impacting bone metabolism.
Although GWASs are clearly a major advance for gene discovery, the results to date explain only a small fraction of the genetic component for traits such as BMD. For example, a large-scale meta-analysis of 19,195 individuals identified a total of 15 SNPs associated with lumbar spine BMD. However, in aggregate, these SNPs only explained 2.9% of the variance in spine BMD.(11) The undiscovered genetic component likely consists of a combination of many more common variants with increasingly smaller effects and the contributions of rare variants(19,20); both of which will be much more difficult to identify. It is also likely that inherited epigenetic modifications and gene by gene and gene by environmental interactions are significant sources of variation.
The key limitation of GWASs, however, is one common to all methods that strictly correlate genotype with phenotype: they are one dimensional. A GWAS can identify common variants with relatively strong effects in a straightforward manner; however, it cannot provide information on the context in which those genes function, their relationships with other genes, or how these relationships change over time, in different environments, or during disease. “Systems genetics,” which integrates the analysis of molecular phenotypes along with clinical traits, provides a powerful strategy for more fully understanding the complex interactions contributing to osteoporotic fracture.
Traditional genetic analysis attempts to directly relate DNA variation to clinical traits using linkage or association (Fig. 1). This strategy has been effective for “simple” Mendelian diseases; however, it has been much less successful for common forms of osteoporosis. Systems genetics differs mainly in that, in addition to clinical traits, it examines molecular phenotypes, such as transcript levels assessed by DNA microarrays (Fig. 1).(21) This allows genetic mapping of molecular intermediates and, more importantly, the identification of correlations between molecular and clinical phenotypes. It should be noted, however, that the simultaneous evaluation of the effects of thousands of genetic variants and thousands of intermediate phenotypes presents many challenges. The most prominent is the problem of multiple comparisons, which requires sophisticated statistical analyses, such as data reduction techniques,(22) to control the false discovery rate.
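To make the multiple-comparisons problem concrete, the following minimal sketch applies the Benjamini-Hochberg step-up procedure, a standard way of controlling the false discovery rate. It is offered only as an illustration: the procedure is not named in the text above, and the function name, the p-values, and the 5% target are assumptions chosen for the example.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per test: True where the test is declared significant.

    Illustrative Benjamini-Hochberg step-up procedure (assumed here, not taken
    from the article): find the largest rank k such that the k-th smallest
    p-value is <= (k / m) * fdr, then accept the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p

    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest rank that passes

    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values from ten transcript-trait correlation tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = benjamini_hochberg(p_values, fdr=0.05)
print(sum(calls), "of", len(p_values), "tests pass the FDR threshold")
```

With these illustrative values, five tests fall below an unadjusted 0.05 cutoff, but only the two smallest p-values survive the correction. The gap widens as the number of variants and molecular phenotypes grows.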
As yet, the only molecular phenotype that can be examined globally using high-throughput technology is transcript levels,(23) although proteomic(24) and metabolomic(25) technologies are becoming increasingly broad and quantitative. This represents another important limitation for systems genetics, because some variants will predispose to disease, not through transcript levels, but by altering protein structure or activity. It is also becoming increasingly clear that microRNAs are important in the global modulation of gene expression and should be included in systems genetics analyses. A recent study with diabetic mice identified loci regulating microRNAs and showed that these were associated with altered levels of predicted target transcripts.(26)
ELUCIDATING THE GENETICS OF GENE EXPRESSION
As with any quantitative trait, the genetic determinants of a single gene's expression can be mapped. With DNA microarrays this can be extended to the ∼20,000 genes in the mouse or human genomes. Loci that control a gene's expression are referred to as expression QTLs (eQTLs) or expression SNPs (eSNPs), depending on whether they were identified using linkage or GWASs (the distinction between eQTLs and eSNPs is important, but for clarity, we refer to all loci regulating expression as eQTLs). In either case, there are two types: local and distant.(27) Local eQTLs are located in close proximity to the gene they regulate. Many local eQTLs function in cis,(28) although they can be caused by the effects of neighboring genes acting in trans. In contrast, distant eQTLs act exclusively in trans and are typically located on a different chromosome relative to the gene(s) they regulate.
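The local versus distant distinction can be expressed as a simple positional rule. The sketch below is illustrative only: the function name, the coordinate arguments, and in particular the 1 Mb window used to define "close proximity" are assumptions, because the text gives no numeric cutoff.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=1_000_000):
    """Label an eQTL as 'local' or 'distant' relative to the gene it regulates.

    The 1 Mb window is a hypothetical choice for illustration; studies use
    different cutoffs. Position alone cannot tell whether a local eQTL acts
    in cis or through a neighboring gene acting in trans.
    """
    if snp_chrom != gene_chrom:
        return "distant"  # different chromosome: necessarily trans-acting
    if gene_start - window <= snp_pos <= gene_end + window:
        return "local"
    return "distant"      # same chromosome but far from the gene


# Hypothetical coordinates, used only to exercise the function.
print(classify_eqtl("chr8", 119_950_000, "chr8", 119_920_000, 119_960_000))
print(classify_eqtl("chr3", 45_000_000, "chr8", 119_920_000, 119_960_000))
```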
How can eQTL information be used to gain insight into the genetic regulation of osteoporosis-related traits? To begin, genes with eQTLs are candidates for mediating genetic effects. If a SNP is associated with both BMD and the expression of a nearby gene, the alteration in gene expression may be the basis of the SNP-BMD association. Thus, for GWASs, the incorporation of expression data can provide a direct link between an association and a single gene and suggest a potential mechanism of action.(29) Currently, studies that incorporate microarray data in the context of an osteoporosis GWAS are lacking; however, expression data have recently been used to investigate the molecular basis of significant BMD associations. In a recent GWAS, a SNP near the 3′ end of the TNF receptor superfamily, member 11b (TNFRSF11B; OPG) was strongly associated with BMD.(8) The authors went on to measure the allele-specific expression of TNFRSF11B in lymphoblast cell lines from HapMap individuals. This analysis showed that SNPs in the same region of the gene were associated with the expression of TNFRSF11B, suggesting that the difference in expression may lead to alterations in BMD.
Suppose that a transcript is highly correlated with a clinical trait across a set of individuals and there is evidence that one or more of the same genetic loci control both the transcript and the trait. This raises the possibility that either the transcript levels perturb the trait (a “causal” relationship) or that the transcript is perturbed by the clinical trait (a “reactive” relationship). Alternatively, these associations could be unrelated. As an example of the latter, a locus controlling the expression of a gene that perturbs BMD may be strongly correlated, as a result of linkage disequilibrium, with other genes residing at that locus. These other genes would also be correlated with BMD, yet they would not be functionally related to BMD. In such a case, the relationship would be described as “independent.”
The relationships between correlated elements of a system can be explored using mathematical modeling, and this type of modeling has been applied to many different areas of research, such as ecology and social sciences.(30) Similar analyses, involving partial correlation, or “conditioning,” can be applied to systems genetics data. A key concept in causality modeling, in a systems genetic context, is that DNA variation can serve as a “causal anchor,” because information flows from DNA to molecular and clinical phenotypes but not vice versa. For example, if a single locus regulates both a gene's expression and BMD, and the gene is causal for BMD, conditioning on the levels of the transcript would be expected to eliminate the relationship between the locus and BMD. In essence, conditioning removes the variation in BMD that can be explained by correlation with a gene's expression. If the transcript levels fully explain the effect of the locus on BMD, this would be consistent with a causal relationship where the locus perturbs transcript levels, which in turn, perturb BMD. Recently, algorithms for the statistical analysis of such systems genetics data have been reported.(31–33) Basically, they provide a means of assessing the statistical significance of the possible relationships between a genetic locus and the correlated traits controlled by that locus.
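The effect of conditioning can be illustrated with simulated data. The sketch below is not one of the published algorithms cited above; it is a minimal example, assuming a purely causal chain (locus → transcript → BMD) with arbitrary effect sizes and noise, and it uses first-order partial correlation as the conditioning step. All variable names and parameters are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after conditioning on z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulate a causal chain: genotype perturbs transcript, transcript perturbs BMD.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000
locus = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n)]  # 0, 1 or 2 alleles
transcript = [g + rng.gauss(0, 1) for g in locus]
bmd = [t + rng.gauss(0, 1) for t in transcript]

r_locus_bmd = pearson(locus, bmd)
r_given_transcript = partial_corr(locus, bmd, transcript)
r_locus_transcript_given_bmd = partial_corr(locus, transcript, bmd)

print(f"locus-BMD correlation:             {r_locus_bmd:.3f}")
print(f"locus-BMD | transcript:            {r_given_transcript:.3f}")
print(f"locus-transcript | BMD (contrast): {r_locus_transcript_given_bmd:.3f}")
```

Under this simulated causal model, conditioning on the transcript should drive the locus-BMD correlation toward zero. Conditioning on BMD should leave a clear locus-transcript correlation. This asymmetry is what a causal anchor makes detectable. A reactive or independent model would produce a different pattern.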
Causality modeling need not be restricted to genes that are regulated directly by local DNA variation. For example, suppose that the expression of gene A influences BMD and that gene A is regulated in trans by gene B, which resides on a different chromosome. In this case, a local polymorphism affecting either the expression or structure of gene B could perturb both the expression of gene A and BMD. If so, gene A would be identified as causal for BMD even in the absence of a local eQTL. Thus, systems genetics, but not GWASs, would be capable of identifying gene A as a causal gene for BMD. As discussed below, causality modeling can also be used to infer causal relationships between the elements of biological networks.
Causality modeling has now been applied to several organisms and phenotypes. Most convincing have been studies in yeast, where validation is relatively straightforward, but several studies in mouse suggest that it should be broadly applicable to mammalian systems as well.(34) For example, in a study of adiposity in mice, a list of likely causal genes was derived from liver expression data in an intercross between two inbred mouse strains. The list consisted of genes that were highly correlated with obesity, were regulated by eQTLs (both local and distant) mapping to loci perturbing fat mass, and were predicted to be causal for adiposity. The top candidate genes were tested for an effect on adiposity by altering their expression using transgenic or gene targeting approaches. Of nine genes tested, eight had some effect on adiposity.(34)
We have recently used systems genetics to identify causal genes for BMD. Using an approach similar to that described above for obesity, we identified 18 genes predicted to be causally linked to BMD.(35) Three of the 18, twist homolog 2 (Twist2), wingless-type MMTV integration site 9A (Wnt9a), and matrix metallopeptidase 14 (membrane-inserted) (Mmp14), are known to influence bone mineralization, suggesting that causality modeling is an effective way to identify BMD genes. Although the causality modeling predictions are statistically strong, these genes must be validated using experimental perturbation to confirm their role in the regulation of BMD.(36)
It is clear that the elements in biologic systems, whether genes, transcripts, proteins, or metabolites, exhibit complex, nonlinear interactions, with some elements interacting with many other elements and most others interacting with few. Thus, biologic processes are better described as “networks” than as linear pathways. This was first observed in the modeling of metabolic interactions and was subsequently observed in studies of protein interactions and transcript regulation.(37–39)
Biologic networks can be represented as graphs consisting of a collection of nodes and edges. In the case of gene networks, the nodes are genes, and the edges represent a relationship between two genes. A particularly useful network in systems genetics is one in which the edges are defined as the correlation of two gene expression traits across a sample population. Such “coexpression” networks are based on the concept that genes that exhibit similar regulation over a large number of perturbations are likely to be functionally related.(40)
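A coexpression network of the kind just described can be sketched in a few lines. The example below is a simplified illustration, not the method of any cited study: the gene names, the expression values, and the hard threshold of |r| ≥ 0.8 for drawing an edge are all assumptions made for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression levels of four genes measured in six individuals.
expression = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 4, 6, 8, 10, 12],  # rises with A
    "C": [6, 5, 4, 3, 2, 1],    # falls as A rises
    "D": [5, 3, 3, 3, 3, 5],    # unrelated to A
}


def coexpression_edges(expr, threshold=0.8):
    """Return {(gene1, gene2): r} for gene pairs whose |r| meets the threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:  # strong positive or negative coexpression
            edges[(g1, g2)] = r
    return edges


edges = coexpression_edges(expression)
for pair, r in edges.items():
    print(pair, round(r, 2))
```

The resulting graph is undirected. An edge records only that two transcripts vary together across individuals, not which one influences the other.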
Single gene perturbations, such as in a transgenic mouse, can alter the expression of many genes. However, the relationships between the genes cannot be determined because there are essentially only two states: transgenic or nontransgenic. Multiple gene perturbations, such as those resulting from naturally occurring genetic variation among a large number of individuals, are capable of a range of states for each gene and for relationships between genes, allowing for the generation of gene networks.
A disadvantage of coexpression networks is that causality must be established by other methods. Whereas causality is clear in a single gene perturbation experiment (i.e., the transgene perturbs many other genes), the nodes in a coexpression network are simply correlated and, as highlighted above, correlation alone does not imply causation. Such a network in which the causal relationships between nodes are unknown is termed an “undirected” network compared with a “directed” network. It is possible to produce “directed” networks from systems genetics data using the causality modeling approaches discussed above.(41,42) More sophisticated Bayesian analyses, in which additional elements or information are incorporated into the network modeling, have also been performed.(43) It should be noted, however, that such analyses are only hypothesis generating and that firm conclusions require experimental validation.
Global studies of biologic networks have shown certain conserved topologic features. One, mentioned above, is the tendency of most elements to have relatively few edges, whereas some have many (the latter termed “hubs”). Another is the tendency of elements to cluster into highly connected groups, termed “modules,” which presumably share functional aspects. For example, in protein interaction networks, proteins located in particular cellular compartments, such as nuclei or mitochondria, tend to exhibit clustering.(44) Similarly, coexpression networks generated from systems genetics exhibit clear clustering. One indication that modules are functionally significant is that they are frequently highly enriched for genes with known biologic functions. Another feature of interest is that particular modules have been found to correlate significantly with clinical traits, suggesting that there is a relationship, either causal or reactive, between the module as a whole and clinical phenotypes.(45) Recent studies have expanded on this concept, showing through experimental perturbation the coherence of modules and their relevance to clinical traits.(46–48) The feasibility of constructing similar networks for bone traits is discussed below.
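The two topologic features described above, hubs and modules, can be read directly from a network's edge list. In the sketch below the genes and edges are hypothetical, a hub is taken to be the node with the most edges, and connected components serve as a deliberately crude stand-in for the clustering methods used to define modules in real coexpression analyses.

```python
from collections import defaultdict, deque

# Hypothetical undirected coexpression edges between six genes.
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g5", "g6")]


def build_adjacency(edge_list):
    """Map each gene to the set of genes it shares an edge with."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Group genes into components by breadth-first search (crude 'modules')."""
    seen, modules = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        queue, members = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        modules.append(sorted(members))
    return modules


adjacency = build_adjacency(edges)
degree = {gene: len(neighbors) for gene, neighbors in adjacency.items()}
hub = max(degree, key=degree.get)  # gene with the most edges
modules = connected_components(adjacency)

print("degree:", degree)
print("hub:", hub)
print("modules:", modules)
```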
To date, systems genetics studies for bone traits have been limited. One of the major obstacles for systems genetics studies of osteoporosis is access to relevant tissues for expression profiling. Circulating monocytes are one possible source. They are easy to isolate and purify and they can serve as osteoclast progenitors.(49–51) Bone tissue samples harvested during orthopedic procedures, or primary osteoblasts derived from such samples, have been used on a small scale for bone-related microarray gene expression studies,(52,53) and it may be possible to extend this approach to larger studies. One issue that will need to be addressed when using cells is how their expression profiles differ when cultured. It will also be important to choose the most relevant clinical phenotypes that can be collected in a large population.(54)
In addition, we believe the mouse will play a much larger role because of the inherent advantages of inbred strains for systems genetics,(55) especially given the challenges of generating genome-scale data in human bone. The use of the mouse is also being enhanced through the emergence of “next-generation” mapping populations, such as heterogeneous stocks(56) and outbred mice.(35,57)
Technological and statistical improvements have been and will continue to be the driving force for systems genetics. Next-generation sequencing technologies are poised to revolutionize our ability to quantitatively and qualitatively characterize the transcriptome.(58) In addition, we expect that emerging technologies will soon enable the routine evaluation of biological components other than the transcriptome. It is also expected that, as the field continues to mature, novel statistical algorithms will provide new ways to approach and analyze systems genetic data.
In conclusion, we see systems genetics as a way to enhance the next generation of bone-related GWASs and provide a way to transition from gene-based to network-centric views of disease. The use of systems genetics may yet unlock the potential of genomics for prevention of osteoporotic fractures.