From age correction to genome-wide association


  • Invited paper

Sarah Cohen-Woods, MRC Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King’s College London, De Crespigny Park, London SE5 8AF, UK.


Objective:  Eric Strömgren was one of the pioneers of psychiatric genetics and family studies. There has since been an explosion of interest in this field, and research progress, including linkage and association studies, whole genome genotyping, copy number variants and epigenetics, is reviewed here.

Method:  An overview of this area of psychiatric research is presented and discussed on the basis of the relevant literature, with the aim of giving an up-to-date account of progress.

Results:  Broadly speaking, linkage and association are complementary approaches used to locate genes contributing to the genetic aetiology of psychopathology. Linkage can be detected over comparatively large distances; however, power is problematic when searching for quantitative trait loci with small effect sizes. In contrast, association studies can detect small effects but only over very small distances. Therefore, while several genome-wide linkage studies in psychiatric disorders have been performed, the majority of association studies have investigated specific functional candidate genes.

Conclusion:  Due to very recent technological advancements, genome-wide association studies have now become possible and have identified some completely novel susceptibility loci. Other recent advances include the discovery of epigenetic phenomena and copy number variants.

Clinical recommendations

  •  By understanding the molecular neurobiology of psychiatric disorders through the identification of genes we should be able to develop more efficacious and specific medicines.

Additional comments

  •  With cheaper, faster techniques evolving, and our new understanding of phenomena such as epigenetics and copy number variants, we are moving closer to our goal of unravelling the causation of psychiatric disorders.


Strömgren’s legacy to genetics

Eric Strömgren was one of the pioneers of psychiatric genetics. In 1935, just a year after he graduated from medical school, he made a pilgrimage to Munich to study epidemiology and genetics at the Deutsche Forschungsanstalt für Psychiatrie, which at that point was the only place in the world that had a credible critical mass of investigators studying these topics in relation to psychiatry. Strömgren subsequently wrote a thesis for his research doctorate on ‘contributions to psychiatric genetics based on investigations of an island population’, using family data collected on the island of Bornholm.

One of the problems in analysing family data is that the researcher wishes to know not just how many relatives of index cases are affected by an illness at the time of study, but what proportion of relatives will be affected by the disorder during their lifetime. This immediately poses a problem in most common diseases in that some unaffected relatives will be too young to develop the disorder, some will later become affected but are as yet unaffected and the only group whom the researcher can count with confidence is those who have lived through the entire period of risk and who remain well. Consequently the denominator in the proportion, affected/unaffected, is complex and needs to be corrected for the extent to which unaffected relatives have survived the period of risk. When Strömgren performed his Bornholm study the problem of how to apply age correction to family data had already been addressed by the German physician, Weinberg, who devised two approaches, a ‘shorter’ and a ‘longer’ method. Weinberg’s longer method, using a life-table approach, was regarded by Strömgren as too cumbersome while Weinberg’s shorter method (which applied weights of 0 for those below the age of risk, one half for those within the age of risk, and one for those beyond the age of risk to calculate a Bezugsziffer, or corrected denominator) Strömgren regarded as too simplistic. He therefore devised a more realistic and refined method of weightings based on the age of onset distribution of the disorder that was, in modified form, used in genetic and epidemiological studies even until fairly recently [e.g. (1)].
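The weighting scheme of Weinberg’s shorter method can be made concrete with a minimal sketch. The function names, the risk-period boundaries (ages 15 to 45) and the ages used below are hypothetical illustrations and are not taken from Strömgren’s or Weinberg’s work; the sketch also assumes that the list of ages contains exactly those relatives who enter the denominator, a detail the description above does not specify.

```python
def weinberg_weight(age, risk_start, risk_end):
    """Weight for one relative under the shorter method described above.

    0 below the age of risk, one half within it, 1 beyond it.
    """
    if age < risk_start:
        return 0.0
    if age <= risk_end:
        return 0.5
    return 1.0


def bezugsziffer(ages, risk_start, risk_end):
    """Corrected denominator: the sum of the weights over all relatives."""
    return sum(weinberg_weight(age, risk_start, risk_end) for age in ages)


def morbid_risk(n_affected, ages, risk_start, risk_end):
    """Age-corrected lifetime risk: affected relatives over the Bezugsziffer."""
    return n_affected / bezugsziffer(ages, risk_start, risk_end)


# Hypothetical relatives and a hypothetical risk period of 15 to 45 years.
example_ages = [10, 20, 30, 50, 60]
example_bz = bezugsziffer(example_ages, risk_start=15, risk_end=45)
```

With these made-up ages the weights are 0, one half, one half, 1 and 1, so five relatives contribute a corrected denominator of only 3. Strömgren’s refinement, as described above, replaced the single weight of one half with weights derived from the age of onset distribution.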

Studies such as Strömgren’s on the island of Bornholm provided more precise data to support the view that most clinicians had held for a long time, that psychiatric disorders tend to run in families. Subsequently twin studies and adoption studies demonstrated that the familiality (i.e. the tendency for a disorder to ‘run’ in families) of most major psychiatric disorders is largely attributable to genes (2). A recent theme in genetic epidemiological studies in psychiatry that was probably not envisaged by Strömgren and his contemporaries is that of substantial overlap in aetiologies between the major categories of disorder, for example between schizophrenia, bipolar disorder and schizoaffective disorder (3). On the other hand, although all mood disorders were lumped together as manic depression during Strömgren’s early days, there is now good evidence of partially distinct as well as partially overlapping genes in mania and depressive disorder (4). Studies also suggest specific genetic components to schizophrenia and bipolar disorder (3).

Aims of the study

The biggest change that has occurred in the seven decades since Strömgren started out in psychiatric genetics is that the emphasis has shifted from simply estimating empirical risks of disorder in various categories of relatives to actually mapping and identifying the genes that confer susceptibility. The present paper describes and discusses these changes.

Material and methods

The relevant literature on progress in genetic research, including linkage and association studies, whole genome genotyping, copy number variants and epigenetics, has been reviewed for the present paper.


Modes of inheritance and the genome

Psychiatric disorders rarely follow a simple single-gene-single-disorder model, despite media reports of ‘a gene for’ specific disorders or behaviours. Instead, more complex genetics are responsible, involving multiple genes (quantitative trait loci; QTLs) with the potential to interact both with the environment and with one another. Over the last decade there have been huge advancements aiding the identification of candidate loci, including the working draft of most of the human genome (5), which has since been extended to fill in most gaps and includes the sequence of approximately 3 billion DNA bases (6). Our understanding of genetics has been changing as quickly as new genotyping and sequencing technologies have been developed, including the discovery that there are far fewer genes than had been predicted (about 25 000). To many this seemed a small number to provide the basic code for something as complex as a human being, and indeed as genetic research progresses intergenic regions (regions of the genome between genes that do not themselves directly code for a gene) are gaining importance and significance. Despite the unravelling of the human genome, and the discovery of fewer genes than anticipated, tracking down precisely those genes that influence common diseases and traits remains a difficult task, in part due to the increasing complexity of genetics that is only now beginning to be identified and understood (i.e. intergenic regions, copy number variants and epigenetics). However, the investment in, and the process of, decoding the human genome has armed researchers with better DNA markers and sequencing systems than ever before, enabling us to navigate our way around the genome and tag the locations of QTLs that contribute to the liability to disease.

Broadly speaking there are two complementary methods, linkage and association, used to locate genes contributing to the genetic aetiology of psychopathology. Linkage can be detected over comparatively large distances, and consequently several genome-wide linkage studies in psychiatric disorders have been performed. However, power is problematic when searching for QTLs with small effect sizes. In contrast, association studies can detect small effects but only over very small distances. This means that huge numbers of markers are necessary to gain genome-wide coverage, and this has only recently become feasible in whole-genome association studies.


Linkage involves searching across the genome using hundreds of evenly spaced highly polymorphic DNA markers in pedigrees (families) in which two or more individuals are affected by the disorder being investigated. If particular markers are inherited by affected members within that pedigree over and above what is expected by chance, linkage is inferred. Pairs of genes are not usually co-inherited when passed from parents to offspring, either because they are on different chromosomes or because, if on the same chromosome, they are sufficiently far apart to be separated by a process called crossing-over, which occurs during meiosis. Departure from such independent assortment, such as the tendency for pairs of siblings both affected by the disorder to inherit one or more markers at a level greater than expected by chance, suggests that the marker and a disease gene are close together. Therefore the region surrounding the marker that shows evidence of increased sharing amongst affected individuals may be implicated in the disorder being studied. Within a linkage region, a susceptibility gene may then be identified using an allied approach called association, which involves finer and more directed analysis of the region of interest (positional candidate gene approach, see below).
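Excess marker sharing among affected sibling pairs can be illustrated with a minimal sketch of a simple ‘mean sharing’ test. This is one of several possible linkage statistics and is not claimed to be the method used in any study cited here. It assumes that the number of marker alleles each pair shares identical by descent (0, 1 or 2) is known without ambiguity, and it uses a normal approximation. Under no linkage a sibling pair shares 0, 1 or 2 alleles with probabilities 1/4, 1/2 and 1/4, so the expected proportion shared is 0.5 with variance 1/8 per pair. The function name and the counts are hypothetical.

```python
import math


def sib_pair_mean_test(shared_alleles):
    """One-sided test for excess allele sharing in affected sibling pairs.

    shared_alleles: list with one entry (0, 1 or 2) per affected sib pair,
    giving the number of marker alleles shared identical by descent.
    Returns (mean proportion shared, z statistic, one-sided p-value).
    """
    n_pairs = len(shared_alleles)
    mean_share = sum(count / 2 for count in shared_alleles) / n_pairs
    # Null expectation 0.5; null variance of the proportion is 1/8 per pair.
    z = (mean_share - 0.5) / math.sqrt(0.125 / n_pairs)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_share, z, p_value


# Hypothetical data: 100 affected sib pairs sharing 0, 1 or 2 alleles.
example_pairs = [0] * 15 + [1] * 50 + [2] * 35
example_mean, example_z, example_p = sib_pair_mean_test(example_pairs)
```

In this made-up sample the mean proportion shared is 0.6 rather than the 0.5 expected by chance, which is the kind of departure from independent assortment that would point to the region surrounding the marker.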


For linkage of a chromosomal region to be determined, family pedigrees are required; association studies can also be carried out using family data, but in their more straightforward form are population-based case–control studies. A specific allele occurring at a greater frequency than expected by chance in the affected group (risk allele), or in the control group (protective allele), is said to be associated. A causal link should not be automatically inferred from association, as it may be a consequence of linkage disequilibrium (LD), i.e. the causal variant is so extremely close to the genotyped marker that their relationship is undisturbed by many generations of recombination and so it is in effect ‘tagged’ by the marker. A majority of association studies investigate functional candidate genes that are believed to be involved in the pathogenesis of the specific disorder. These genes have often been identified as candidates because they code for proteins that are hypothesized to be implicated in the disorder. For example, knowledge of the action of antidepressant treatments, such as selective serotonin reuptake inhibitors (SSRIs), which are known to have their site of action at the serotonin transporter, has led to the identification of a candidate gene (5HTT). A common insertion/deletion polymorphism has been described in the promoter region of 5HTT which has been shown to result in high or low activity depending on the variant inherited; the ‘long’ (l) allele (insertion) is the high activity version of this promoter variant, with the ‘short’ (s) allele (deletion) being the low activity version. The gene is located on chromosome 17 and has been associated with personality traits affecting anxiety and depression. However, results have been inconsistent, with many failures to replicate (7–11). Nevertheless, there are some converging lines of evidence pointing to the 5HTT promoter variant affecting susceptibility to depression in conjunction with environmental stress.
For example, Caspi et al. (12) studied approximately 1000 people from a birth cohort in Dunedin, New Zealand, and reported an interaction between the 5HTT promoter genotype and the development of depressive symptoms in response to stressful life events. This has been replicated (13) and reviewed (14); however, studies remain inconsistent (15, 16). Although there is much evidence indicating that the 5HTT promoter variant is implicated in depression in conjunction with stressful life events, it cannot be considered the ‘gene for’ depression, but rather the first to be identified with relative confidence of what will be a fairly large number of genes that can contribute to the liability to depression. Pharmacogenetic studies point to evidence of a marked influence of the 5HTT promoter polymorphism on response to SSRIs, with better treatment outcome in l-variant carriers among Caucasians (17). However, similar studies performed in Asian populations have produced conflicting results, with some samples showing association in the opposite direction, s/s genotype carriers showing better treatment response.
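The logic of a population-based case–control association test can be sketched for a two-allele marker such as a promoter insertion/deletion variant. The sketch below counts alleles in cases and controls, then computes an odds ratio and a Pearson chi-square statistic with one degree of freedom. The function name and all counts are hypothetical and are not data from any of the studies cited; the allelic test shown is only one of several possible analyses and assumes Hardy–Weinberg equilibrium and a well-matched control sample.

```python
import math


def allelic_association(case_a, case_b, control_a, control_b):
    """Allelic case-control test on a 2x2 table of allele counts.

    case_a, case_b: counts of alleles A and B among cases.
    control_a, control_b: counts of alleles A and B among controls.
    Returns (odds ratio for allele A, chi-square with 1 df, p-value).
    """
    total = case_a + case_b + control_a + control_b
    odds_ratio = (case_a * control_b) / (case_b * control_a)
    numerator = total * (case_a * control_b - case_b * control_a) ** 2
    denominator = ((case_a + case_b) * (control_a + control_b)
                   * (case_a + control_a) * (case_b + control_b))
    chi_square = numerator / denominator
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 100 cases and 100 controls, two alleles per person.
example_or, example_chi2, example_p = allelic_association(120, 80, 100, 100)
```

With these made-up counts the odds ratio is 1.5 and the chi-square is about 4.04. As noted above, even a significant result of this kind shows association rather than causation, because the marker may simply tag a nearby causal variant through LD.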

A candidate gene may also be identified by protein levels that differ between individuals with a disorder and those without, such as brain-derived neurotrophic factor (BDNF) in bipolar disorder. BDNF is a neurotrophic factor that plays a critical role in the promotion, differentiation and survival of neurons in the central nervous system (CNS). BDNF is important throughout development and is crucial in maintaining the survival of neurons through into adulthood (18).

Patients exhibit reduced levels of the BDNF protein relative to control individuals (19), and it has been reported that during episodes of mania, plasma levels are temporarily reduced (20). The gene, BDNF, is located on chromosome 11 and has been associated with bipolar disorder and schizophrenia; however, results remain inconsistent (21–23).

In general each gene conferring susceptibility to common disorders contributes only a small amount of variance, but the advantage of association studies is that they are capable of detecting genes of very small effect, accounting for as little as 1% of variance. One limitation of the candidate gene approach is that it is constrained by the current knowledge of a disorder, whilst chromosomal regions may be identified in linkage studies and then refined to identify other, previously unconsidered, genes of interest. These are called positional candidate genes, as opposed to the functional candidate genes described previously. In discovering other genes, positional cloning approaches that utilize both linkage and association will be crucial.

Application of linkage followed by association mapping has begun to yield some results in the discovery of genes involved in susceptibility to schizophrenia [reviewed by Elkin et al. (24)] and depression (25). Dysbindin (DTNBP1) and neuregulin-1 (NRG1), located on chromosomes 6 and 8 respectively, were initially identified via linkage studies investigating schizophrenia and have now been independently replicated as schizophrenia susceptibility loci by several studies (26–34). Another promising novel gene identified in this way is G72 on chromosome 13, which may be involved in susceptibility to both schizophrenia and bipolar disorder (35–41). This contradicts the traditional view that the two conditions are separate entities, although it is in keeping with the twin analysis mentioned earlier (4).

Genome-wide association studies

There have been very recent technological advancements that give renewed promise to the identification of novel gene loci that are associated with complex psychiatric disorders. Whole genome association scans offer an advantage over linkage studies as they are able to detect genes of small effect that are likely to be overlooked by linkage (42, 43); however, until recently the large number of closely spaced markers required to detect linkage disequilibrium remained an obstacle (44). Genome-wide association studies (GWAS) using large samples (sample sizes usually exceed 1000 cases and 1000 controls) have recently become possible, and these studies routinely genotype each individual for up to 1 million SNP markers. The principle behind genome-wide association studies is that they search across the genome for polymorphisms that influence the trait (direct association) or that are in linkage disequilibrium (LD) with causative variant(s) (indirect association). The latter results from the specific polymorphism (e.g. SNP) and causative allele(s) being so close together that they are rarely separated by meiosis. The first adequately designed and powered GWAS, the WTCCC study (45), investigated 14 000 patients across seven diseases, including bipolar disorder (2000 cases), and 3000 shared population controls.
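Testing up to 1 million markers in each individual brings a heavy multiple-testing burden, which a minimal sketch can make concrete. The sketch uses a Bonferroni correction, which is an assumption made here for illustration: the text does not state which significance threshold any of the cited studies used, and Bonferroni is conservative because markers in LD are not independent tests. The function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise
    error rate at alpha across n_tests independent tests."""
    return alpha / n_tests


# A family-wise error rate of 0.05 spread across 1 million SNP markers.
example_threshold = bonferroni_threshold(0.05, 1_000_000)
```

Under these assumptions a single marker would need a p-value below 5 x 10^-8 to be declared significant, which helps to explain why samples of thousands of cases and controls are needed.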

In addition to the WTCCC study (45) there have been three more GWAS in bipolar disorder published to date (46–48). Similar to the findings in GWAS of non-psychiatric disorders, there is little agreement among the regions that yield the few strongest associations in each of the studies. However, meta-analyses, which are well known to greatly enhance the power to detect loci of small effect size, identified two associations at genome-wide significance level (47), one in the Ankyrin-G gene (ANK3) and the second in the gene encoding the α-1C subunit of the L-type voltage-gated calcium channel (CACNA1C), indicating that ion channelopathies may be involved in the pathogenesis of bipolar disorder. Interestingly, there is also evidence that the CACNA1C gene is associated with both schizophrenia and major affective disorder (49); however, this has yet to be replicated.
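The gain in power from meta-analysis can be illustrated with a minimal sketch of a fixed-effect, inverse-variance weighted combination of per-study effect estimates. This is a generic textbook method offered under stated assumptions; it is not claimed to be the procedure used in the meta-analysis cited above (47). The function name and the effect sizes (log odds ratios with standard errors) are hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance weighted fixed-effect meta-analysis.

    estimates: list of (effect, standard_error) tuples, one per study,
    for example log odds ratios for the same allele at the same marker.
    Returns (pooled effect, pooled standard error, z, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * effect
                 for w, (effect, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, z, p_value


# Three hypothetical studies, each with z = 2.0 when analysed alone.
example_studies = [(0.10, 0.05), (0.12, 0.06), (0.08, 0.04)]
example_pooled, example_se, example_z, example_p = fixed_effect_meta(example_studies)
```

In this made-up example each study alone gives a z statistic of 2.0, while the pooled estimate gives a z statistic of about 3.4. The sketch assumes that all studies estimate the same underlying effect, which may not hold in the presence of the heterogeneity discussed later.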

In schizophrenia research, the first large case–control genome-wide association studies have been published recently (50–52), and GWAS data for ∼12 000 subjects with schizophrenia and ∼14 000 controls will shortly be available for meta-analysis. The same holds true for major depressive disorder, where two genome-wide association studies have been published to date (52, 53), with another currently pending final analysis and initial publication in our group.

The investigators performing GWAS of schizophrenia, bipolar affective disorder, major depression, attention deficit hyperactivity disorder and autism have agreed to participate in the Psychiatric GWAS Consortium. High-quality meta-analyses of each disorder, cross-disorder analyses (including analyses of combinations of disorders and of phenotypes observed in two or more disorders, such as psychosis), and analyses of comorbidities (e.g. alcohol, nicotine, illicit drug use disorders) will be performed (54). This type of collaborative work is likely to form the basis of future research in psychiatric genetics.

There has been great difficulty both in identifying genetic loci associated with psychiatric disorders and in replicating them. Power and effect size are a likely cause of this, as it is widely accepted in psychiatric genetics that researchers are attempting to elucidate complex multi-factorial diseases. Hunting for multiple genes of small effect invariably means that samples must be large in order to detect any association. Furthermore, such genes are never likely to be necessary or sufficient causal variants and therefore always act in tandem (additively or multiplicatively) with one another. The advent of GWA studies has catalysed collaboration in genetic studies of common disorders for this very reason, as the expense invested in these projects is huge and so the results should be maximized. The ability to replicate may also be hindered by heterogeneity, both phenotypic and genetic. Phenotypic heterogeneity refers to the phenotype being ill-defined, or possessing substantial variation. This is particularly a possibility in disorders that have different levels of severity such as major depressive disorder; investigators can reduce the effect of such heterogeneity by limiting their samples to more extreme forms of a disorder. Genetic heterogeneity can take two forms: locus and allelic heterogeneity. Locus heterogeneity is when distinct genes (or combinations of genes) influence disease risk. In contrast, allelic heterogeneity refers to different variants within the same gene influencing disease risk. Allelic heterogeneity has often been cited as a reason for the lack of specific SNP replication across studies, despite consistent associations between variation within a gene and a disorder, such as that observed with NRG1 and schizophrenia. It is plausible that allelic heterogeneity does not imply functional heterogeneity.
Another confounding factor may be population stratification, although this is increasingly understood now and considered when comparing studies across populations.
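The dependence of power on sample size for genes of small effect can be sketched with a normal approximation to a case–control comparison of allele frequencies. All parameter values below (a control allele frequency of 0.30, an odds ratio of 1.2 and a significance level of 5 x 10^-8) are hypothetical choices for illustration, and the function name is an assumption; the approximation ignores the negligible contribution of the opposite tail and is not a substitute for a formal power analysis.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided allelic case-control test.

    Each individual contributes two alleles. The case allele frequency is
    derived from the control frequency and the per-allele odds ratio.
    """
    case_freq = (odds_ratio * control_freq
                 / (1 - control_freq + odds_ratio * control_freq))
    case_alleles = 2 * n_cases
    control_alleles = 2 * n_controls
    pooled = ((case_alleles * case_freq + control_alleles * control_freq)
              / (case_alleles + control_alleles))
    # Standard error of the frequency difference under the null hypothesis.
    se_null = math.sqrt(pooled * (1 - pooled)
                        * (1 / case_alleles + 1 / control_alleles))
    # Standard error of the frequency difference under the alternative.
    se_alt = math.sqrt(case_freq * (1 - case_freq) / case_alleles
                       + control_freq * (1 - control_freq) / control_alleles)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    difference = abs(case_freq - control_freq)
    return NormalDist().cdf((difference - z_critical * se_null) / se_alt)


# Hypothetical comparison of a modest and a very large sample.
power_1000 = allelic_power(1000, 1000, 0.30, 1.2, 5e-8)
power_10000 = allelic_power(10000, 10000, 0.30, 1.2, 5e-8)
```

Under these assumptions, 1000 cases and 1000 controls give almost no chance of detecting the locus at a stringent genome-wide threshold, whereas 10 000 of each give high power. This is consistent with the argument above that collaboration and pooled samples are needed, and that a failure to replicate in a small sample is weak evidence against a true association.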

It must be emphasized that genes identified as susceptibility loci in these disorders not only have small effects, but are also likely to be subject to interplay with environmental factors as well as multiple other genes. As an example, further to establishing strong genetic influence in depression, quantitative genetic studies have also found substantial influence of an individual’s unique environment (55). A majority of studies in the field of psychiatric genetics fail to consider environmental factors in their designs and methods. There is evidence of gene–environment interplay, and so when possible the environment should be included in analyses; the environment, including biological challenges, will alter gene expression. The genotype of an individual may determine the genetic response to the environmental challenge; however, without that environmental challenge it may not be expressed. Perhaps it is necessary for environmental measures to be incorporated in genetic studies in order for consistent results to be achieved, particularly given the importance of stress in disorders such as depression. Similarly, gene–gene epistatic interactions have started to be studied. The brain is highly complex, and genes will not act independently of each other. Therefore this is a logical extension of interaction analyses. Naturally this does come at a price, as multiple testing could reduce the power of a study to detect an association. Replicating interaction analyses is invariably going to be more difficult due to the additional complexity in the model being applied.


As understanding of genetics and the human genome improves, new phenomena are identified that will gradually be addressed. For example, epigenetic effects may exist, which describe regulation of gene expression that is heritable but potentially reversible, primarily as a result of DNA methylation (where a gene is silenced and prevented from being expressed) and changes in chromatin structure, which occur independently of the DNA sequence. This kind of alteration of gene expression independent of the DNA sequence could potentially complicate associations that have been observed. Furthermore, it has been suggested that traditional DNA sequence-based approaches may not fit many common aetiological observations in complex disorders (56). These include the relatively high incidence of discordance amongst monozygotic twins for disorders, the comparatively delayed onset of complex disorders, the frequently reported sex effects (unless the trait or behaviour being observed is X-linked), and parent-of-origin effects. Although normal genetic behaviour seems unable to satisfactorily explain such features of complex disorders, it could be that epigenetics may (56); however, a combination of the two is more likely. Another new avenue that is rapidly gaining pace and interest, in addition to methylation, is that of copy number variants (CNVs), as they have important implications for the many genotyping methods that rely on fluorescence and may thus be affected. CNVs are segments of DNA (from one kilobase to several megabases) for which copy-number differences (e.g. duplications or insertions, deletions) have been revealed by comparison of two or more genomes, and thus multiple copies of a gene may be present in the genome (57). CNVs can be inherited or occur de novo (new and not inherited), which would also help to explain behavioural differences between monozygotic twins.
There is accumulating evidence that multiple rare de novo (and some inherited) CNVs contribute to the genetic component of vulnerability to schizophrenia (57). For example, the 22q11.2 deletion syndrome (VCFS, velo-cardio-facial syndrome) is associated with a 3-Mb microdeletion, and approximately 25% of patients have psychiatric manifestations, including schizophrenia, attention-deficit hyperactivity disorder or autism spectrum disorders. A large genome-wide survey of rare CNVs (58) found deletions within the region critical for VCFS as expected, and further identified large deletions on chromosomes 1 and 15. Another study found significant association of schizophrenia and related psychoses with three deletions (including the deletions on chromosomes 1 and 15) in two samples (59). There is also evidence that, genome-wide, there is a greater load of low-frequency CNVs in patients suffering from schizophrenia than in controls (58, 60–62). These first results provide strong support for the view that multiple rare structural variants contribute to schizophrenia pathogenesis.

Future prospects

In summary, the future for understanding the genetic aetiology of psychiatric disorders has never looked brighter. With cheaper, faster techniques consistently evolving, and our understanding of new phenomena such as epigenetics and CNVs, we are gradually moving closer to our goal of unravelling these disorders. However, as important as this is for science, the question of how this translates to patients and the clinic is an important one. By understanding the molecular neurobiology of psychiatric disorders through the identification of genes we should be able to develop more efficacious and specific medicines. Although perhaps not so useful in diagnosis outright, DNA testing may inform the counselling of patients’ relatives who are at high risk of a heritable disorder, and furthermore it is highly probable that such testing will become of significant importance in predicting the response to treatment and individual susceptibilities to the side-effects of therapeutic drugs. Finally, and equally important, by understanding the causes and mechanisms, stigma may be reduced through the demystification of psychiatric disorders, enabling wider acceptance.

Declaration of interests